grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Misleading error after an unexpected content-type in querying endpoints #9166

Closed cyberox closed 1 week ago

cyberox commented 1 week ago

Describe the bug

I want to replace our Prometheus server instances with Mimir, but a query against the /query_range endpoint fails with an error about the start parameter: {"status":"error","errorType":"bad_data","error":"invalid parameter \"start\": cannot parse \"\" to a valid timestamp"}

To Reproduce

I use curl to call the Prometheus endpoint, and this request works:

* Host localhost:9090 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:9090...
* Connected to localhost (::1) port 9090
> POST /api/v1/query_range HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/8.7.1
> Accept: */*
> X-ScopeID: organization-dev
> Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ
> Content-Length: 590
> Content-Type: multipart/form-data; boundary=------------------------ANYqkMHZfevxJjQSeEOHs2
>
* upload completely sent off: 590 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Thu, 29 Aug 2024 14:32:50 GMT
< Content-Length: 668
<
* Connection #0 to host localhost left intact
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"container":"nginx-proxy","namespace":"default","pod":"nginx-proxy-57fdf565f4-pxjcf"},"values":[[1724932996,"3178496"],[1724933056,"4509696"],[1724933116,"4952064"],[1724933176,"4952064"],[1724933236,"4947968"],[1724933296,"4947968"],[1724933356,"4947968"],[1724933416,"4947968"],[1724933476,"4947968"]]},{"metric":{"namespace":"default","pod":"nginx-proxy-57fdf565f4-pxjcf"},"values":[[1724932996,"4554752"],[1724933056,"4689920"],[1724933116,"5246976"],[1724933176,"5341184"],[1724933236,"5337088"],[1724933296,"5337088"],[1724933356,"5337088"],[1724933416,"5337088"],[1724933476,"5337088"]]}]}}%

When I make the same call against our Mimir endpoint, it returns the error:

* Host mimir.dev.organization.local:443 was resolved.
* IPv6: (none)
* IPv4: 10.0.136.44, 10.0.145.234, 10.0.141.18
*   Trying 10.0.136.44:443...
* Connected to mimir.dev.organization.local (10.0.136.44) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES128-GCM-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=*.dev.organization.local
*  start date: May 16 00:00:00 2024 GMT
*  expire date: Jun 15 23:59:59 2025 GMT
*  subjectAltName: host "mimir.dev.organization.local" matched cert's "*.dev.organization.local"
*  issuer: C=US; O=Test; CN=Test RSA 2048
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://mimir.dev.organization.local/prometheus/api/v1/query_range
* [HTTP/2] [1] [:method: POST]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: mimir.dev.organization.local]
* [HTTP/2] [1] [:path: /prometheus/api/v1/query_range]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [x-scope-orgid: organization-dev]
* [HTTP/2] [1] [authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ]
* [HTTP/2] [1] [content-length: 590]
* [HTTP/2] [1] [content-type: multipart/form-data; boundary=------------------------wmu250iStgcFp9CtvDVPPL]
> POST /prometheus/api/v1/query_range HTTP/2
> Host: mimir.dev.organization.local
> User-Agent: curl/8.7.1
> Accept: */*
> X-Scope-OrgID: organization-dev
> Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ
> Content-Length: 590
> Content-Type: multipart/form-data; boundary=------------------------wmu250iStgcFp9CtvDVPPL
>
* upload completely sent off: 590 bytes
< HTTP/2 400
< server: envoy
< date: Thu, 29 Aug 2024 14:33:34 GMT
< content-type: application/json
< content-length: 119
< vary: Accept-Encoding
< x-envoy-upstream-service-time: 5
<
* Connection #0 to host mimir.dev.organization.local left intact
{"status":"error","errorType":"bad_data","error":"invalid parameter \"start\": cannot parse \"\" to a valid timestamp"}%

Expected behavior

The same output as from the Prometheus endpoint.

Environment

narqo commented 1 week ago

To help debug this, could you also share the actual curl command (with the POST payload) that you used? The error message says start is empty, but it's not clear whether its value was scrambled by something in between or was indeed missing.

cyberox commented 1 week ago

Here is the curl command I used:

curl -v -X POST -s https://mimir.dev.organization.local/prometheus/api/v1/query_range \
--header 'X-Scope-OrgID: organization-dev' \
--header 'Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ' \
-F 'query=sum(container_memory_working_set_bytes{namespace="default"}) by (pod, container, namespace)' \
-F 'end="2024-08-29T12:11:16.000Z"' \
-F 'start="2024-08-29T11:11:16.000Z"' \
-F 'step=60'

narqo commented 1 week ago

Thank you for providing the command. I think the issue is that this curl command sends the request with Content-Type: multipart/form-data because of the -F flags. Prometheus doesn't officially support that, and neither does Mimir. The API expects application/x-www-form-urlencoded (see the Prometheus docs).
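
To illustrate the difference between the two encodings, here is a minimal Python sketch (standard library only) that builds both bodies from the parameters in the command above and feeds them to a urlencoded parser. The boundary string is made up for illustration, and the parser only stands in for a server that understands urlencoded forms; it is not Mimir's or Prometheus's code.

```python
from urllib.parse import urlencode, parse_qs

# Parameters taken from the curl command in this thread.
params = {
    "query": 'sum(container_memory_working_set_bytes{namespace="default"}) by (pod, container, namespace)',
    "start": "2024-08-29T11:11:16.000Z",
    "end": "2024-08-29T12:11:16.000Z",
    "step": "60",
}

# application/x-www-form-urlencoded: percent-encoded key=value pairs joined by "&".
urlencoded_body = urlencode(params)

# multipart/form-data: one MIME part per field, separated by a boundary.
# The boundary value is an assumption chosen for this example.
boundary = "example-boundary"
multipart_body = "".join(
    f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
    for name, value in params.items()
) + f"--{boundary}--\r\n"

# A urlencoded parser recovers "start" only from the urlencoded body.
from_urlencoded = parse_qs(urlencoded_body)
from_multipart = parse_qs(multipart_body)

print(from_urlencoded["start"])   # ['2024-08-29T11:11:16.000Z']
print("start" in from_multipart)  # False
```

A server that only reads urlencoded forms therefore sees no start value in a multipart body, which would be consistent with the empty string in the error message.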

Mimir's request validation should probably do a better job and report the unexpected content-type in this case.
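
As a rough idea of what such validation could look like, here is a hypothetical Python sketch. The function name `parse_range_query_form` and the error wording are assumptions for illustration only; Mimir is written in Go and this is not its implementation.

```python
from urllib.parse import parse_qs

FORM_URLENCODED = "application/x-www-form-urlencoded"


def parse_range_query_form(content_type: str, body: str) -> dict:
    """Hypothetical validator (not Mimir's code): reject unsupported bodies early."""
    # Drop parameters such as "; boundary=..." or "; charset=..." before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != FORM_URLENCODED:
        raise ValueError(
            f'unsupported content-type "{media_type}": expected "{FORM_URLENCODED}"'
        )
    # Keep the first value of each field, then check the range-query parameters.
    form = {name: values[0] for name, values in parse_qs(body).items()}
    for name in ("query", "start", "end", "step"):
        if not form.get(name):
            raise ValueError(f'missing required parameter "{name}"')
    return form


try:
    parse_range_query_form("multipart/form-data; boundary=example-boundary", "")
except ValueError as err:
    message = str(err)
    print(message)
```

Checking the content-type before the individual parameters means the client is told about the actual problem instead of an empty start value.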

Meanwhile, try using curl's -d or --data-urlencode instead. For example:

curl -v -s https://mimir.dev.organization.local/prometheus/api/v1/query_range \
  -d 'start=2024-08-29T11:11:16.000Z' ...
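
For clients that are not curl, the same urlencoded request could be built along these lines. This is a sketch using Python's standard library; the URL, tenant header, and parameter values are copied from the thread, the Authorization header is left out, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and tenant as used in the thread.
url = "https://mimir.dev.organization.local/prometheus/api/v1/query_range"
params = {
    "query": 'sum(container_memory_working_set_bytes{namespace="default"}) by (pod, container, namespace)',
    "start": "2024-08-29T11:11:16.000Z",
    "end": "2024-08-29T12:11:16.000Z",
    "step": "60",
}

# urlencode() percent-encodes every value, much like curl's --data-urlencode.
request = Request(
    url,
    data=urlencode(params).encode("ascii"),
    headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "X-Scope-OrgID": "organization-dev",
    },
    method="POST",
)

print(request.get_method())
print(request.get_header("Content-type"))
```

Passing the request to `urllib.request.urlopen` would send it. Note that the timestamps carry no surrounding double quotes, matching the -d example above.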

cyberox commented 1 week ago

Indeed, with --data-urlencode the response is correct.