quickwit-oss / quickwit-datasource

Quickwit data source for Grafana
GNU Affero General Public License v3.0
39 stars 9 forks source link

Multisearch response unmarshalling loses nano precision from numeric timestamps #118

Open ddelemeny opened 3 months ago

ddelemeny commented 3 months ago

This issue proves complex to tackle, better keep a description of things here .

The problem :

https://github.com/quickwit-oss/quickwit-datasource/blob/8a3c88b0301b4e68f5fc8c8b8c38730dd0d99173/pkg/quickwit/response_parser.go#L296

When creating a data frame from the response to a search query, the response parser finds numeric timestamp values cast as float64. This cast is not wanted, as float64 can't represent nanosecond precision timestamps.

Where does it happen

https://github.com/quickwit-oss/quickwit-datasource/blob/8a3c88b0301b4e68f5fc8c8b8c38730dd0d99173/pkg/quickwit/client/client.go#L188

The json response from the API gets unmarshalled early in the processing pipeline. The json decoder used in ExecuteMultisearch casts number to float64 by default

The solution :

The decoder can be told to unmarshal numbers to a "polymorphic" type : json.Number, which allows further processing to decide the actual type of the specific datum down the line.

The other problem :

Changing from a simple type to a polymorphic one in a complex multi-branched processing pipeline tends to break a lot of stuff. Reworking the whole call-tree of a Multisearch to handle json.Number instead of float64 numbers is an absurd chore.

The more complete solution ?

A reasonable approach would be to perform a shallow unmarshal early in the process, only to be able to dispatch responses to sub-handlers. Then do a second unmarshal of the response in subhandlers, when it's appropriate to decide if we want the polymorphic type (when timestamps are used). response_parser.go:parseResponse looks like a good candidate

ddelemeny commented 2 months ago

sort key seems to suffer from precision loss too, resulting in a possible shuffled ascending order : the dataframe's nanos field received from the backend has incorrect values.

Need more love.

Edit : dataframe nanos don't come from the sort key but from the timestamp. Available timestamp output formats currently do not represent nanos accurately in Quickwit, this needs to be addressed first.