influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
28.17k stars 3.51k forks source link

refactor: v1 recordbatch to json #25085

Open JeanArhancet opened 1 week ago

JeanArhancet commented 1 week ago

This PR addresses the second part of issue https://github.com/influxdata/influxdb/issues/24981. It replaces the usage of &arrow_json::writer::record_batches_to_json_rows(batches.as_slice())? in influxdb3_server/src/http/v1.rs by directly parsing RecordBatch into serde_json::Value.

Check-list:

JeanArhancet commented 6 days ago

@hiltontj Thanks for your feedback. I have a question concerning the Timestamp conversion. I'm not really confident in my current implementation and unsure if the format is correct. Does Influx v1 support RFC 3339 with the 'Z' suffix for UTC? In the tests, the datetime format used is "%Y-%m-%dT%H:%M:%S".

hiltontj commented 5 days ago

Does Influx v1 support RFC 3339 with the 'Z' suffix for UTC?

@JeanArhancet - that is a good question thanks for raising it. I looked at our documentation, which indicates that it uses RFC 3339 with Z (see response examples here). If the values coming out of the DB have timestamps, then we should be expressing them as UTC along with the Z (and in which case, the tests we have will need to be fixed).

If however, the timestamps are not holding a timestamp that may be a separate issue, and you could leave yours as is, since it is passing the tests.

Interestingly, I also tried the /query API on my local v1 OSS influxdb and it returns the time as a unix epoch (nanoseconds), not an RFC timestamp. Either way, we want to be consistent with the docs on this one.

JeanArhancet commented 4 days ago

Does Influx v1 support RFC 3339 with the 'Z' suffix for UTC?

@JeanArhancet - that is a good question thanks for raising it. I looked at our documentation, which indicates that it uses RFC 3339 with Z (see response examples here). If the values coming out of the DB have timestamps, then we should be expressing them as UTC along with the Z (and in which case, the tests we have will need to be fixed).

If however, the timestamps are not holding a timestamp that may be a separate issue, and you could leave yours as is, since it is passing the tests.

Interestingly, I also tried the /query API on my local v1 OSS influxdb and it returns the time as a unix epoch (nanoseconds), not an RFC timestamp. Either way, we want to be consistent with the docs on this one.

@hiltontj I've changed the datetime conversion to return an RFC 3339 compliant date. Locally, I also tried this metric, but I encountered an issue:

 server
        .write_lp_to_db(
            "foo",
          "errors,message='Hello world' count=-9223372036854775808 1422568543702900257\n\

            Precision::Nanosecond,
        )
        .await
        .unwrap();

And I've this errors:

called `Result::unwrap()` on an `Err` value: ApiError { code: 400, message: "{\"error\":\"partial write of line protocol occurred\",\"data\":[{\"original_line\":\"errors,message='hello world' count=-9223372036854775808 1422568543702900257\",\"line_number\":1,\"error_message\":\"No fields were provided\"}]}" } 

However, according to the documentation for v1, this seems to be a supported format: https://docs.influxdata.com/influxdb/v1/write_protocols/line_protocol_reference/#write-the-field-value-stringing-along-as-a-string-to-influxdb?

Additionally, the i64 values are currently being converted to scientific notation, e.g., -9223372036854775808 -> -9.223372036854776e18. I'm not sure if this respects the v1 specification.