kamu-data / kamu-cli

New generation decentralized data lake and a streaming data pipeline
https://kamu.dev

Support `timestampFormat` in ingest #438

Open sergiimk opened 6 months ago

sergiimk commented 6 months ago

Currently, `timestampFormat` is not supported in our DataFusion-based ingest readers.

Timestamp support in DataFusion and Arrow is currently limited: they can only parse RFC 3339 strings.
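
To illustrate the limitation, here is a minimal Python sketch in which the standard library's `datetime.fromisoformat` stands in for a reader that only understands one fixed format. It is not DataFusion's or Arrow's parser, and `is_iso_timestamp` is a hypothetical helper; the point is only that such a parser accepts the canonical form and rejects a slash-separated one.

```python
from datetime import datetime


def is_iso_timestamp(value: str) -> bool:
    """Return True if `value` parses as an ISO 8601 / RFC 3339 style timestamp.

    Hypothetical helper: Python's fromisoformat is used as a stand-in for a
    fixed-format reader, not as a model of DataFusion's actual parser.
    """
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


# Canonical form: accepted.
print(is_iso_timestamp("2020-03-30T12:00:00+00:00"))
# Slash-separated date with an hour-only offset: rejected.
print(is_iso_timestamp("2020/03/30 12:00:00+00"))
```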

Because of this, some datasets like covid/quebec.case-details have to rely on Spark to parse complex time formats.

This ticket is to investigate support for advanced timestamp parsing.

sergiimk commented 3 months ago

This can be easily implemented now that DataFusion supports chrono format specifiers, e.g.:

select to_timestamp('2020/03/30 12:00:00+00', '%Y/%m/%d %H:%M:%S%#z') as t;
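
For comparison, the same idea outside SQL: a rough Python sketch of format-driven parsing with `datetime.strptime`. This is an analogy, not the ingest implementation. `parse_timestamp` is a hypothetical helper, and because Python's `%z` requires minutes in the offset (unlike chrono's permissive `%#z`), the sketch pads a bare `+HH` offset before parsing.

```python
import re
from datetime import datetime, timezone


def parse_timestamp(value: str, fmt: str) -> datetime:
    """Parse `value` using an explicit strptime format and normalize to UTC.

    Hypothetical helper for illustration only. The format is expected to
    include %z so that the result is timezone-aware.
    """
    if fmt.endswith("%z"):
        # Pad an hour-only offset such as "+00" to "+0000", since Python's
        # %z does not accept the short form that chrono's %#z allows.
        value = re.sub(r"([+-]\d{2})$", r"\g<1>00", value)
    return datetime.strptime(value, fmt).astimezone(timezone.utc)


ts = parse_timestamp("2020/03/30 12:00:00+00", "%Y/%m/%d %H:%M:%S%z")
print(ts.isoformat())
```

The format string is passed in by the caller, which mirrors how a `timestampFormat` option would let each dataset describe its own layout.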

s373r commented 3 months ago

@sergiimk, is this a ticket to add a "custom" SQL function that will call our native code?

Very interesting direction!