tremor-rs / tremor-runtime

Main Tremor Project Rust Codebase
https://www.tremor.rs
Apache License 2.0

Add parquet format to codecs #1294

Open mattbailey opened 2 years ago

mattbailey commented 2 years ago

Describe the problem you are trying to solve

Tremor has no codec for encoding or decoding Parquet.

Describe the solution you'd like

It would be nice to have Parquet as a supported codec format.

Notes

The official Rust implementation of Parquet can be found in the Apache Arrow project: https://github.com/apache/arrow-rs

mfelsche commented 2 years ago

Oh yeah, let's make that happen!

tobim commented 2 years ago

You might want to look at https://github.com/jorgecarleitao/parquet2 as well. It is a more idiomatic rewrite of the Parquet implementation.

mavam commented 2 years ago

If you also want to process data via IPC (e.g., network, UNIX pipes, shared mmap), then Arrow IPC would offer higher interop.

Arrow itself can read and write Parquet, which is typically used only as an on-disk file format.

Licenser commented 2 years ago

That's definitely worth looking at too! We generally try to separate the encoding (Arrow/Parquet) from the transport (UNIX, network, mmap, etc.). That way the parts become interchangeable (i.e., we have a UNIX socket, a TCP, and a UDP connector, so by adding Arrow encoding we'd unlock all those transports at once :D)
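To illustrate the separation of encoding and transport, here is a minimal sketch. It is not Tremor's actual codec or connector API: the `Codec` and `Transport` traits, the `LengthPrefixed` codec, and the `InMemory` transport are all hypothetical names invented for this example. The stand-in codec just length-prefixes UTF-8 text; a real one would wrap Arrow or Parquet.

```rust
use std::collections::VecDeque;

/// Hypothetical codec abstraction: turns an event into bytes and back.
trait Codec {
    fn encode(&self, event: &str) -> Vec<u8>;
    fn decode(&self, data: &[u8]) -> Option<String>;
}

/// Hypothetical transport abstraction: moves opaque byte frames.
trait Transport {
    fn send(&mut self, frame: Vec<u8>);
    fn recv(&mut self) -> Option<Vec<u8>>;
}

/// Stand-in codec (assumption): 4-byte big-endian length, then UTF-8 payload.
struct LengthPrefixed;

impl Codec for LengthPrefixed {
    fn encode(&self, event: &str) -> Vec<u8> {
        let mut out = (event.len() as u32).to_be_bytes().to_vec();
        out.extend_from_slice(event.as_bytes());
        out
    }

    fn decode(&self, data: &[u8]) -> Option<String> {
        if data.len() < 4 {
            return None;
        }
        let (head, body) = data.split_at(4);
        let len = u32::from_be_bytes([head[0], head[1], head[2], head[3]]) as usize;
        // Reject frames whose payload does not match the declared length.
        if body.len() != len {
            return None;
        }
        String::from_utf8(body.to_vec()).ok()
    }
}

/// Stand-in transport (assumption): an in-process queue instead of a socket.
#[derive(Default)]
struct InMemory {
    queue: VecDeque<Vec<u8>>,
}

impl Transport for InMemory {
    fn send(&mut self, frame: Vec<u8>) {
        self.queue.push_back(frame);
    }

    fn recv(&mut self) -> Option<Vec<u8>> {
        self.queue.pop_front()
    }
}

/// Works for any codec/transport pair: neither side knows about the other.
fn round_trip<C: Codec, T: Transport>(codec: &C, transport: &mut T, event: &str) -> Option<String> {
    transport.send(codec.encode(event));
    let frame = transport.recv()?;
    codec.decode(&frame)
}

fn main() {
    let mut transport = InMemory::default();
    let got = round_trip(&LengthPrefixed, &mut transport, "hello");
    println!("{:?}", got);
}
```

Because `round_trip` is generic over both traits, adding one new `Codec` implementation makes it usable with every existing `Transport` implementation, which is the "unlock all those transports at once" effect described above.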

Licenser commented 2 years ago

Adding this for safekeeping: https://docs.rs/arrow/latest/arrow/index.html

mrchypark commented 1 month ago

Any update on this issue?

Licenser commented 1 month ago

Hi @mrchypark,

We haven't done any work on this yet, sorry. Between time constraints and little demand in our direct interactions, it never bubbled up as a priority.