Open syucream opened 4 years ago
Arrow intermediate records should be memory efficient and would mitigate memory usage: https://github.com/reproio/columnify/issues/44
They could also be used to validate input data against a given schema: https://github.com/reproio/columnify/issues/27
Columnify uses Apache Arrow Schema/Record as an intermediate representation between the various input formats and the output (currently only Parquet). Arrow is powerful: it offers fast memory access and a columnar representation. But the Go implementation is not complete yet. For example, the Arrow record type doesn't support some types in its sub-fields, so it is still not applicable to Columnify. Additionally, the Arrow Go implementation doesn't support rich data conversion the way PyArrow does. As a result, Columnify currently uses only the Arrow Schema as its required intermediate data.
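To make the "schema as intermediate representation" idea concrete, here is a minimal sketch of validating a decoded input record against a schema. It uses only the Go standard library; `FieldType`, `Field`, `Schema`, and `Validate` are hypothetical, simplified stand-ins for illustration, not the Arrow Go API and not Columnify's actual code.

```go
package main

import "fmt"

// FieldType is a hypothetical, simplified stand-in for an Arrow data type.
type FieldType int

const (
	Int64 FieldType = iota
	Float64
	String
	Boolean
)

// Field describes one column: its name, type, and whether null is allowed.
type Field struct {
	Name     string
	Type     FieldType
	Nullable bool
}

// Schema is an ordered list of fields (an assumption for this sketch).
type Schema struct {
	Fields []Field
}

// matches reports whether the Go value v has the type expected by t.
func matches(t FieldType, v interface{}) bool {
	switch t {
	case Int64:
		_, ok := v.(int64)
		return ok
	case Float64:
		_, ok := v.(float64)
		return ok
	case String:
		_, ok := v.(string)
		return ok
	case Boolean:
		_, ok := v.(bool)
		return ok
	}
	return false
}

// Validate checks one decoded input record against the schema.
// A missing or nil value is accepted only for nullable fields.
func Validate(s Schema, rec map[string]interface{}) error {
	for _, f := range s.Fields {
		v, present := rec[f.Name]
		if !present || v == nil {
			if f.Nullable {
				continue
			}
			return fmt.Errorf("field %q: missing required value", f.Name)
		}
		if !matches(f.Type, v) {
			return fmt.Errorf("field %q: unexpected value type %T", f.Name, v)
		}
	}
	return nil
}

func main() {
	schema := Schema{Fields: []Field{
		{Name: "id", Type: Int64},
		{Name: "name", Type: String, Nullable: true},
	}}

	good := map[string]interface{}{"id": int64(1), "name": "alice"}
	bad := map[string]interface{}{"id": "not-a-number"}

	fmt.Println("good record:", Validate(schema, good))
	fmt.Println("bad record:", Validate(schema, bad))
}
```

A real implementation would presumably build typed Arrow records rather than check `map[string]interface{}` values, which is where the memory savings mentioned above would come from; this sketch only shows the validation step.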
So we have some options to tackle these problems, like:
As a trivial topic, gocredits doesn't work on the Go Arrow dependency: https://github.com/reproio/columnify/issues/4