matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Feature Request] Load parquet files #10162

Open fengttt opened 1 year ago

fengttt commented 1 year ago

Is there an existing issue for the same feature request?

Is your feature request related to a problem?

No response

Describe the feature you'd like

Load Parquet files. Parquet is a popular format, like CSV or JSON. Parquet files can have nested schemas; we want to load at least files with a flat schema.

For data with nested columns, we could consider loading those columns as JSON. Not sure.
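To make the flat-versus-nested distinction concrete, here is a minimal sketch in Go using only the standard library. The `Column` type and the `IsFlat` and `NestedToJSON` helpers are hypothetical names for illustration; they are not part of matrixone or of any parquet library. The sketch assumes a schema is "flat" when no top-level column has child fields, and that a nested value could be stored as a JSON string.

```go
package main

import "encoding/json"

// Column is a hypothetical, simplified stand-in for a parquet schema node.
// A leaf column has no children; a nested (group) column has at least one.
type Column struct {
	Name     string
	Children []Column
}

// IsFlat reports whether every top-level column is a leaf,
// i.e. the schema contains no nested columns.
func IsFlat(schema []Column) bool {
	for _, c := range schema {
		if len(c.Children) > 0 {
			return false
		}
	}
	return true
}

// NestedToJSON serializes a nested value to a JSON string, which is one
// possible way to load a nested column (an assumption, not a decision).
func NestedToJSON(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

A loader could call `IsFlat` first and either load the columns directly or fall back to `NestedToJSON` for the nested ones.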

Describe implementation you've considered

No response

Documentation, Adoption, Use Case, Migration Strategy

No response

Additional information

No response

forsaken628 commented 6 months ago

15334: the implementation only supports fields whose parquet schema has nullable=false.

Data types:

15585: adds support for nullable=true.

Data types:
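As background on the nullable=true case: for a flat optional column, parquet stores only the non-null values, plus one definition level per row (0 for null, 1 for present). The sketch below shows how those two arrays can be expanded into a row-aligned value slice and a null mask. `ExpandNullable` is a hypothetical helper written for illustration with the standard library only; it is not the code from 15585.

```go
package main

import "errors"

// ExpandNullable is a hypothetical helper. It takes one definition level per
// row (0 = null, 1 = present, as for a flat optional column) and the densely
// packed non-null values, and returns row-aligned values plus a null mask.
func ExpandNullable(defLevels []int32, dense []int64) ([]int64, []bool, error) {
	values := make([]int64, len(defLevels))
	isNull := make([]bool, len(defLevels))
	next := 0 // position of the next unread entry in dense
	for i, d := range defLevels {
		if d == 0 {
			isNull[i] = true // null row: values[i] keeps its zero value
			continue
		}
		if next >= len(dense) {
			return nil, nil, errors.New("fewer values than non-null rows")
		}
		values[i] = dense[next]
		next++
	}
	if next != len(dense) {
		return nil, nil, errors.New("more values than non-null rows")
	}
	return values, isNull, nil
}
```

For a nullable=false field every row is present, so the dense values can be used directly and no mask is needed.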

15827: dictionary-encoded strings.
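For readers unfamiliar with dictionary encoding: the column stores each distinct string once in a dictionary, and each row stores only an integer index into it. A minimal decoding sketch follows, using only the Go standard library. `DecodeDictionary` is a hypothetical name, and the sketch assumes the indices have already been unpacked into a plain slice; it does not reproduce the change in 15827.

```go
package main

import "fmt"

// DecodeDictionary is a hypothetical helper that maps each row's dictionary
// index back to its string value. It rejects indices outside the dictionary.
func DecodeDictionary(dict []string, indices []int32) ([]string, error) {
	out := make([]string, len(indices))
	for i, idx := range indices {
		if idx < 0 || int(idx) >= len(dict) {
			return nil, fmt.Errorf("index %d at row %d is outside dictionary of size %d", idx, i, len(dict))
		}
		out[i] = dict[idx]
	}
	return out, nil
}
```

Because repeated values share one dictionary entry, a loader can also keep the indices and copy each distinct string only once, instead of materializing every row as done here.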

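Compared github.com/parquet-go/parquet-go with github.com/apache/arrow/go/v16/parquet. The arrow implementation is more actively maintained; it provides an Allocator, so memory is more controllable; and it provides a ChunkReader, which makes features easier to implement. However, it hides page-level details, and the impact of that has not been evaluated, so the replacement was abandoned.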

heni02 commented 5 months ago

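Since what is supported so far is still limited, and after discussing with Nan (楠哥), this feature will not ship in 1.2 and is moved to the next release.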