Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.
Users can use the fragment API to create fragments and then commit a dataset using the Python commit method (this is an advanced use case).
However, it is not possible to set the ManifestWriteConfig from Python. This means:
- When creating an empty dataset, it is always created in the v1 format.
- There is no way to choose stable row ids when creating a dataset using the commit approach.
Should we expose this config in the Python API?
Should we make the storage version and/or stable row id flags part of the "operation"?
Is there some other approach we can take?