marella / chatdocs

Chat with your documents offline using AI.
MIT License

Error when parquet files get too big / function for splitting? #33

Open · Ananderz opened this issue 1 year ago

Ananderz commented 1 year ago

Hi!

I have been uploading a lot of data and ran into a Snappy decompression error once the Parquet file reached around 3.6 GB:

Error: Invalid Error: Snappy decompression failure

I read that Parquet files have a limit of 4 GB. Could we add functionality to split the Parquet files when they reach 1 GB, to avoid this issue? Does anyone know how to do it?
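A minimal sketch of the requested splitting, assuming the data can be loaded into a pyarrow `Table`. Using pyarrow is an assumption here: nothing in the issue says how chatdocs writes its Parquet files, and this does not hook into chatdocs itself. The names `plan_shards` and `write_parquet_shards` are hypothetical. The sketch cuts the table into record batches, groups consecutive batches so each group's in-memory Arrow size stays under a threshold (1 GiB by default, matching the 1 GB suggested above), and writes each group to its own Snappy-compressed file. `nbytes` measures uncompressed in-memory size, not compressed size on disk, so the files on disk will usually be smaller than the threshold.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def plan_shards(sizes: Sequence[int], max_bytes: int) -> List[Tuple[int, int]]:
    """Group consecutive items into [start, end) ranges whose total size stays
    at or below max_bytes. An item that alone exceeds max_bytes gets its own range."""
    shards: List[Tuple[int, int]] = []
    start = 0
    current = 0
    for i, size in enumerate(sizes):
        # Close the current shard if adding this item would overflow it.
        if i > start and current + size > max_bytes:
            shards.append((start, i))
            start = i
            current = 0
        current += size
    if start < len(sizes):
        shards.append((start, len(sizes)))
    return shards


def write_parquet_shards(table, out_dir, max_bytes: int = 1 << 30,
                         batch_rows: int = 10_000) -> List[Path]:
    """Hypothetical helper: write a pyarrow Table as several Parquet files,
    each holding at most about max_bytes of uncompressed in-memory data
    (a single batch larger than max_bytes still ends up alone in one file)."""
    # Imported lazily so plan_shards can be used without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Record batches are the smallest unit that gets assigned to a file.
    batches = table.to_batches(max_chunksize=batch_rows)
    sizes = [batch.nbytes for batch in batches]

    paths: List[Path] = []
    for n, (start, end) in enumerate(plan_shards(sizes, max_bytes)):
        shard = pa.Table.from_batches(batches[start:end], schema=table.schema)
        path = out / f"part-{n:05d}.parquet"
        pq.write_table(shard, str(path), compression="snappy")
        paths.append(path)
    return paths
```

Lowering `batch_rows` gives finer-grained splits at the cost of more, smaller row groups. To read the shards back as a single table, the output directory can be passed to `pyarrow.parquet.read_table`.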