rom1504 / embedding-reader

Efficiently read embedding in streaming from any filesystem
MIT License
94 stars 19 forks

introduce parquet numpy reader #17

Closed rom1504 closed 2 years ago

rom1504 commented 2 years ago

also stop the readers on error

15

rom1504 commented 2 years ago

Works, but reading parquet from s3 is quite slow when the metadata is strings. Reading from local is much, much faster. I believe the reason is that pyarrow's .slice on parquet is not doing anything very useful and is reading much more than it should. Maybe the only fast solution for parquet would be to have a local cache.
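
For illustration only, a minimal sketch of what such a local cache could look like. This is not code from this PR or from embedding-reader: `cached_local_path` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever actually downloads the file (for example, an fsspec filesystem's `get`). The idea is to download each remote parquet file once and serve later reads from local disk.

```python
import hashlib
import os
import tempfile
from typing import Callable


def cached_local_path(
    remote_path: str,
    cache_dir: str,
    fetch: Callable[[str, str], None],
) -> str:
    """Return a local copy of remote_path, downloading it only on first use.

    `fetch(remote_path, local_path)` is a hypothetical callable supplied by
    the caller; it is assumed to copy the remote file to local_path.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the remote path so different remote files never collide in the cache.
    key = hashlib.sha256(remote_path.encode("utf-8")).hexdigest()
    local_path = os.path.join(cache_dir, key + ".parquet")
    if not os.path.exists(local_path):
        # Download under a temporary name, then rename, so a failed download
        # never leaves a partial file where a cached file is expected.
        fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
        os.close(fd)
        try:
            fetch(remote_path, tmp_path)
            os.replace(tmp_path, local_path)
        finally:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
    return local_path
```

The returned path can then be handed to the parquet reader, so repeated slicing hits local disk rather than s3. The trade-off is that the whole file is downloaded up front, which only pays off if it is read more than once or in many small slices.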