rom1504 / embedding-reader

Efficiently read embedding in streaming from any filesystem
MIT License
92 stars 19 forks

Add the possibility for the user to specify a regex string to target specific embeddings/metadata files with glob #31

Closed victor-paltz closed 1 year ago

victor-paltz commented 2 years ago

Previously, it was not possible to read parquet/NumPy files whose names did not end with .parquet or .npy. With this change, the user can select any file by passing a pattern string compatible with the glob function from fsspec.
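To illustrate the kind of selection this enables, here is a minimal sketch of matching files by a shell-style glob pattern rather than by a fixed extension. It uses Python's standard-library `glob` on a temporary local folder, on the assumption that the wildcard semantics (`*`, `?`, `[...]`) are similar to those of fsspec's `glob`; it does not show the embedding-reader API. The helper `select_files` and the file names are hypothetical.

```python
import glob
import os
import tempfile


def select_files(folder, pattern):
    """Return the sorted base names in `folder` that match a glob `pattern`.

    Hypothetical helper for illustration; not part of embedding-reader.
    """
    # Escape the folder so only `pattern` is interpreted as a wildcard expression.
    matches = glob.glob(os.path.join(glob.escape(folder), pattern))
    return sorted(os.path.basename(m) for m in matches)


def demo():
    """Create a few empty files and select them two different ways."""
    with tempfile.TemporaryDirectory() as folder:
        # Made-up file names: some carry the usual extension, some do not.
        for name in ["img_emb_0.npy", "img_emb_1.bin", "metadata_0.parquet", "metadata_1"]:
            open(os.path.join(folder, name), "wb").close()

        # Extension-based selection misses the file without a .npy suffix.
        by_extension = select_files(folder, "*.npy")
        # A prefix-based pattern picks up both embedding files.
        by_prefix = select_files(folder, "img_emb_*")
        return by_extension, by_prefix
```

With fsspec, the equivalent call would be `fs.glob(pattern)` on a filesystem object, which also works for remote stores. Note that these are glob wildcards, not full regular expressions.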

victor-paltz commented 2 years ago

@rom1504 what do you think about this change? We could of course just rename or copy the embedding files of interest into a new folder, but with a more generic embedding_reader object we can avoid adding more complexity to the data pipelines.