rom1504 / embedding-reader

Efficiently read embedding in streaming from any filesystem
MIT License

EmbeddingReader.__init__() got an unexpected keyword argument 'metadata_folder' #43

Closed: segalinc closed this issue 7 months ago

segalinc commented 10 months ago

Hello,

I am trying to use the library to read some embeddings, but I get the error below.

Here is how to reproduce it:

from embedding_reader import EmbeddingReader

embedding_reader = EmbeddingReader(embeddings_folder="https://mystic.the-eye.eu/public/AI/cah/laion5b/embeddings/laion2B-en/img_emb/",
                                   metadata_folder="https://mystic.the-eye.eu/public/AI/cah/laion5b/embeddings/laion2B-en/laion2B-en-metadata/", 
                                   file_format="parquet_npy",    
                                   meta_columns=["url", "caption"],
)

print("embedding count", embedding_reader.count)
print("dimension", embedding_reader.dimension)
print("total size", embedding_reader.total_size)
print("byte per item", embedding_reader.byte_per_item)

for emb, meta in embedding_reader(batch_size=10 ** 6, start=0, end=embedding_reader.count, max_ram_usage_in_bytes=2**30):
    print(emb.shape)
    print(meta["url"], meta["caption"])
    break

Error: TypeError: EmbeddingReader.__init__() got an unexpected keyword argument 'metadata_folder'

rom1504 commented 10 months ago

That's unexpected. What version of the library are you using?

segalinc commented 10 months ago

It's 1.1.7. I will try updating to the latest, which seems to be 1.5.1?
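One way to confirm which version the running interpreter actually sees, and whether its `EmbeddingReader` constructor accepts `metadata_folder`, is to ask Python directly (useful when several environments are installed; upgrading would typically be `pip install -U embedding-reader`). This is a minimal sketch assuming the PyPI distribution name `embedding-reader`; the helpers `installed_version` and `accepts_keyword` are hypothetical and use only the standard library (`importlib.metadata`, `inspect`).

```python
import inspect
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of `dist_name`, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def accepts_keyword(func, name):
    """True if `func` (a function or a class) can be called with keyword argument `name`."""
    for p in inspect.signature(func).parameters.values():
        if p.kind == inspect.Parameter.VAR_KEYWORD:
            return True  # a **kwargs catch-all accepts any keyword
        if p.name == name and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


if __name__ == "__main__":
    print("embedding-reader version:", installed_version("embedding-reader"))
    try:
        from embedding_reader import EmbeddingReader
    except ImportError:
        print("embedding_reader is not importable in this interpreter")
    else:
        # For a class, inspect.signature reports the __init__ parameters (without self).
        print("accepts metadata_folder:", accepts_keyword(EmbeddingReader, "metadata_folder"))
```

If the printed version is not the one you just installed, the script is probably running in a different environment from the one `pip` targeted.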

segalinc commented 10 months ago

Also, what if I want to read both text and image embeddings? Is it possible to do this with one reader?

segalinc commented 10 months ago

OK, I guess that worked, even though I am now getting another error saying that it can't find the embeddings folder.

rom1504 commented 10 months ago

This is quite surprising.

Can you tell me more about your environment (Python version, operating system, venv, ...)?
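When the reader reports that it cannot find the embeddings folder, a quick check is to list that folder yourself. This is a rough sketch assuming that embedding-reader reaches remote folders through fsspec; `list_embedding_files` is a hypothetical helper that uses fsspec when available and falls back to `os.listdir` (local paths only) otherwise. Per the format name, `parquet_npy` should find `.npy` files in `embeddings_folder` and `.parquet` files in `metadata_folder`.

```python
import os


def list_embedding_files(folder, suffix=".npy"):
    """Return the sorted paths directly under `folder` whose names end with `suffix`."""
    try:
        import fsspec  # handles local paths as well as URLs such as https://...
    except ImportError:
        fsspec = None

    if fsspec is None:
        # Local-only fallback: remote protocols are not available without fsspec.
        names = os.listdir(folder)
        return sorted(os.path.join(folder, n) for n in names if n.endswith(suffix))

    fs, path = fsspec.core.url_to_fs(folder)
    # detail=False makes ls return plain path strings instead of info dicts.
    return sorted(p for p in fs.ls(path, detail=False) if p.endswith(suffix))
```

Passing the exact `embeddings_folder` string given to `EmbeddingReader` (and `suffix=".parquet"` for `metadata_folder`) helps separate a path or network problem, such as an exception or an empty list, from a library problem.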

rom1504 commented 10 months ago

About reading image and text embeddings: you can create two instances of the reader and then use zip to combine the iterators.
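The two-reader approach with `zip` could look like the sketch below. It assumes the image and text embedding folders are partitioned identically and that both readers are called with the same `batch_size`, `start` and `end`, so the i-th batch of one lines up with the i-th batch of the other. `paired_batches` and `read_image_and_text` are hypothetical helpers, and the text-embedding folder path is something you supply. Because `zip` stops silently at the shorter iterator, the helper also checks that each pair of batches has the same length.

```python
def paired_batches(img_batches, txt_batches):
    """Yield (img_emb, txt_emb, meta) from two iterators of (embeddings, meta) batches."""
    for (img_emb, meta), (txt_emb, _) in zip(img_batches, txt_batches):
        if len(img_emb) != len(txt_emb):
            raise ValueError(
                f"misaligned batches: {len(img_emb)} image vs {len(txt_emb)} text rows"
            )
        yield img_emb, txt_emb, meta


def read_image_and_text(img_folder, txt_folder, batch_size=10**6):
    """Sketch: one EmbeddingReader per folder, combined with paired_batches."""
    from embedding_reader import EmbeddingReader  # imported lazily; needs the library

    img_reader = EmbeddingReader(embeddings_folder=img_folder, file_format="npy")
    txt_reader = EmbeddingReader(embeddings_folder=txt_folder, file_format="npy")
    # Identical batch_size/start/end on both sides keeps the batches aligned.
    end = min(img_reader.count, txt_reader.count)
    return paired_batches(
        img_reader(batch_size=batch_size, start=0, end=end),
        txt_reader(batch_size=batch_size, start=0, end=end),
    )
```

If the metadata is needed, the image-side reader can instead use `file_format="parquet_npy"` with `metadata_folder` and `meta_columns`, as in the original snippet; the helper passes through the image side's `meta`.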

segalinc commented 10 months ago

It looks like the image embeddings are not of shape 4x32x32 but are CLIP-encoded instead?

(Email reply to Romain Beaumont's message of Wed, Oct 4, 2023, 8:59 PM.)

rom1504 commented 10 months ago

The image embeddings provided with Laion2B are CLIP image embeddings.

(Email reply to Cristina Segalin's message of Thu, Oct 5, 2023, 17:21.)
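Since these are CLIP embeddings, each item should be a single flat vector, so a batch comes back as a 2-D array of shape `(N, dimension)` (compare `embedding_reader.dimension`) rather than a stack of 4x32x32 latent tensors. A common use of paired CLIP image and text vectors is cosine similarity. This is a minimal sketch assuming NumPy arrays such as the `emb` batches yielded by the reader; `rowwise_cosine` is a hypothetical helper that also rejects non-2-D input.

```python
import numpy as np


def rowwise_cosine(img_emb, txt_emb, eps=1e-12):
    """Cosine similarity between row i of img_emb and row i of txt_emb."""
    img = np.asarray(img_emb, dtype=np.float32)
    txt = np.asarray(txt_emb, dtype=np.float32)
    if img.ndim != 2 or img.shape != txt.shape:
        raise ValueError(
            f"expected two (N, D) arrays of equal shape, got {img.shape} and {txt.shape}"
        )
    # L2-normalise each row; eps guards against division by zero for all-zero rows.
    img = img / np.maximum(np.linalg.norm(img, axis=1, keepdims=True), eps)
    txt = txt / np.maximum(np.linalg.norm(txt, axis=1, keepdims=True), eps)
    return np.sum(img * txt, axis=1)
```

Normalizing first makes the result independent of vector magnitude, which is the usual way CLIP image and text vectors are compared.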