xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Generating batch embeddings using Hugging Face datasets crashes my EC2 instance on the XL model. #39

Closed: bogedy closed this issue 9 months ago

bogedy commented 1 year ago

This only happens with the XL model; large and smaller seem to work fine.

Here's how I import it and verify that it's working:

from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-xl')
sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title:"
# Each input is an [instruction, text] pair.
embeddings = model.encode([[instruction, sentence]])
print(embeddings[0, :5])

Then I batch process using Hugging Face datasets:

from datasets import Dataset
ds = Dataset.from_parquet('../data/mydata.parquet')

import torch
torch.backends.cuda.matmul.allow_tf32 = True

BATCH_SIZE = 4

instruction = "Represent the news article for clustering"
def encode_text(batch):
    # Pair each text in the batch with the instruction, as INSTRUCTOR expects.
    inputs = [[instruction, text] for text in batch['text']]
    return {"embedding": model.encode(inputs, batch_size=BATCH_SIZE)}

ds = ds.map(encode_text, batched=True, batch_size=BATCH_SIZE, remove_columns='text')

This will either crash my ipykernel or, worse, take my entire EC2 instance offline. It seems like this shouldn't be happening: the model only needs about 5 GB of VRAM, and my g5.xlarge instance has 24 GB.

Am I doing the batching correctly? This is the only way I could get it to work and make sense of the API.

Thanks!

hongjin-su commented 1 year ago

Hi, thanks a lot for your interest in the INSTRUCTOR model!

Since the GPU has only 24 GB of memory, it is likely being overloaded. To save memory, you could reduce the batch size, or use a shorter maximum sequence length with the following code:

model.max_seq_length = 256
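
For example, a minimal sketch combining both knobs (256 tokens and a batch size of 1 are just illustrative values, not tuned recommendations):

# A minimal sketch: shorter sequences plus a smaller batch size.
model.max_seq_length = 256                      # truncate each input to 256 tokens
pairs = [[instruction, t] for t in ds['text']]  # same [instruction, text] pairs as in your map function
embeddings = model.encode(pairs, batch_size=1)  # smallest possible batch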

Feel free to add any further questions or comments!

bogedy commented 1 year ago

I find this odd, though: the model's pytorch_model.bin is only ~5 GB, and my instance crashes even with BATCH_SIZE = 1 and model.max_seq_length = 64. Shouldn't I have plenty of memory?

And isn't it curious that this can take my whole instance offline? I'm used to CUDA out-of-memory errors, but not this.
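
For reference, here is a quick memory check one could run right after an encode call (a sketch using the standard torch.cuda.mem_get_info and psutil.virtual_memory calls; the host-RAM line is included because an instance going fully offline could point at system memory rather than VRAM):

# Sketch: check GPU and host memory usage after an encode call.
import torch
import psutil

free, total = torch.cuda.mem_get_info()
print(f"GPU: {(total - free) / 1e9:.1f} GB used of {total / 1e9:.1f} GB")

ram = psutil.virtual_memory()
print(f"Host RAM: {ram.used / 1e9:.1f} GB used of {ram.total / 1e9:.1f} GB")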

It's weird enough that it's probably my own issue and has nothing to do with this repo, but I thought I should check.

hongjin-su commented 1 year ago

Hi, is the problem solved? Can you generate the embeddings without using batching?
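
For example, a plain loop that bypasses datasets.map entirely (a minimal sketch, assuming the text column fits in host memory):

# Minimal sketch: encode in small chunks with a plain loop, no datasets.map.
import numpy as np

texts = ds['text']
chunks = []
for start in range(0, len(texts), 4):
    pairs = [[instruction, t] for t in texts[start:start + 4]]
    chunks.append(model.encode(pairs))
embeddings = np.concatenate(chunks)  # shape: (num_texts, embedding_dim)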

hongjin-su commented 9 months ago

Feel free to re-open this issue if you have any further questions or comments!