openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Improve the Speed in Batch Processing #352

Open ahmadmustafaanis opened 1 year ago

ahmadmustafaanis commented 1 year ago

How can I improve the speed of my CLIP model in batch processing? Minimal code:

import clip
import cv2
import numpy as np
import torch
from PIL import Image

# model, preprocess, device, batch, and classes_descriptions are defined earlier.
tokenized_text = clip.tokenize(classes_descriptions).to(device)

# Convert each BGR frame to RGB and apply CLIP's PIL-based preprocess.
pre_processed_imgs = [preprocess(Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))) for im in batch]

logits_per_image, logits_per_text = model(
    torch.stack(pre_processed_imgs).to(device), tokenized_text
)
probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()

# Keep a prediction only when the best class clears the confidence threshold.
classes = [classes_descriptions[np.argmax(prob)] if prob.max() > 0.1 else None for prob in probs]

When the batch size is large, it takes a lot of time even for small (100x100) images, e.g. 0.4-0.5 seconds per batch. I want to process it faster. What are some techniques/tricks I can use?
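Two tricks usually help here (an illustrative sketch, not from this thread): run inference under torch.no_grad() so autograd bookkeeping is skipped, and encode the class descriptions once up front, since model(images, text) re-runs the text encoder on every batch even though the prompts never change. Assuming the ViT-B/32 checkpoint and the classes_descriptions from the snippet above:

import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()

# Encode the (fixed) class descriptions once, outside the per-batch loop.
tokenized_text = clip.tokenize(classes_descriptions).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokenized_text)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

def classify_batch(image_tensor):
    # image_tensor: a preprocessed (N, 3, 224, 224) batch already on `device`.
    with torch.no_grad():
        image_features = model.encode_image(image_tensor)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        # The same scaled cosine-similarity logits CLIP's forward computes,
        # minus the repeated text-encoder pass.
        logits = model.logit_scale.exp() * image_features @ text_features.t()
    return logits.softmax(dim=-1).cpu().numpy()

On a CUDA device, clip.load already returns an fp16 model, so the image encoder and the similarity above run in half precision at no extra cost.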

adtygan commented 1 year ago

Line 2: pre_processed_imgs = [preprocess(Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))) for im in batch]
Line 5: classes = [classes_descriptions[np.argmax(prob)] if prob.max() > 0.1 else None for prob in probs]

These two lines use list comprehensions. Could you try vectorizing them and see if performance improves?
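For the second line, a vectorized form might look like this (a sketch; probs and classes_descriptions are taken from the snippet above):

import numpy as np

# probs: (N, num_classes) softmax outputs from the snippet above.
names = np.array(classes_descriptions, dtype=object)
best = probs.argmax(axis=-1)      # best class index per image
keep = probs.max(axis=-1) > 0.1   # confidence-threshold mask
classes = np.where(keep, names[best], None).tolist()

That said, for typical class counts the comprehension on line 5 is cheap next to the model forward pass, so the gain may be small.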

ahmadmustafaanis commented 1 year ago

preprocess works using PIL and hence operates on a single image, not a batch.
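One possible workaround (a sketch, not part of CLIP's API) is to reimplement the transform with batched tensor ops. Note the direct bicubic resize is not bit-identical to PIL's, and it skips the aspect-preserving resize plus center crop, which is harmless for square inputs like the 100x100 frames above:

import numpy as np
import torch
import torch.nn.functional as F

# CLIP's published normalization constants.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)

def preprocess_batch(batch_bgr, device, size=224):
    # batch_bgr: uint8 array of shape (N, H, W, 3) in BGR order, e.g. np.stack(batch)
    # (stacking assumes the frames share one shape).
    x = torch.from_numpy(batch_bgr).to(device)
    x = x[..., [2, 1, 0]]                      # BGR -> RGB
    x = x.permute(0, 3, 1, 2).float() / 255.0  # NHWC uint8 -> NCHW float in [0, 1]
    x = F.interpolate(x, size=(size, size), mode="bicubic", align_corners=False)
    x = x.clamp(0.0, 1.0)                      # bicubic interpolation can overshoot
    return (x - CLIP_MEAN.to(device)) / CLIP_STD.to(device)

This also lets the resize run on the GPU alongside the model, instead of image-by-image on the CPU.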

mohdomama commented 9 months ago

Any solution for this yet?