openai / CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
MIT License

The clip library's preprocessing can only read images through PIL; we hope the maintainers will modify the source code to support other input types #418

Open qsd-github opened 8 months ago

qsd-github commented 8 months ago

My code is as follows:

```python
model, preprocess = clip.load("./weights/ViT-B-32.pt", device=device)
image = preprocess(Image.open("./xxx.jpg")).unsqueeze(0).to(device)  # only a PIL image works here; passing a cv2 (numpy) image raises an error
```
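For reference, a workaround that does not require changing the library is to convert the cv2 image (a BGR numpy array) into a PIL image before calling `preprocess`. A minimal sketch, reusing the same paths as above:

```python
import cv2
import torch
from PIL import Image
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("./weights/ViT-B-32.pt", device=device)

bgr = cv2.imread("./xxx.jpg")               # cv2 returns a numpy array in BGR channel order
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # CLIP's normalization constants assume RGB
image = preprocess(Image.fromarray(rgb)).unsqueeze(0).to(device)
```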

I took a look at the source code:

```python
def load(name: str, ...):
    ...
    return model, _transform(model.visual.input_resolution)
```

The problem is in `_transform()`:

```python
def _transform(n_px):
    return Compose([
        ToPILImage(),  # adding ToPILImage() here (and importing it) fixes the failure on cv2-loaded images
        Resize(n_px, interpolation=BICUBIC),
        CenterCrop(n_px),
        _convert_image_to_rgb,
        ToTensor(),
        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
    ])
```

This is my usage code and my proposed fix. If you call the library differently and still need cv2 support, please share your approach. I suggest the maintainers change this part of the source code.
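With a change along those lines, the `preprocess` returned by `clip.load` could take the numpy array from cv2 directly. A sketch of what the call site might then look like, assuming the edited `_transform` above and still swapping BGR to RGB so the colors match what the normalization constants expect:

```python
bgr = cv2.imread("./xxx.jpg")
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)       # cv2 loads BGR; the model expects RGB
image = preprocess(rgb).unsqueeze(0).to(device)  # ToPILImage() accepts the HxWx3 uint8 array
```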