Hi,
Thanks for your work.
I wrote some code to run inference on one image and one text, in case it helps someone:
Getting the model:
```python
import torch

def get_model(model_path):
    model = CLIPModel().to(CFG.device)
    model.load_state_dict(torch.load(model_path, map_location=CFG.device))
    model.eval()
    return model
```
Getting an image embedding:
Getting text embedding:
Make an inference:
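With the image and text embeddings from the steps above, one simple way to score how well they match is cosine similarity of the projected embeddings (the CLIP training objective is built on dot products of these projections, so this is a reasonable inference-time score, not the only possible one). `match_score` is a name I made up:

```python
import torch
import torch.nn.functional as F

def match_score(image_embedding, text_embedding):
    # L2-normalize both embeddings, then take the dot product
    # (= cosine similarity, in [-1, 1]; higher means a better match).
    img = F.normalize(image_embedding, p=2, dim=-1)
    txt = F.normalize(text_embedding, p=2, dim=-1)
    return (img * txt).sum(dim=-1)
```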
Hope this helps!