Hi, glad to see and use this cool project, thank you.
I have a question: is it possible to batch predictions for the image captioning task?
I saw https://github.com/salesforce/BLIP/issues/48, but it's not my case.
I do something like:

base_model_path = 'path_to_base_model'
model_base = blip_decoder(pretrained=base_model_path, vit='base', image_size=IMAGE_SIZE)
model_base.eval()
model_base.to(device)

img = transform(sample).unsqueeze(0).to(device)
with torch.no_grad():
    caption_bs_base = model_base.generate(img, sample=False, num_beams=7, max_length=16, min_length=5)

It works well, but I want to run inference with 4 models (ViT base/large, each with beam search and nucleus sampling), and it's too long. On my server, captioning 12 pictures with all 4 models takes ~34 sec (12 * 4 = 48 captions).
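In case it helps illustrate what I mean by batching: instead of calling generate() once per image with a batch of 1, I could stack the transformed image tensors into one [B, 3, H, W] tensor and make a single generate() call per model. This is just a sketch of the tensor-stacking part with dummy tensors standing in for transform(sample) outputs; `make_batch` is a hypothetical helper name, and I'm assuming the model's generate() accepts a batched image tensor and returns one caption per image.

```python
import torch

# Hypothetical helper: stack a list of already-transformed image tensors
# (each of shape [3, H, W]) into a single [B, 3, H, W] batch tensor,
# so one model.generate(batch, ...) call can caption all B images.
def make_batch(image_tensors, device="cpu"):
    return torch.stack(image_tensors, dim=0).to(device)

# Demo with dummy tensors in place of transform(sample) outputs.
imgs = [torch.zeros(3, 384, 384) for _ in range(12)]
batch = make_batch(imgs)
print(batch.shape)  # torch.Size([12, 3, 384, 384])
```

Then, if batching is supported, something like `model_base.generate(batch, sample=False, num_beams=7, max_length=16, min_length=5)` inside `torch.no_grad()` would presumably return a list of 12 captions in one pass.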
Thank you.