Closed · rad-mit closed this 3 months ago

Hi @jchenghu,

I am looking to benchmark the time the model takes to produce results on batches of 32 images on the GPU. I have modified the provided demo.py code to take the path of my dataset's source directory, find all the .jpg files in it, and run on device "cuda". However, the time I currently record is the inference time for 32 images processed one by one in a loop, so it is probably not the most efficient the model can be. I tried to use torch.stack(preprocessed_images), but it seems to have some issues and does not run. Is there a way I can do this?
Hi, assuming you have all the images inside the list `input_images`, try replacing this code inside `demo.py`:
print("Generating captions ...\n")
for i in range(len(input_images)):
path = args.image_paths[i]
image = input_images[i]
beam_search_kwargs = {'beam_size': args.beam_size,
'beam_max_seq_len': args.max_seq_len,
'sample_or_max': 'max',
'how_many_outputs': 1,
'sos_idx': sos_idx,
'eos_idx': eos_idx}
with torch.no_grad():
pred, _ = model(enc_x=image,
enc_x_num_pads=[0],
mode='beam_search', **beam_search_kwargs)
pred = tokens2description(pred[0][0], coco_tokens['idx2word_list'], sos_idx, eos_idx)
print(path + ' \n\tDescription: ' + pred + '\n')
with this one, which runs the forward pass on sub-batches of up to 32 images at a time:
print("Generating captions ...\n")
sb_size = 32
num_samples = len(input_images)
num_sub_batch = math.ceil(num_samples / sb_size)
for sb_it in range(num_sub_batch):
from_idx = sb_it * sb_size
to_idx = min((sb_it + 1) * sb_size, num_samples)
paths = [args.image_paths[i] for i in range(from_idx, to_idx)]
images = torch.cat([input_images[i] for i in range(from_idx, to_idx)], dim=0)
beam_search_kwargs = {'beam_size': args.beam_size,
'beam_max_seq_len': args.max_seq_len,
'sample_or_max': 'max',
'how_many_outputs': 1,
'sos_idx': sos_idx,
'eos_idx': eos_idx}
with torch.no_grad():
pred, _ = model(enc_x=images,
enc_x_num_pads=[0] * len(images),
mode='beam_search', **beam_search_kwargs)
for i in range(to_idx-from_idx):
descr = tokens2description(pred[i][0], coco_tokens['idx2word_list'], sos_idx, eos_idx)
print(paths[i] + ' \n\tDescription: ' + descr + '\n')
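On the `torch.stack` error you mentioned: assuming each entry of `input_images` is a preprocessed tensor that already carries a leading batch dimension of 1, `torch.stack` would add a second batch axis, which is presumably why it failed, while `torch.cat` along dim 0 merges them into one proper batch. A minimal sketch of the difference (the shapes here are hypothetical):

```python
import torch

# hypothetical shapes: each preprocessed image is (1, 3, H, W)
imgs = [torch.randn(1, 3, 384, 384) for _ in range(4)]

stacked = torch.stack(imgs)       # (4, 1, 3, 384, 384): extra axis the model rejects
batched = torch.cat(imgs, dim=0)  # (4, 3, 384, 384): one proper batch dimension
print(stacked.shape, batched.shape)
```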
Unfortunately, I cannot test the code at the moment, so I hope it works! Please let me know.
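Regarding the timing itself, here is a minimal benchmarking sketch, assuming `model` and `images` from the snippet above already live on "cuda". CUDA kernels launch asynchronously, so you need `torch.cuda.synchronize()` before reading the clock, otherwise you mostly measure launch overhead:

```python
import time
import torch

torch.cuda.synchronize()  # wait for any pending GPU work before starting the clock
start = time.perf_counter()
with torch.no_grad():
    pred, _ = model(enc_x=images,
                    enc_x_num_pads=[0] * len(images),
                    mode='beam_search', **beam_search_kwargs)
torch.cuda.synchronize()  # wait for the forward pass to actually finish
elapsed = time.perf_counter() - start
print('batch of %d images captioned in %.3f s' % (len(images), elapsed))
```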
Thanks for the quick help @jchenghu, the code works!
P.S. A minor fix: the last part, where the outputs are printed, throws an error because pred gets reassigned to the first decoded output inside the loop; it needs a separate variable there.
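For reference, a reconstruction of the buggy pattern (hypothetical, since the snippet above was already edited to use a separate `descr` variable):

```python
for i in range(to_idx - from_idx):
    # bug: 'pred' (the list of beam-search outputs) is overwritten with the
    # first decoded string, so pred[i][0] indexes into that string next time
    pred = tokens2description(pred[i][0], coco_tokens['idx2word_list'], sos_idx, eos_idx)
    print(paths[i] + ' \n\tDescription: ' + pred + '\n')
```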
Glad it worked! Also thank you for pointing out the error, fixed it :-)