jchenghu / ExpansionNet_v2

Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
https://arxiv.org/abs/2208.06551
MIT License

Running the model on a batch of 32 images with "cuda" as device #15

Closed · rad-mit closed this issue 3 months ago

rad-mit commented 3 months ago

Hi @jchenghu,

I am looking to benchmark how long the model takes to caption batches of 32 images on the GPU. I modified the provided demo.py so that it takes the path of my dataset's source directory, finds all the .jpg files in it, and runs with "cuda" as the device. However, the time I currently record is the inference time for 32 images processed one by one in a loop, so it is probably not the fastest the model can do. I tried torch.stack(preprocessed_images) to build a single batch, but it seems to have some issues and does not run. Is there a way I can do this?
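For reference, my loading code looks roughly like this (a sketch: `preprocess_image` stands in for demo.py's actual resize/normalize step, and `args.source_dir` is the argument I added):

    import glob
    import os

    # gather every .jpg in the dataset directory
    image_paths = sorted(glob.glob(os.path.join(args.source_dir, '*.jpg')))

    # preprocess one image at a time; each tensor ends up as (1, 3, H, W) on the GPU
    input_images = [preprocess_image(path).to('cuda') for path in image_paths]

    # torch.stack(input_images) produces (N, 1, 3, H, W) here, one extra
    # dimension compared to what the model expects, which may be why it breaks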

jchenghu commented 3 months ago

Hi, assuming you have all the images in the list input_images, try replacing this code inside demo.py:

    print("Generating captions ...\n")
    for i in range(len(input_images)):
        path = args.image_paths[i]
        image = input_images[i]
        beam_search_kwargs = {'beam_size': args.beam_size,
                              'beam_max_seq_len': args.max_seq_len,
                              'sample_or_max': 'max',
                              'how_many_outputs': 1,
                              'sos_idx': sos_idx,
                              'eos_idx': eos_idx}
        with torch.no_grad():
            pred, _ = model(enc_x=image,
                            enc_x_num_pads=[0],
                            mode='beam_search', **beam_search_kwargs)
        pred = tokens2description(pred[0][0], coco_tokens['idx2word_list'], sos_idx, eos_idx)
        print(path + ' \n\tDescription: ' + pred + '\n')

with this batched version (note that it needs import math at the top of demo.py):

    print("Generating captions ...\n")
    sb_size = 32
    num_samples = len(input_images)
    num_sub_batch = math.ceil(num_samples / sb_size)
    for sb_it in range(num_sub_batch):
        from_idx = sb_it * sb_size
        to_idx = min((sb_it + 1) * sb_size, num_samples)

        paths = [args.image_paths[i] for i in range(from_idx, to_idx)]
        images = torch.cat([input_images[i] for i in range(from_idx, to_idx)], dim=0)

        beam_search_kwargs = {'beam_size': args.beam_size,
                              'beam_max_seq_len': args.max_seq_len,
                              'sample_or_max': 'max',
                              'how_many_outputs': 1,
                              'sos_idx': sos_idx,
                              'eos_idx': eos_idx}
        with torch.no_grad():
            pred, _ = model(enc_x=images,
                            enc_x_num_pads=[0] * len(images),
                            mode='beam_search', **beam_search_kwargs)

        for i in range(to_idx-from_idx):                   
            descr = tokens2description(pred[i][0], coco_tokens['idx2word_list'], sos_idx, eos_idx)
            print(paths[i] + ' \n\tDescription: ' + descr + '\n')
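
Since your goal is benchmarking: CUDA calls are asynchronous, so remember to call torch.cuda.synchronize() before reading the clock, otherwise you mostly measure kernel launch overhead. A rough, untested sketch around the loop above:

    import time
    import torch

    torch.cuda.synchronize()               # flush pending GPU work before starting the clock
    start = time.perf_counter()

    # ... the batched captioning loop above ...

    torch.cuda.synchronize()               # wait until all kernels have finished
    elapsed = time.perf_counter() - start
    print('Captioned %d images in %.3fs (%.4fs per image)'
          % (num_samples, elapsed, elapsed / num_samples))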

Unfortunately, I cannot test the code at the moment, so I hope it works! Please let me know.

rad-mit commented 3 months ago

Thanks for the quick help, @jchenghu, the code works!

P.S. A minor fix: the last part, where the outputs are printed, throws an error because pred gets reassigned to the first decoded output; that line needs a small fix.
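Concretely, the offending line and the fix (the snippet above already shows the corrected version):

    # before: `pred` was overwritten by the decoded string, so pred[i][0] broke on the next iteration
    pred = tokens2description(pred[i][0], coco_tokens['idx2word_list'], sos_idx, eos_idx)

    # after: keep the raw predictions intact and store the decoded caption separately
    descr = tokens2description(pred[i][0], coco_tokens['idx2word_list'], sos_idx, eos_idx)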

jchenghu commented 3 months ago

Glad it worked! Also thank you for pointing out the error, fixed it :-)