microsoft / Oscar

Oscar and VinVL

How to run captioning on any image? E.g. how to prepare test.yaml and the other files required by run_captioning.py? #183

Open aliencaocao opened 2 years ago

aliencaocao commented 2 years ago

The image contains classes seen in the COCO captioning dataset, but I do not know how to extract the features for captioning using Oscar+.

jontooy commented 2 years ago

Hi aliencaocao,

It takes a couple of steps to prepare the features and the yaml files.

I attached a Colab notebook where I generate the features with VinVL step by step.
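For anyone who cannot open the notebook, the preparation boils down to packing your images into the TSV-plus-yaml layout that scene_graph_benchmark expects. Below is a minimal sketch following the pattern of tools/mini_tsv/tsv_demo.py in that repo; the image list and output names are placeholders, and writing empty label rows is an assumption that suffices when you only need inference:

  # Minimal sketch following tools/mini_tsv/tsv_demo.py in scene_graph_benchmark.
  # Image names, output paths, and the empty labels are placeholders/assumptions.
  import base64
  import json
  import os.path as op

  import cv2
  import yaml
  from maskrcnn_benchmark.structures.tsv_file_ops import tsv_writer

  data_path = "tools/mini_tsv/data/"
  img_list = ["image1.jpg", "image2.jpg"]  # your own test images

  rows, rows_label, rows_hw = [], [], []
  for img_file in img_list:
      img = cv2.imread(op.join(data_path, img_file))
      img_key = op.splitext(img_file)[0]
      # images are stored base64-encoded inside the tsv
      encoded = base64.b64encode(cv2.imencode(".jpg", img)[1]).decode("utf-8")
      rows.append([img_key, encoded])
      # no ground-truth boxes at inference time, so an empty label list
      rows_label.append([img_key, json.dumps([])])
      rows_hw.append([img_key, json.dumps([{"height": img.shape[0], "width": img.shape[1]}])])

  tsv_writer(rows, op.join(data_path, "test.tsv"))
  tsv_writer(rows_label, op.join(data_path, "test.label.tsv"))
  tsv_writer(rows_hw, op.join(data_path, "test.hw.tsv"))

  # this is the yaml file that DATASETS.TEST must point to
  with open(op.join(data_path, "test.yaml"), "w") as f:
      yaml.dump({"img": "test.tsv", "label": "test.label.tsv", "hw": "test.hw.tsv"}, f)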

Amir-mjafari commented 2 years ago

> Hi aliencaocao,
>
> It takes a couple of steps to prepare the features and the yaml files.
>
> I attached a Colab notebook where I generate the features with VinVL step by step.

Hi, thank you so much for sharing. I tested it and it worked well. Could you also provide a demo of how to run run_captioning.py to get captions from the box features extracted for each image?

Nidadadadada commented 2 years ago

> Hi aliencaocao,
>
> It takes a couple of steps to prepare the features and the yaml files.
>
> I attached a Colab notebook where I generate the features with VinVL step by step.

Hello, thank you so much for sharing. I tested it and hit an error when running python tools/test_sg_net.py. I just executed each command in sequence, and I wonder if I did something wrong when running your Colab notebook? Thank you for answering! Here is the error output:

  2022-03-08 05:49:34,799 maskrcnn_benchmark.data.build WARNING: When using more than one image per GPU you may encounter an out-of-memory (OOM) error if your GPU does not have sufficient memory. If this happens, you can reduce SOLVER.IMS_PER_BATCH (for training) or TEST.IMS_PER_BATCH (for inference). For training, you must also adjust the learning rate and schedule length according to the linear scaling rule. See for example: https://github.com/facebookresearch/Detectron/blob/master/configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml#L14
  Traceback (most recent call last):
    File "tools/test_sg_net.py", line 197, in <module>
      main()
    File "tools/test_sg_net.py", line 193, in main
      run_test(cfg, model, args.distributed, model_name)
    File "tools/test_sg_net.py", line 55, in run_test
      data_loaders_val = make_data_loader(cfg, is_train=False, is_distributed=distributed)
    File "/content/drive/MyDrive/scene_graph_benchmark/maskrcnn_benchmark/data/build.py", line 170, in make_data_loader
      datasets = build_dataset(cfg, transforms, DatasetCatalog, is_train or is_for_period)
    File "/content/drive/MyDrive/scene_graph_benchmark/maskrcnn_benchmark/data/build.py", line 45, in build_dataset
      cfg, dataset_name, factory_name, is_train
    File "/content/drive/MyDrive/scene_graph_benchmark/maskrcnn_benchmark/data/datasets/utils/config_args.py", line 7, in config_tsv_dataset_args
      assert op.isfile(full_yaml_file)
  AssertionError
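For reference, the step that fails here is the VinVL feature extraction. The scene_graph_benchmark README launches it roughly as below; the weight path and DATA_DIR are placeholders, and exact flags may differ between versions:

  python tools/test_sg_net.py \
      --config-file sgg_configs/vgattr/vinvl_x152c4.yaml \
      TEST.IMS_PER_BATCH 1 \
      MODEL.WEIGHT pretrained_model/vinvl_vg_x152c4.pth \
      MODEL.ROI_HEADS.NMS_FILTER 1 \
      MODEL.ROI_HEADS.SCORE_THRESH 0.2 \
      DATA_DIR "tools/mini_tsv/data/" \
      TEST.IGNORE_BOX_REGRESSION True \
      MODEL.ATTRIBUTE_ON True \
      TEST.OUTPUT_FEATURE True

The AssertionError at the end of the traceback means os.path.isfile failed on the dataset yaml path, which the next comment addresses.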

feifang24 commented 2 years ago

@Nidadadadada I encountered the same error at first, and I think it's because we also need to modify the config yaml file (a quick path check is sketched after the settings below). Right above the cell you executed, the author writes:

Configure sgg_configs/vgattr/vinvl_x152c4.yaml and make sure os.path.join(DATA_DIR, DATASETS.TEST) points to your dataset yaml file. Current settings:

  DATASETS.TEST: ("train.yaml",)
  OUTPUT_DIR: "output/"
  DATA_DIR: "tools/mini_tsv/data/"
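That is, the AssertionError fires in config_args.py because os.path.join(DATA_DIR, DATASETS.TEST[0]) does not resolve to an existing file. A quick sanity check before launching, mirroring the settings above:

  import os.path as op

  DATA_DIR = "tools/mini_tsv/data/"  # DATA_DIR from vinvl_x152c4.yaml
  DATASETS_TEST = ("train.yaml",)    # DATASETS.TEST from vinvl_x152c4.yaml

  # the same path test_sg_net.py builds before asserting
  full_yaml_file = op.join(DATA_DIR, DATASETS_TEST[0])
  assert op.isfile(full_yaml_file), "missing dataset yaml: " + full_yaml_file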
2021202420 commented 2 years ago

I also met this error and tried your method. It works, thank you.

eslambakr commented 1 year ago

Dear @2021202420, @feifang24, @Amir-mjafari, @jontooy,

I ran it and successfully generated the needed features and files. But when I ran the Oscar model on the generated features using run_captioning.py, I got wrong captions: just random stuff and weird words, which indicates there is an issue with the feature format or something. However, I checked the generated labels and they seem to make sense; almost all objects in the images are detected correctly.

Did you face this issue? You said you managed to run the code and it works, so could someone help me in this regard?

Thanks in advance!

jontooy commented 1 year ago

> Dear @2021202420, @feifang24, @Amir-mjafari, @jontooy,
>
> I ran it and successfully generated the needed features and files. But when I ran the Oscar model on the generated features using run_captioning.py, I got wrong captions: just random stuff and weird words, which indicates there is an issue with the feature format or something. However, I checked the generated labels and they seem to make sense; almost all objects in the images are detected correctly.
>
> Did you face this issue? You said you managed to run the code and it works, so could someone help me in this regard?
>
> Thanks in advance!

Hi eslambakr,

Although this was long ago for me, I do recall having a similar issue. I don't think your features are wrong (if you double-checked them and they look right, they should be right).

Could you share the command you use to run the model? What settings do you use? I'd start by changing the BERT model.

eslambakr commented 1 year ago

Thanks for your prompt response! I figured out the issue: I was using the weights for the base Oscar model. When I used the VinVL version, Oscar+, it works fine. But I need to run the base Oscar, so I guess I have to extract the features with the bottom-up attention approach instead.
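For context on why mismatched weights and features decode to gibberish: the captioning model consumes per-region vectors whose layout and feature distribution must match what the checkpoint was trained on. In the VinVL setup, the region vector is typically the detector's 2048-d appearance feature concatenated with 6 normalized box-geometry values. A sketch of that layout (the helper name is hypothetical):

  import numpy as np

  def make_region_feature(appearance_2048, box, img_w, img_h):
      # Hypothetical helper: build the 2054-d region vector
      # (2048-d appearance + 6 box-geometry values) used by
      # VinVL-style captioning inputs.
      x1, y1, x2, y2 = box
      spatial = np.array([
          x1 / img_w, y1 / img_h,  # normalized top-left corner
          x2 / img_w, y2 / img_h,  # normalized bottom-right corner
          (x2 - x1) / img_w,       # normalized width
          (y2 - y1) / img_h,       # normalized height
      ], dtype=np.float32)
      return np.concatenate([appearance_2048, spatial])  # shape (2054,)

Even when the dimensions match, a checkpoint trained on bottom-up-attention features will decode gibberish from VinVL features, because the appearance distributions differ; that is consistent with what is reported above.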

hamzakhalil798 commented 1 year ago

@jontooy @eslambakr Hey! I've created the dataset using the three images inside the above Colab notebook... I prepared the dataset with type=Test and caption=False. The model I used for Oscar+ is checkpoint_base, but I'm getting an error on inference. Can you tell me how you ran inference using run_captioning.py?

Here's my error: [screenshot]

hamzakhalil798 commented 1 year ago

Never mind, got it fixed.

hamza13-12 commented 1 year ago

@hamzakhalil798 I am also having trouble running image captioning. Can you please offer some guidance on how to set up oscar correctly and how to load the pre-trained checkpoints to accomplish this?
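For anyone landing here with the same question, a typical inference invocation, following the captioning example in the Oscar README; the yaml comes from the preparation steps above, all paths are placeholders, and --eval_model_dir should point at an unzipped VinVL captioning checkpoint directory (the one containing pytorch_model.bin and config.json):

  python oscar/run_captioning.py \
      --do_test \
      --do_eval \
      --data_dir tools/mini_tsv/data/ \
      --test_yaml test.yaml \
      --per_gpu_eval_batch_size 64 \
      --num_beams 5 \
      --max_gen_length 20 \
      --eval_model_dir path/to/coco_captioning_base/checkpoint-xx-xxxx

Drop --do_eval if your images have no ground-truth captions to score against.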