ruotianluo / ImageCaptioning.pytorch

I decided to sync this repo with self-critical.pytorch. (The old master is archived in the old-master branch.)

Trained with Resnet152 feature #126

Open · ydyrx-ldm opened this issue 3 years ago

ydyrx-ldm commented 3 years ago

Hi, thanks for the excellent code base. Can I get features trained with ResNet-152?

ydyrx-ldm commented 3 years ago

Are the bottom-up features only trained with ResNet-101?

ydyrx-ldm commented 3 years ago

I'm waiting for your answer. Thanks.

ruotianluo commented 3 years ago

I was sleeping. Do you want bottom-up with Res152, or just Res152?

ydyrx-ldm commented 3 years ago

Thank you very much for your prompt reply. What I want to know is: for bottom-up with Res152, is there a program to generate the corresponding files, like the ResNet-101 package at https://imagecaption.blob.core.windows.net/imagecaption/trainval.zip? Thanks.

ruotianluo commented 3 years ago

https://github.com/facebookresearch/vilbert-multi-task/blob/master/data/README.md

My impression is that they have bottom-up with Res152, but I am not sure.

ydyrx-ldm commented 3 years ago

I have read the content of that link; it seems to be about generating an LMDB file, and I am not sure whether it will work. When we do bottom-up ResNet-101 training, do we need to generate the contents of cocobu_fc, cocobu_att and cocobu_box? I don't know whether I can get those from the link.

ruotianluo commented 3 years ago

It's just the format. The third step gives you numpy files, and you would need to convert them into the format I use.
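
For reference, a minimal conversion sketch, assuming the pipeline's third step leaves you with per-image numpy arrays of region features (num_boxes x 2048) and boxes; the function and variable names here are hypothetical, and the target layout mirrors what scripts/make_bu_data.py in this repo produces (cocobu_fc/<id>.npy, cocobu_att/<id>.npz under the key 'feat', cocobu_box/<id>.npy):

```python
# Hypothetical conversion sketch; the input arrays are assumed to come from
# the vilbert pipeline's third step. Adapt names and shapes to what you
# actually get.
import os
import numpy as np

def convert_one(image_id, features, boxes, out_prefix='data/cocobu'):
    os.makedirs(out_prefix + '_fc', exist_ok=True)
    os.makedirs(out_prefix + '_att', exist_ok=True)
    os.makedirs(out_prefix + '_box', exist_ok=True)
    # fc feature: mean-pool over the region features
    np.save(os.path.join(out_prefix + '_fc', str(image_id)), features.mean(0))
    # att features: the full set of region features, stored under the 'feat' key
    np.savez_compressed(os.path.join(out_prefix + '_att', str(image_id)), feat=features)
    # box coordinates for each region
    np.save(os.path.join(out_prefix + '_box', str(image_id)), boxes)
```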

ruotianluo commented 3 years ago

If you make sure they are using ResNet-152, I have actually provided the features here: https://github.com/ruotianluo/ImageCaptioning.pytorch/tree/master/data#image-features-option-3--vilbert-12-in-1-features.

It's an lmdb file supported by my code.

ydyrx-ldm commented 3 years ago

I once saw such code at https://github.com/MILVLG/bottom-up-attention.pytorch. I don't know whether it is Faster R-CNN with ResNet-152, but I downloaded the code, ran into some bugs while testing it, and gave up on it temporarily.

ydyrx-ldm commented 3 years ago

From the link https://github.com/facebookresearch/vilbert-multi-task/tree/master/data, the downloaded file is named coco_trainval_resnext152_faster_rcnn_genome.lmdb/data.mdb. It comes to 95 GB of content; I wasn't sure whether to download it, but I am doing so right now. I guess from the file name that it is bottom-up with ResNeXt-152. Is what you mention at https://github.com/ruotianluo/ImageCaptioning.pytorch/tree/master/data#image-features-option-3--vilbert-12-in-1-features something similar to bottom-up ResNet-152? But its file size is just over 20 GB; does that make a difference? Thank you very much for your patience and prompt reply.

ruotianluo commented 3 years ago

It should be the same but compressed. That is why it's smaller.

ydyrx-ldm commented 3 years ago

I have downloaded it, and it has only one file, called data.mdb, which takes up 95 GB of disk space. How should I use it with your code and models?

ruotianluo commented 3 years ago

Why don't you download the file I provided?

ydyrx-ldm commented 3 years ago

Due to network restrictions here, I cannot download your compressed package, and I cannot access https://drive.google.com.

ruotianluo commented 3 years ago

You can try to write a customized loader, similar to the HybridLoader I have in dataloader.py. I wouldn't suggest converting, because I think that would be more complicated. (I can't find my code for converting it.)
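
For reference, a rough sketch of such a customized loader, assuming the vilbert lmdb maps image-id byte keys to pickled dicts containing the region features; both the key format and the 'features' field name are assumptions, so inspect a few entries first and adapt:

```python
# Hypothetical loader sketch, modeled loosely on HybridLoader in
# captioning/data/dataloader.py; the key and value formats below are
# assumptions about the vilbert lmdb, not a verified spec.
import lmdb
import pickle
import numpy as np

class VilbertLmdbLoader:
    def __init__(self, db_path):
        # readonly + lock=False so multiple dataloader workers can read safely
        self.env = lmdb.open(db_path, readonly=True, lock=False,
                             readahead=False, meminit=False)

    def get(self, image_id):
        with self.env.begin(write=False) as txn:
            raw = txn.get(str(image_id).encode('ascii'))  # assumed key format
        entry = pickle.loads(raw)       # assumed: one pickled dict per image
        feats = entry['features']       # assumed field holding region features
        return np.asarray(feats, dtype=np.float32)
```

If the entries turn out to be stored differently, only the two assumed lines in get() should need to change.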

ydyrx-ldm commented 3 years ago

Oh, I don't think I can do what you said. Do you mean that even if I download the file you provided, I still cannot train properly, because you can't find your code for converting it? Or is it that if I download their file from https://github.com/facebookresearch/vilbert-multi-task/tree/master/data, your code cannot convert it? Thank you very much for your reply.

ruotianluo commented 3 years ago

If you download the file I provided, you can directly use it for training. If you download their file, you need to convert it into my format or write a customized loader for the dataloader to load the features.

ydyrx-ldm commented 3 years ago

Thanks for your reply. I have found a way to download the file you provided. One question: how do I train after downloading it? I don't seem to find the corresponding content in the README (or maybe I missed it). For example, if I use the NewFC model, how do I train with this .lmdb file? Looking forward to your reply.

ruotianluo commented 3 years ago

Em... you'd better use attention models. For example: python train.py --id updown --cfg configs/updown/updown.yml --input_att_dir data/vilbert_att.lmdb

ydyrx-ldm commented 3 years ago

OK, thank you very much. I will try it after downloading and report back after training.

ydyrx-ldm commented 3 years ago

Hello, I want to try adaptive attention, but I don't know how to use the command line; there seem to be some errors. Like this?

python tools/train.py --id adaatt --caption_model adaatt --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_adaatt --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30

Or, if using vilbert_att.lmdb, should the command be:

python train.py --id adaatt --input_json data/cocotalk.json --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_adaatt --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30 --input_att_dir data/vilbert_att.lmdb

All my attempts seem to have failed. I'm sorry to trouble you again. Thanks, and have a great weekend.

ruotianluo commented 3 years ago

What is the error?

ydyrx-ldm commented 3 years ago

For the first command, the error is:

```
Traceback (most recent call last):
  File "tools/train.py", line 296, in <module>
    train(opt)
  File "tools/train.py", line 57, in train
    assert getattr(saved_model_opt, checkme) == getattr(opt, checkme), "Command line argument and saved model disagree on '%s' " % checkme
AssertionError: Command line argument and saved model disagree on 'rnn_size'
```

Running the first command again, the error is different:

```
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ImageCaptioning.pytorch/captioning/modules/loss_wrapper.py", line 47, in forward
    loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:], reduction=reduction)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ImageCaptioning.pytorch/captioning/models/CaptionModel.py", line 33, in forward
    return getattr(self, '_'+mode)(*args, **kwargs)
  File "/data/ImageCaptioning.pytorch/captioning/models/AttModel.py", line 136, in _forward
    p_fc_feats, p_att_feats, pp_att_feats, p_att_masks = self._prepare_feature(fc_feats, att_feats, att_masks)
  File "/data/ImageCaptioning.pytorch/captioning/models/AttModel.py", line 118, in _prepare_feature
    fc_feats = self.fc_embed(fc_feats)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [3 x 0], m2: [2048 x 512] at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THC/generic/THCTensorMathBlas.cu:290
```

ruotianluo commented 3 years ago

Can you try another model like updown? I haven't run adaptive attention in about 3 years; I am more confident that updown would work.

ydyrx-ldm commented 3 years ago

For the second command, the error is:

```
Traceback (most recent call last):
  File "tools/train.py", line 185, in train
    model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag, drop_worst_flag)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ImageCaptioning.pytorch/captioning/modules/loss_wrapper.py", line 47, in forward
    loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:], reduction=reduction)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/ImageCaptioning.pytorch/captioning/models/CaptionModel.py", line 33, in forward
    return getattr(self, '_'+mode)(*args, **kwargs)
  File "/data/ImageCaptioning.pytorch/captioning/models/ShowTellModel.py", line 81, in _forward
    output, state = self.core(xt.unsqueeze(0), state)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 556, in forward
    self.check_forward_args(input, hx, batch_sizes)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 508, in check_forward_args
    self.check_input(input, batch_sizes)
  File "/home/.conda/envs/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 155, in check_input
    expected_input_dim, input.dim()))
RuntimeError: input must have 3 dimensions, got 4
```

ruotianluo commented 3 years ago

Try using 1 GPU.

ydyrx-ldm commented 3 years ago

CUDA_VISIBLE_DEVICES=3 python tools/train.py --id adaatt --input_json data/cocotalk.json --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_adaatt --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30 --input_att_dir data/vilbert_att.lmdb

and I get the same error.

ydyrx-ldm commented 3 years ago

Do you see any mistakes in my two commands?

ruotianluo commented 3 years ago

I don't see any. Can you try updown? Simply: python train.py --id updown --cfg configs/updown/updown.yml

ydyrx-ldm commented 3 years ago

I think that will work, but I haven't downloaded the image features yet; the error happens when I use adaptive attention. Thank you very much. That's all for today; I will see how to solve it tomorrow. Good night.

ruotianluo commented 3 years ago

You can use the same image features.