ruotianluo / ImageCaptioning.pytorch

I decided to sync up this repo and self-critical.pytorch. (The old master is archived in the old master branch.)

how to change models #133

Open · ydyrx-ldm opened this issue 2 years ago

ydyrx-ldm commented 2 years ago

Hello, sorry for asking my question in Chinese. I would like to modify your model, for example the updown model. Which .py file should I edit? My guess is this part of ImageCaptioning.pytorch-master\captioning\models\AttModel.py:

```python
class UpDownCore(nn.Module):
    def __init__(self, opt, use_maxout=False):
        super(UpDownCore, self).__init__()
        self.drop_prob_lm = opt.drop_prob_lm

        self.att_lstm = nn.LSTMCell(opt.input_encoding_size + opt.rnn_size * 2, opt.rnn_size) # we, fc, h^2_t-1
        self.lang_lstm = nn.LSTMCell(opt.rnn_size * 2, opt.rnn_size) # h^1_t, \hat v
        self.attention = Attention(opt)

    def forward(self, xt, fc_feats, att_feats, p_att_feats, state, att_masks=None):
        prev_h = state[0][-1]
        att_lstm_input = torch.cat([prev_h, fc_feats, xt], 1)

        h_att, c_att = self.att_lstm(att_lstm_input, (state[0][0], state[1][0]))

        att = self.attention(h_att, att_feats, p_att_feats, att_masks)

        lang_lstm_input = torch.cat([att, h_att], 1)
        # lang_lstm_input = torch.cat([att, F.dropout(h_att, self.drop_prob_lm, self.training)], 1) ?????

        h_lang, c_lang = self.lang_lstm(lang_lstm_input, (state[0][1], state[1][1]))

        output = F.dropout(h_lang, self.drop_prob_lm, self.training)
        state = (torch.stack([h_att, h_lang]), torch.stack([c_att, c_lang]))

        return output, state
```

Is modifying just this part of the code enough to change the updown model? I have another question: I tried making arbitrary changes to the code above, and it still seemed to run. What is going on? Looking forward to your reply, thank you.

ruotianluo commented 2 years ago

It will run as long as the core's input/output format stays the same.
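
For example, one interface-preserving change would be enabling the dropout variant that is already commented out in forward (a minimal sketch; the concatenated width stays opt.rnn_size * 2, so nothing else needs to change):

```python
# inside UpDownCore.forward, replacing the existing concatenation;
# self.drop_prob_lm is already set in __init__
lang_lstm_input = torch.cat([att, F.dropout(h_att, self.drop_prob_lm, self.training)], 1)
```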

ydyrx-ldm commented 2 years ago

I remember that the day before yesterday I changed `self.lang_lstm = nn.LSTMCell(opt.rnn_size * 2, opt.rnn_size) # h^1_t, \hat v` to `self.lang_lstm = nn.LSTMCell(opt.rnn_size * 3, opt.rnn_size) # h^1_t, \hat v` with everything else unchanged, and it still ran. Shouldn't that cause a dimension problem?
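
For reference, a standalone sketch of that mismatch (rnn_size = 512 is assumed here purely for illustration) should fail as soon as the cell is called:

```python
import torch
import torch.nn as nn

rnn_size = 512  # assumed value, for illustration only
cell = nn.LSTMCell(rnn_size * 3, rnn_size)  # declared input size: 1536
x = torch.zeros(10, rnn_size * 2)           # actual input (att + h_att): 1024
h = torch.zeros(10, rnn_size)
c = torch.zeros(10, rnn_size)
cell(x, (h, c))  # RuntimeError: input has inconsistent input_size
```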

ydyrx-ldm commented 2 years ago

If I do modify the contents of `class UpDownCore(nn.Module):`, how can I make sure my code is actually being run? If there is no error, does that mean all of it executed? For example, I modify the updown model and then run updown training. And it seems that as long as the core's input/output format stays the same, changing some of the code inside doesn't raise errors either. This confuses me; looking forward to your reply.

ruotianluo commented 2 years ago

I'm a bit puzzled too.... I'm fairly busy at the moment. Remind me again in a couple of days.

ydyrx-ldm commented 2 years ago

OK. Thanks!

ydyrx-ldm commented 2 years ago

Hello, here I am reminding you to take a look at this issue.

ruotianluo commented 2 years ago

Can you send me your training script?

ydyrx-ldm commented 2 years ago

Sure, my training script is:

```
python tools/train.py --id updown --caption_model updown --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 128 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_updown --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30
```

Note that the image features I use were extracted with ResNet-101, not Faster R-CNN.

ydyrx-ldm commented 2 years ago

Hello, is there any progress? Which part is the problem?

ruotianluo commented 2 years ago

After I changed that line, I got an error...

ruotianluo commented 2 years ago
  File "/share/data/vision-greg/rluo/caption/tmp/captioning/models/AttModel.py", line 636, in forward
    h_lang, c_lang = self.lang_lstm(lang_lstm_input, (state[0][1], state[1][1]))
  File "/share/data/vision-greg/rluo/local/anaconda3/envs/virtex2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/share/data/vision-greg/rluo/local/anaconda3/envs/virtex2/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 1058, in forward
    self.bias_ih, self.bias_hh,
RuntimeError: input has inconsistent input_size: got 1024 expected 1536
ruotianluo commented 2 years ago

To confirm the core actually ran, just print something inside forward.

ydyrx-ldm commented 2 years ago

> To confirm the core actually ran, just print something inside forward.

Yes, I did that (printed something inside forward), but nothing was output at all. Could the contents of my AttModel.py be different? Or is there a problem with:

```
python tools/train.py --id updown --caption_model updown --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 128 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_updown --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30
```

ydyrx-ldm commented 2 years ago

Or does my problem come from using features extracted with ResNet-101 rather than Faster R-CNN? My code is all from your latest release, and my training script also follows the README.

ruotianluo commented 2 years ago

I re-cloned master from scratch and ran your command.

ruotianluo commented 2 years ago

Can you try `pip uninstall captioning`?

ydyrx-ldm commented 2 years ago

OK, I'll try.

ydyrx-ldm commented 2 years ago

I still have the problem. I too re-cloned the master ImageCaptioning.pytorch project, cloned cider, cloned coco-caption, then prepared the data: first `python scripts/prepro_labels.py --input_json data/dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk`, then extracted the image features with `python scripts/prepro_feats.py --input_json data/dataset_coco.json --output_dir data/cocotalk --images_root $IMAGE_ROOT`. I again changed `self.lang_lstm = nn.LSTMCell(opt.rnn_size * 2, opt.rnn_size)` to `self.lang_lstm = nn.LSTMCell(opt.rnn_size * 3, opt.rnn_size)` with everything else unchanged, and it still runs.

My training script is:

```
CUDA_VISIBLE_DEVICES=3 python tools/train.py --id updown --caption_model updown --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_updown --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30
```

ruotianluo commented 2 years ago

Can you try downloading my preprocessed data directly? Although I don't think that's the problem. Did you run `pip uninstall captioning`?

ydyrx-ldm commented 2 years ago

```
$ pip uninstall captioning
WARNING: Skipping captioning as it is not installed.
```

Actually, I created a folder elsewhere and set up the project from scratch: re-cloned the master ImageCaptioning.pytorch project, cloned cider, cloned coco-caption. So captioning should also be the latest, right?

I just can't tell where the problem is. This is the output after I run it:

```
Hugginface transformers not installed; please visit https://github.com/huggingface/transformers
meshed-memory-transformer not installed; please run `pip install git+https://github.com/ruotianluo/meshed-memory-transformer.git`
DataLoader loading json file:  data/cocotalk.json
vocab size is  9487
DataLoader loading h5 file:  data/cocotalk_fc data/cocotalk_att data/cocotalk_box data/cocotalk_label.h5
max sequence length in data is 16
read 123287 image features
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
2021-09-13 21:03:24.513746: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:
2021-09-13 21:03:24.513818: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Read data: 0.0002644062042236328
iter 13403 (epoch 1), train_loss = 2.422, time/batch = 0.580
Read data: 0.00020313262939453125
iter 13404 (epoch 1), train_loss = 2.629, time/batch = 0.093
Read data: 0.0001647472381591797
iter 13405 (epoch 1), train_loss = 2.330, time/batch = 0.089
Read data: 0.00012564659118652344
```
ruotianluo commented 2 years ago

The reason I suggested uninstall is that you said print didn't do anything, so I suspect code from somewhere else is being used. Otherwise, once you changed the file, the print should show up.
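
A quick way to check which copy of the package is actually being imported (a minimal sketch; run it from the directory you launch training from):

```python
import captioning
print(captioning.__file__)
# this should point inside your cloned repo; if it points to
# site-packages or some other folder, that copy is shadowing your edits
```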

ydyrx-ldm commented 2 years ago

I created a brand-new folder and ran everything inside it; it shouldn't be possible for code from somewhere else to run, right?

ruotianluo commented 2 years ago

Does print still show nothing now?

ydyrx-ldm commented 2 years ago

No. What I changed is:

```python
class UpDownCore(nn.Module):
    def __init__(self, opt, use_maxout=False):
        super(UpDownCore, self).__init__()
        self.drop_prob_lm = opt.drop_prob_lm

        self.att_lstm = nn.LSTMCell(opt.input_encoding_size + opt.rnn_size * 2, opt.rnn_size) # we, fc, h^2_t-1
        self.lang_lstm = nn.LSTMCell(opt.rnn_size * 3, opt.rnn_size) # h^1_t, \hat v
        self.attention = Attention(opt)

    def forward(self, xt, fc_feats, att_feats, p_att_feats, state, att_masks=None):
        prev_h = state[0][-1]
        att_lstm_input = torch.cat([prev_h, fc_feats, xt], 1)

        h_att, c_att = self.att_lstm(att_lstm_input, (state[0][0], state[1][0]))

        print("-------------------------------------------------")

        att = self.attention(h_att, att_feats, p_att_feats, att_masks)

        lang_lstm_input = torch.cat([att, h_att], 1)
        # lang_lstm_input = torch.cat([att, F.dropout(h_att, self.drop_prob_lm, self.training)], 1) ?????

        h_lang, c_lang = self.lang_lstm(lang_lstm_input, (state[0][1], state[1][1]))

        output = F.dropout(h_lang, self.drop_prob_lm, self.training)
        state = (torch.stack([h_att, h_lang]), torch.stack([c_att, c_lang]))

        return output, state
```

ydyrx-ldm commented 2 years ago

Should I try another computer, or another server? I'm on a Linux server, and it seems I can't debug there.

ruotianluo commented 2 years ago

Why can't you debug on Linux???

ruotianluo commented 2 years ago

You can install pudb, which lets you debug from the command line.

ruotianluo commented 2 years ago

A plain pip install will do.
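
For example (a minimal sketch of typical pudb usage; put the breakpoint wherever you want to stop, e.g. inside UpDownCore.forward):

```python
import pudb; pudb.set_trace()  # opens a full-screen terminal debugger at this line
```

You can also launch the whole script under the debugger with `python -m pudb tools/train.py ...`.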

ruotianluo commented 2 years ago

But first, try another machine.

ydyrx-ldm commented 2 years ago

It's just that my Linux system has no GUI, nothing like desktop software, and I'm not very good at debugging from the command line. OK, I'll try another computer tomorrow.

ydyrx-ldm commented 2 years ago

I just tried deleting the entire /captioning/model folder, and it still runs. So it must be running code from somewhere else?

ydyrx-ldm commented 2 years ago

Could it be that I'm not using multiple GPUs? I only used a single GPU (CUDA_VISIBLE_DEVICES=3). I'll try the experiment again tomorrow morning. Thank you, good night.

ydyrx-ldm commented 2 years ago

Hello, I think I've solved it. The cause was indeed that a captioning package from another location (another folder under the same path) was being imported. After I deleted all the master projects and reinstalled from scratch, the print now shows up, and I can modify the model too. Thank you.

One more question: what do you usually set --learning_rate and --max_epochs to? 5e-4 and 30 epochs?