ruotianluo / ImageCaptioning.pytorch

I decide to sync up this repo and self-critical.pytorch. (The old master is in old master branch for archive)
MIT License

I want to do fine-tuning on a non-COCO dataset #169

Open tnb1021 opened 1 year ago

tnb1021 commented 1 year ago

Hello. I would like to fine-tune a trained model. However, when I try few-shot learning with the provided pretrained model on another dataset, I get an error because it refers to the COCO id. Is fine-tuning possible with this implementation?

ruotianluo commented 1 year ago

The model can be trained on a non-COCO dataset by converting your dataset into the same format.
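For reference, "same format" presumably means the Karpathy-style split file that `dataset_coco.json` uses, where each image entry carries its tokenized captions and a split assignment. A minimal sketch (the field values here are hypothetical; only the field names follow the standard layout):

```python
import json

# Minimal sketch of a Karpathy-style dataset JSON (same layout as dataset_coco.json).
# Each image entry carries its tokenized sentences and a train/val/test split.
dataset = {
    "dataset": "gan",  # hypothetical dataset name
    "images": [
        {
            "filepath": "images",          # subdirectory under the image root
            "filename": "bird_0001.jpg",   # hypothetical file name
            "imgid": 0,                    # 0-based index within THIS dataset
            "split": "train",              # one of train / val / test
            "cocoid": 0,                   # any unique integer id for a non-COCO dataset
            "sentences": [
                {
                    "raw": "A small bird perched on a branch.",
                    "tokens": ["a", "small", "bird", "perched", "on", "a", "branch"],
                    "imgid": 0,
                    "sentid": 0,
                }
            ],
            "sentids": [0],
        }
    ],
}

# The preprocessing scripts then consume this file to build the
# input_json / input_label_h5 pair passed to tools/train.py.
with open("dataset_gan.json", "w") as f:
    json.dump(dataset, f)
```

Note that `imgid`/`sentid` are expected to be contiguous indices within the new dataset, not ids carried over from COCO.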

tnb1021 commented 1 year ago

Thanks so much for your reply! What does "same format" mean? I tried:

python tools/train.py --id fc_nsc --caption_model newfc --input_json data_cub/gan_datalk.json --input_fc_dir data_cub/gantalk_fc --input_att_dir data_cub/gantalk_att --input_label_h5 data_cub/gan_datalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path fc_nsc --save_checkpoint_every 60 --val_images_use 20 --max_epochs 300 --start_from fc_nsc

gan_datalk.json and gantalk_fc (att) are generated from dataset_gan.json, which has the same format as dataset_coco.json.

I got this error:

iter 543001 (epoch 47), train_loss = 1.319, time/batch = 0.066
Read data: 0.0001232624053955078
/home/user/ImageCaptioning.pytorch/captioning/data/dataloader.py:295: RuntimeWarning: Mean of empty slice.
  fc_feat = att_feat.mean(0)
iter 543002 (epoch 47), train_loss = 1.616, time/batch = 0.067
Read data: 0.000152587890625
/home/user/ImageCaptioning.pytorch/captioning/data/dataloader.py:295: RuntimeWarning: Mean of empty slice.
  fc_feat = att_feat.mean(0)
iter 543003 (epoch 47), train_loss = 1.462, time/batch = 0.066
Read data: 0.00020265579223632812
iter 543004 (epoch 47), train_loss = 1.462, time/batch = 0.066
Read data: 0.00015544891357421875
iter 543005 (epoch 47), train_loss = 1.451, time/batch = 0.065
Read data: 0.00020241737365722656
iter 543006 (epoch 47), train_loss = 1.298, time/batch = 0.051
Read data: 0.00019979476928710938
iter 543007 (epoch 47), train_loss = 1.619, time/batch = 0.063
Traceback (most recent call last):
  File "/home/user/ImageCaptioning.pytorch/tools/train.py", line 308, in <module>
    train(opt)
  File "/home/user/ImageCaptioning.pytorch/tools/train.py", line 178, in train
    data = loader.get_batch('train')
  File "/home/user/ImageCaptioning.pytorch/captioning/data/dataloader.py", line 334, in get_batch
    data = next(self.iters[split])
  File "/home/user/anaconda3/envs/py3GPU/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/user/anaconda3/envs/py3GPU/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "/home/user/anaconda3/envs/py3GPU/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/home/user/anaconda3/envs/py3GPU/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/user/anaconda3/envs/py3GPU/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/user/anaconda3/envs/py3GPU/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/anaconda3/envs/py3GPU/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/ImageCaptioning.pytorch/captioning/data/dataloader.py", line 300, in __getitem__
    seq = self.get_captions(ix, self.seq_per_img)
  File "/home/user/ImageCaptioning.pytorch/captioning/data/dataloader.py", line 167, in get_captions
    ix1 = self.label_start_ix[ix] - 1 #label_start_ix starts from 1
IndexError: index 13886 is out of bounds for axis 0 with size 100

"size 100" is the number of images in gan_dataset, and "index 13886" seems to be a cocoid. Do I need any other options besides --start_from? I want to do few-shot learning of image captioning by fine-tuning a model trained on COCO with another dataset.
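The failing lookup can be reproduced in isolation: `label_start_ix` has one entry per image in the *current* dataset, so any `ix` left over from a larger (COCO-sized) index table will overflow a 100-image array. A minimal sketch (the array contents are hypothetical; only the 1-based indexing mirrors the dataloader):

```python
import numpy as np

# label_start_ix has one entry per image in the current dataset.
label_start_ix = np.arange(1, 101)  # 100 images, as in gan_dataset

def get_captions(ix):
    # Mirrors dataloader.get_captions: label_start_ix is 1-based.
    return label_start_ix[ix] - 1

get_captions(50)  # fine: ix is within the 100-image dataset

try:
    get_captions(13886)  # an ix sized for the old, much larger dataset
except IndexError as e:
    print(e)  # index 13886 is out of bounds for axis 0 with size 100
```

So the question reduces to where the oversized `ix` comes from; one possibility worth checking is whether state restored via `--start_from` (e.g. the saved infos/histories) still carries indices from the original COCO run.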

ruotianluo commented 1 year ago

The ix is not the original cocoid; I believe it is reindexed during preprocessing. I'm not sure why the value is so large here.