ruotianluo / self-critical.pytorch

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.
MIT License

Worse CIDEr result when self-critical training with the raw config #238

Open tyrando opened 3 years ago

tyrando commented 3 years ago

Hi, I used the raw config file transformer.yml to train and obtained infos.pkl and model.pth. For SC training, I added start_from to transformer.yml and used transformer_sc.yml, but the result is much worse (the CIDEr score is near zero).

I don't know where the problem is. Maybe, for SC training, we don't need to change the yml (i.e. add start_from)?

ruotianluo commented 3 years ago

You have to train in two separate stages under the current implementation, meaning you first run with transformer.yml and then with transformer_sc.yml, as in the README.
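
For reference, the two-stage recipe in the README looks roughly like the sketch below; the exact entry point, config paths, and ids depend on the version of the repo you have checked out, so treat it as an outline rather than copy-paste commands:

```bash
# Stage 1: cross-entropy (XE) training with the plain transformer config
python train.py --cfg configs/transformer/transformer.yml --id trans

# Copy the XE checkpoint into a new experiment directory for the SC run
bash scripts/copy_model.sh trans trans_sc

# Stage 2: self-critical fine-tuning, resuming from the copied checkpoint
python train.py --cfg configs/transformer/transformer_sc.yml --id trans_sc
```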

tyrando commented 3 years ago

> You have to train in two separate stages under the current implementation, meaning you first run with transformer.yml and then with transformer_sc.yml, as in the README.

Does that mean I should use transformer_sc.yml to train from scratch, and not use the model and infos files trained with transformer.yml to continue with SC training? In the README, the SC config seems to begin at epoch 0 and does not use the XE model and infos. Or should I add start_from and checkpoint_path when using transformer_sc.yml? Thanks.

ruotianluo commented 3 years ago

Did you run bash scripts/copy_model.sh fc fc_rl? If you did, the self-critical run should not begin at iteration 0.
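
For an id pair like trans/trans_sc, the script is roughly equivalent to the following sketch (check scripts/copy_model.sh in your checkout for the exact behaviour):

```bash
# Rough sketch of what `bash scripts/copy_model.sh trans trans_sc` does:
# clone the XE experiment directory under the new id and rename the infos
# files so that a run with --id trans_sc picks them up and resumes from there.
cp -r log_trans log_trans_sc
mv log_trans_sc/infos_trans-best.pkl log_trans_sc/infos_trans_sc-best.pkl
mv log_trans_sc/infos_trans.pkl      log_trans_sc/infos_trans_sc.pkl
```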

tyrando commented 3 years ago

> Did you run bash scripts/copy_model.sh fc fc_rl? If you did, the self-critical run should not begin at iteration 0.

Thanks a lot. It works.

Thanks a lot. It works. Two questions:

1. I checked copy_model.sh and manually renamed log_trans/ to log_trans_sc/, infos_trans-best.pkl to infos_trans_sc-best.pkl, and infos_trans.pkl to infos_trans_sc.pkl. I think this does the same thing as copy_model.sh, but the result is still bad. Why?
2. To continue SC training I actually did not set start_from or checkpoint_path, so where does train.py load the log_xx/ files from?

However, the evaluation at epoch 15 looks like this (mostly UNK tokens, and CIDEr is 0.014):

```
image 304220: UNK man through UNK man
image 518188: toilet UNK man through UNK man
image 428218: UNK man through UNK man
image 13867: UNK man through UNK man
image 275025: toilet UNK man through UNK man
evaluating validation preformance... 4980/5000 (16.954071)
image 362971: UNK man through UNK man
image 474: UNK man through UNK man
image 49327: UNK man through doing UNK man
image 144959: pizza UNK man through UNK man
image 349414: pizza UNK man through UNK man
image 143359: UNK man doing UNK man through UNK man
image 546658: UNK man doing UNK man through
image 200563: UNK man through doing surfing UNK man
image 345469: UNK man through UNK man
image 306619: court UNK man through UNK man
evaluating validation preformance... 4990/5000 (16.563679)
image 324313: UNK man through UNK man
image 46616: cupping UNK man through UNK man
image 285832: UNK man through UNK man
image 496718: UNK man through UNK man
image 398209: UNK man through UNK man
image 568041: toilet UNK man through UNK man
image 206596: court UNK man through UNK man
image 451949: UNK court UNK man through UNK
image 203138: UNK man through UNK man
image 296759: UNK man through UNK man
evaluating validation preformance... 5000/5000 (17.038965)
loading annotations into memory...
Done (t=6.03s)
creating index...
index created!
using 5000/5000 predictions
Loading and preparing results...
DONE (t=24.96s)
creating index...
index created!
tokenization...
PTBTokenizer tokenized 307821 tokens at 851139.82 tokens per second.
PTBTokenizer tokenized 33020 tokens at 219563.78 tokens per second.
setting up scorers...
computing Bleu score...
{'testlen': 28021, 'reflen': 42599, 'guess': [28021, 23021, 18021, 13022], 'correct': [1893, 6, 0, 0]}
ratio: 0.6577853940233185
Bleu_1: 0.040
Bleu_2: 0.002
Bleu_3: 0.000
Bleu_4: 0.000
computing METEOR score...
METEOR: 0.026
computing Rouge score...
ROUGE_L: 0.043
computing CIDEr score...
CIDEr: 0.014
```

ruotianluo commented 3 years ago

1. I don't know what the problem is. 2. See L51 and L79 of train.py: without start_from, the code will look at log_$id.

tyrando commented 3 years ago

> 1. I don't know what the problem is. 2. See L51 and L79 of train.py: without start_from, the code will look at log_$id.

Hi, I ran SC training with transformer_sc.yml as in the README. However, most of the predicted words are UNK (as in my reply above), so CIDEr is nearly 0. Why does CIDEr drop so much that it can't even be compared with the model before SC training?

ruotianluo commented 3 years ago

When you start SC training, what is the first printed iteration number?

tyrando commented 3 years ago

> When you start SC training, what is the first printed iteration number?

Actually, as a test, I created log_trans_sc/ and copied the model and infos provided in MODEL_ZOO.md (just the transformer XE version, not the SC one) into that directory for SC training. The first printed iteration number is 169971. During training, the CIDEr score is nearly zero.

ruotianluo commented 3 years ago

During training, the CIDEr score should not be near zero. Did you do all the preprocessing for SC training?
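
The SC-specific preprocessing is the cached n-gram statistics used to compute the CIDEr reward; it is built with something along these lines (see the README and scripts/prepro_ngrams.py for the exact arguments in your checkout):

```bash
# Build the document-frequency cache that the CIDEr reward uses during SC training.
# The paths follow the usual data layout of this repo; adjust them if yours differs.
python scripts/prepro_ngrams.py \
    --input_json data/dataset_coco.json \
    --dict_json data/cocotalk.json \
    --output_pkl data/coco-train \
    --split train
```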

tyrando commented 3 years ago

> During training, the CIDEr score should not be near zero. Did you do all the preprocessing for SC training?

Hi, I did all the preprocessing for SC training (I used the model zoo's pretrained transformer XE model for the test). The start of the SC training log looks like this; can you give me some advice?

```
Warning: key N_enc not in args
Warning: key N_dec not in args
Warning: key d_model not in args
Warning: key d_ff not in args
Warning: key num_att_heads not in args
Warning: key dropout not in args
DataLoader loading json file: data/cocotalk.json
vocab size is 9487
DataLoader loading h5 file: data/cocotalk_fc data/cocobu_att data/cocotalk_box data/cocotalk_label.h5
max sequence length in data is 16
read 123287 image features
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
loading ./log_trans_sc/infos_trans_sc.pkl...
loading ./log_trans_sc/model.pth...
Read data: 0.00046825408935546875
Cider scores: 0.0006203028319278637
iter 169971 (epoch 15), avg_reward = 0.001, time/batch = 1.722
Read data: 0.00021409988403320312
Cider scores: 0.0008155954094398387
iter 169972 (epoch 15), avg_reward = 0.001, time/batch = 1.269
Read data: 0.00016570091247558594
Cider scores: 0.0022134314567592694
iter 169973 (epoch 15), avg_reward = 0.003, time/batch = 1.219
Read data: 0.00011110305786132812
Cider scores: 0.0
iter 169974 (epoch 15), avg_reward = 0.000, time/batch = 1.142
Read data: 0.0001404285430908203
Cider scores: 4.406316070655577e-05
iter 169975 (epoch 15), avg_reward = 0.000, time/batch = 1.140
Read data: 0.00015091896057128906
Cider scores: 0.002784016295889412
iter 169976 (epoch 15), avg_reward = 0.001, time/batch = 1.201
Read data: 0.00015997886657714844
Cider scores: 4.828932544661676e-05
iter 169977 (epoch 15), avg_reward = 0.000, time/batch = 1.223
Read data: 0.00014209747314453125
Cider scores: 0.005744999524699371
iter 169978 (epoch 15), avg_reward = 0.007, time/batch = 1.193
Read data: 0.00041866302490234375
Cider scores: 0.0006674017291188975
iter 169979 (epoch 15), avg_reward = -0.004, time/batch = 1.190
Read data: 0.00016450881958007812
Cider scores: 0.004829984634580711
iter 169980 (epoch 15), avg_reward = -0.005, time/batch = 1.225
Read data: 0.000263214111328125
Cider scores: 0.00012223621905613476
iter 169981 (epoch 15), avg_reward = 0.000, time/batch = 1.153
Read data: 0.00014901161193847656
Cider scores: 0.0
iter 169982 (epoch 15), avg_reward = 0.000, time/batch = 1.266
Read data: 0.0001850128173828125
Cider scores: 0.0
iter 169983 (epoch 15), avg_reward = 0.000, time/batch = 1.332
Read data: 0.00017976760864257812
Cider scores: 0.0
iter 169984 (epoch 15), avg_reward = 0.000, time/batch = 1.151
Read data: 0.0001900196075439453
Cider scores: 0.0010840173980693
iter 169985 (epoch 15), avg_reward = 0.001, time/batch = 1.193
Read data: 0.0005791187286376953
Cider scores: 0.0004594107350352113
iter 169986 (epoch 15), avg_reward = 0.001, time/batch = 1.191
Read data: 0.0002295970916748047
Cider scores: 0.000567904685997325
iter 169987 (epoch 15), avg_reward = 0.001, time/batch = 1.209
Read data: 0.00018072128295898438
Cider scores: 0.0
```

ruotianluo commented 3 years ago

When using the pretrained models, did you also use the provided cocotalk.json? The vocab dictionary of the model may be different from the cocotalk.json you are currently using.
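
One quick way to check for such a mismatch is to compare the vocab stored in the infos pickle against the one in cocotalk.json. A sketch of that check is below; the key names ('vocab', 'ix_to_word') and paths are assumptions based on how this repo usually saves things, so adjust them to your setup:

```bash
python - <<'PY'
import json, pickle

# Example paths: point these at your own infos file and cocotalk.json.
with open('log_trans_sc/infos_trans_sc.pkl', 'rb') as f:
    infos = pickle.load(f)
with open('data/cocotalk.json') as f:
    cocotalk = json.load(f)

model_vocab = infos.get('vocab', {})         # ix -> word saved with the checkpoint
json_vocab = cocotalk.get('ix_to_word', {})  # ix -> word used by the DataLoader
print('vocab sizes:', len(model_vocab), len(json_vocab))
print('identical:', model_vocab == json_vocab)
PY
```

If the two dictionaries differ, the sampled word indices get decoded with the wrong vocabulary, which would explain captions that are almost entirely UNK.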

tyrando commented 3 years ago

> When using the pretrained models, did you also use the provided cocotalk.json? The vocab dictionary of the model may be different from the cocotalk.json you are currently using.

I just used the provided model.pth and infos.pkl, because I found the vocab dictionary already exists in infos.pkl. Today I also replaced my cocotalk.json with the provided one; however, the result is just as bad.

wlufy commented 2 years ago

> When using the pretrained models, did you also use the provided cocotalk.json? The vocab dictionary of the model may be different from the cocotalk.json you are currently using.
>
> I just used the provided model.pth and infos.pkl, because I found the vocab dictionary already exists in infos.pkl. Today I also replaced my cocotalk.json with the provided one; however, the result is just as bad.

Sorry to disturb you; have you solved this problem?

I have a similar problem. When I use SCST to fine-tune the pretrained model, I find that all metrics decrease and finally end up near 0. Obviously, something is wrong.

What can I do to solve this problem? Can you give me some advice?