krasserm / fairseq-image-captioning

Transformer-based image captioning extension for pytorch/fairseq
Apache License 2.0

What are the hyper-parameters for the best checkpoint? #10

Closed zplovekq closed 4 years ago

zplovekq commented 4 years ago

I would sincerely like to know the hyper-parameters used for checkpoint 20 and checkpoint 24. Thanks!

krasserm commented 4 years ago

Checkpoint 20 was obtained from this training run and checkpoint 24 from this one. Both came from parallel training on two GTX 1080 GPUs.

zplovekq commented 4 years ago

Sincere thanks for your reply! I have tried the hyper-parameters you described. I trained on 3 RTX 2080 Ti GPUs and got BLEU 33.9 with the score.sh in your repository. Could the difference come from using different GPUs? Thanks!
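(For context, a minimal sketch, not part of the repository: fairseq trains data-parallel, so the effective batch size per optimizer step scales with the number of GPUs and with the real `--update-freq` flag. Moving from 2 to 3 GPUs therefore changes the effective batch size and the learning-rate schedule the model sees. The per-GPU batch of 64 below is a hypothetical value:)

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, update_freq: int = 1) -> int:
    """Sentences consumed per optimizer step under fairseq data-parallel training."""
    return per_gpu_batch * num_gpus * update_freq

# The mismatch between the two setups can be quantified:
print(effective_batch_size(64, 2))  # 128 on two GTX 1080s
print(effective_batch_size(64, 3))  # 192 on three RTX 2080 Tis (~1.5x larger)
```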

krasserm commented 4 years ago

BLEU-4 33.9 after CE loss training or after SCST training?

zplovekq commented 4 years ago

For the CE loss :D
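(For readers following along: "CE loss" here means token-level cross-entropy against the reference captions. The repository uses fairseq's built-in criterion; the sketch below is only an illustration of that objective:)

```python
import torch
import torch.nn.functional as F

def caption_ce_loss(logits: torch.Tensor, targets: torch.Tensor, pad_idx: int) -> torch.Tensor:
    # logits: (batch, seq_len, vocab) decoder outputs
    # targets: (batch, seq_len) gold caption token ids, padded with pad_idx
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to (batch*seq_len, vocab)
        targets.reshape(-1),
        ignore_index=pad_idx,                 # ignore padding positions
    )
```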

zplovekq commented 4 years ago

I trained for about 100 epochs, and fairseq automatically saved the best checkpoint on the valid subset. I used generate.py from your repository to generate test-prediction.json and passed it to score.sh. Thanks!
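(For reference, score.sh wraps the COCO caption evaluation toolkit. A rough Python equivalent of the BLEU part, assuming pycocoevalcap is installed and captions are pre-tokenized; the file layouts here are assumptions, as the repository's script feeds the official COCO annotation files instead:)

```python
import json
from pycocoevalcap.bleu.bleu import Bleu

# Both dicts map image id -> list of caption strings.
with open("test-prediction.json") as f:
    preds = {p["image_id"]: [p["caption"]] for p in json.load(f)}
with open("references.json") as f:          # hypothetical references file
    refs = json.load(f)                     # image_id -> list of reference captions

refs = {i: refs[i] for i in preds}          # score only the predicted images
(b1, b2, b3, b4), _ = Bleu(4).compute_score(refs, preds)
print(f"BLEU-4: {b4:.3f}")                  # 0.339 corresponds to the 33.9 above
```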

krasserm commented 4 years ago

You additionally need to fine-tune with SCST to get BLEU-4 scores up to 39.
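(For readers unfamiliar with SCST, Rennie et al. 2017: the model samples a caption, scores it with a sentence-level metric such as CIDEr, and uses a greedy-decoded caption's score as the baseline. A minimal sketch of that objective, not the repository's exact criterion:)

```python
import torch

def scst_loss(sample_logprobs: torch.Tensor,     # (B, T) log p of sampled tokens
              sample_reward: torch.Tensor,       # (B,) e.g. CIDEr of sampled caption
              greedy_reward: torch.Tensor,       # (B,) CIDEr of the greedy baseline
              mask: torch.Tensor) -> torch.Tensor:  # (B, T) 1 for real tokens
    # Advantage: how much better the sample scored than the greedy baseline.
    advantage = (sample_reward - greedy_reward).unsqueeze(1)   # (B, 1)
    # REINFORCE with the greedy decode as baseline; minimize negative reward.
    return -(advantage * sample_logprobs * mask).sum() / mask.sum()
```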

zplovekq commented 4 years ago

Hi. I used SCST training with the command from your README.md. I have tried a few times, and every training loop gets stuck at 41% and never progresses further. I don't know why. Did I do something wrong? Thanks!

FantasyoO666 commented 2 years ago

Hi, my loss is always -0 or +0. Could you tell me why? Thank you!
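(One possible explanation, following the SCST sketch above rather than anything confirmed in this thread: the SCST loss is proportional to the advantage, the sampled reward minus the greedy reward. Whenever the sampled caption scores exactly the same as the greedy baseline, the advantage is zero and the loss prints as +0 or -0 depending on floating-point sign:)

```python
import torch

# Hypothetical batch where sampling collapses onto the greedy decode:
sample_reward = torch.tensor([0.8, 0.5])
greedy_reward = torch.tensor([0.8, 0.5])   # identical rewards -> zero advantage
logprobs = torch.full((2, 4), -1.0)
mask = torch.ones(2, 4)

advantage = (sample_reward - greedy_reward).unsqueeze(1)
loss = -(advantage * logprobs * mask).sum() / mask.sum()
print(loss)  # tensor(0.) or tensor(-0.): zero advantage gives a +/-0 loss
```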