Training speed for transformer using SCST

ruotianluo / ImageCaptioning.pytorch

I decide to sync up this repo and self-critical.pytorch. (The old master is in old master branch for archive)

MIT License


Training speed for transformer using SCST #165

Open liuaohanjsj opened 1 year ago

liuaohanjsj commented 1 year ago

Hi, I'm wondering about the training speed of the transformer when using new-self-critical or SCST. During training the model has to run inference to sample captions, and transformer inference is much slower than transformer training. With an RNN this should not be a problem, but I think that with the transformer the training would be much slower (I implemented a version myself, and training with RL was about 20x slower). I'm curious about the training speed in your experiments. Do you have any suggestions?
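To make the concern concrete, here is a rough sketch of why sampling costs more than a teacher-forced training step. It is a toy cost model, not a measurement and not code from this repo: the function names and the `cache_past_states` flag are hypothetical, and it only counts decoder passes and token positions, ignoring attention cost and GPU parallelism.

```python
def teacher_forced_cost(seq_len):
    """Training with ground-truth captions: one parallel decoder pass
    covers every position of the sequence at once."""
    return {"decoder_passes": 1, "positions_computed": seq_len}


def sampling_cost(seq_len, cache_past_states=False):
    """Autoregressive sampling (as needed for the RL reward): one
    sequential decoder pass per generated token."""
    if cache_past_states:
        # Assumption: past states are cached, so each step only
        # computes the newest position.
        positions = seq_len
    else:
        # Without caching, step t re-runs the decoder over the whole
        # prefix of length t, giving 1 + 2 + ... + seq_len positions.
        positions = sum(range(1, seq_len + 1))
    return {"decoder_passes": seq_len, "positions_computed": positions}


if __name__ == "__main__":
    T = 20  # hypothetical caption length
    print("teacher forced:   ", teacher_forced_cost(T))
    print("sampling, no cache:", sampling_cost(T))
    print("sampling, cached:  ", sampling_cost(T, cache_past_states=True))
```

Even with caching, the sampling passes are sequential, so they cannot be batched over time steps the way a teacher-forced pass can. The original SCST formulation also decodes a greedy caption as the baseline, which adds a second decode per image on top of the sampled one.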

ruotianluo commented 1 year ago

Yes, it is much slower. I did add some optimizations to speed it up a little, but I didn't do any quantitative comparisons.