ttanida / rgrg

Code for the CVPR paper "Interactive and Explainable Region-guided Radiology Report Generation"

About training efficiency #2

Closed · HeyQXX closed this issue 1 year ago

HeyQXX commented 1 year ago

Hi, thanks for your excellent work. When reproducing your work, I found a serious bottleneck in data loading when reading the full-size MIMIC-CXR images, which leads to a very high training time cost. I wonder if this was also the case for you. Also, have you tried freezing the parameters of the object detector during the second and third stages of training?

ttanida commented 1 year ago

Hello there, thank you for reaching out and for your kind words about our work.

We haven't noticed any bottlenecks in data loading, as our models were trained in (what we believe to be) reasonable timeframes. Could you provide a minimal reproducible example of your data-loading bottleneck?
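Something like the timing sketch below could be a starting point for narrowing it down. It only measures how long the loop waits on the DataLoader workers; the dataset class and the parameters are placeholders, not code from this repo.

```python
import time
from torch.utils.data import DataLoader

def profile_loader(dataset, batch_size=16, num_workers=4, num_batches=50):
    """Roughly measure how long a training loop spends waiting for batches."""
    loader = DataLoader(dataset, batch_size=batch_size,
                        num_workers=num_workers, pin_memory=True)
    it = iter(loader)
    wait_time = 0.0
    start = time.perf_counter()
    for _ in range(num_batches):
        t0 = time.perf_counter()
        _batch = next(it)  # time spent waiting on the worker processes
        wait_time += time.perf_counter() - t0
        # a full check would also run the model's forward/backward pass here
    total = time.perf_counter() - start
    print(f"waited on data for {wait_time:.1f}s of {total:.1f}s "
          f"({100 * wait_time / total:.0f}%)")
```

If that percentage is high, the bottleneck really is I/O or CPU-side preprocessing rather than the model itself.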

As for freezing the object detector weights, we haven’t done so, but it might be worth exploring.
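If you want to try it, a minimal PyTorch sketch might look like the following. The `object_detector` attribute name, the optimizer choice, and the learning rate are assumptions for illustration, not necessarily how the repo structures things.

```python
import torch

def freeze_object_detector(model: torch.nn.Module) -> None:
    """Freeze the detector sub-module so its weights stay fixed in stages 2/3."""
    for param in model.object_detector.parameters():
        param.requires_grad = False
    # eval() additionally fixes BatchNorm running statistics and disables dropout;
    # re-apply it after any call to model.train() in the training loop
    model.object_detector.eval()

def build_optimizer(model: torch.nn.Module, lr: float = 1e-4):
    """Hand the optimizer only the parameters that are still trainable."""
    return torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=lr
    )
```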

HeyQXX commented 1 year ago

Thanks for the reply. I think it's because I didn't put the images on an SSD; I should be able to alleviate the problem by resizing the images in advance.
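A one-off resizing pass along these lines should do it (the paths and the target size are placeholders, not values from the repo):

```python
from pathlib import Path
from PIL import Image

SRC = Path("mimic-cxr-jpg/files")    # original full-size images (placeholder path)
DST = Path("mimic-cxr-jpg-resized")  # destination for the smaller copies (placeholder)
TARGET = 512                         # target size of the longer side in pixels (assumption)

for src_path in SRC.rglob("*.jpg"):
    dst_path = DST / src_path.relative_to(SRC)
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(src_path) as img:
        scale = TARGET / max(img.size)
        new_size = (round(img.width * scale), round(img.height * scale))
        img.resize(new_size, Image.BILINEAR).save(dst_path, quality=95)
```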

Fivethousand5k commented 1 year ago

Hi, since the dataset is so huge, why didn't you try resizing the images in advance and saving them, leaving only the random transforms such as affine and GaussianNoise to run online? @ttanida @HeyQXX
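Concretely, with resizing done offline, the online pipeline would only need the random augmentations, something like the sketch below (assuming albumentations, which provides Affine and GaussNoise; the parameters and normalization statistics are placeholders):

```python
import albumentations as A

# only the random augmentations run per sample; resizing already happened offline
train_transforms = A.Compose([
    A.Affine(rotate=(-5, 5), translate_percent=(0.0, 0.05), p=0.5),
    A.GaussNoise(p=0.3),
    A.Normalize(mean=0.5, std=0.25),  # placeholder normalization stats
])

# usage inside the Dataset's __getitem__, where `image` is a NumPy array:
# image = train_transforms(image=image)["image"]
```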

ttanida commented 1 year ago

Hi @Fivethousand5k, it's true that resizing the images in advance can reduce the computational overhead during training or inference. However, I think the resulting speedup may not be significant, since I would assume that the forward and backward passes, especially when the language model is involved in the full model, take up most of the time.

If you do end up resizing the images in advance and notice a significant speedup during training or inference (compared to no resizing), I would appreciate it if you could let me know your findings.