
Gradient accumulation not properly implemented #53

Open · clemsgrs opened this issue 1 year ago

clemsgrs commented 1 year ago

Hi, based on the following lines, it seems gradient accumulation is not properly implemented:

https://github.com/mahmoodlab/HIPT/blob/a9b5bb8d159684fc4c2c497d68950ab915caeb7e/2-Weakly-Supervised-Subtyping/utils/core_utils.py#L285-L290

A proper implementation should divide the loss by gc, so the accumulated gradients average (rather than sum) over the gc mini-batches, and should only step and reset the optimizer once every gc batches:

loss = loss / gc   # scale so accumulated gradients average over gc batches
loss.backward()    # gradients accumulate in the .grad buffers

if (batch_idx + 1) % gc == 0:
    optimizer.step()       # update weights once per accumulation window
    optimizer.zero_grad()  # reset gradients for the next window
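
For anyone reproducing this, here is a minimal, self-contained sketch of how that fix fits into a full training loop. The linear model, synthetic data, and gc = 4 are placeholders for illustration, not the repository's actual setup:

import torch
import torch.nn as nn

# Toy stand-ins; the real code trains HIPT on WSI feature bags.
model = nn.Linear(16, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
gc = 4  # number of mini-batches to accumulate per optimizer step

data = [(torch.randn(1, 16), torch.randint(0, 2, (1,))) for _ in range(12)]

optimizer.zero_grad()
for batch_idx, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y)

    # Scale so the accumulated gradient equals the average gradient
    # of one effective batch of size gc.
    (loss / gc).backward()  # .grad buffers accumulate across iterations

    # Step and reset only once every gc mini-batches.
    if (batch_idx + 1) % gc == 0:
        optimizer.step()
        optimizer.zero_grad()

The effective batch size is gc times the loader's batch size: gradients simply sum in the .grad buffers between zero_grad() calls, which is why the per-batch loss must be pre-scaled. Without the (batch_idx + 1) % gc guard, the optimizer would step on every batch and the accumulation would have no effect.
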
vildesboe commented 8 months ago

Hi! I'm also working on reproducing this HIPT paper. Would you be interested in some discussion?

clemsgrs commented 8 months ago

Sure, happy to chat. I've made my own version of the code here: https://github.com/clemsgrs/hipt

You can contact me at: clement (dot) grisi (at) radboudumc (dot) nl