
Gradient accumulation not properly implemented #53

Open · clemsgrs opened this issue 1 year ago

clemsgrs commented 1 year ago

Hi, based on the following lines, it seems gradient accumulation is not properly implemented:

https://github.com/mahmoodlab/HIPT/blob/a9b5bb8d159684fc4c2c497d68950ab915caeb7e/2-Weakly-Supervised-Subtyping/utils/core_utils.py#L285-L290

A proper implementation should divide the loss by gc, so the accumulated gradients average (rather than sum) over the gc mini-batches, and should only step and reset the optimizer once every gc batches:

loss = loss / gc   # scale so accumulated gradients average over gc batches
loss.backward()    # gradients accumulate in the .grad buffers

if (batch_idx + 1) % gc == 0:
    optimizer.step()       # update weights once per accumulation window
    optimizer.zero_grad()  # reset gradients for the next window
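
For anyone reproducing this, here is a minimal, self-contained sketch of how that fix fits into a full training loop. The linear model, synthetic data, and gc = 4 are placeholders for illustration, not the repository's actual setup:

import torch
import torch.nn as nn

# Toy stand-ins; the real code trains HIPT on WSI feature bags.
model = nn.Linear(16, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
gc = 4  # number of mini-batches to accumulate per optimizer step

data = [(torch.randn(1, 16), torch.randint(0, 2, (1,))) for _ in range(12)]

optimizer.zero_grad()
for batch_idx, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y)

    # Scale so the accumulated gradient equals the average gradient
    # of one effective batch of size gc.
    (loss / gc).backward()  # .grad buffers accumulate across iterations

    # Step and reset only once every gc mini-batches.
    if (batch_idx + 1) % gc == 0:
        optimizer.step()
        optimizer.zero_grad()

The effective batch size is gc times the loader's batch size: gradients simply sum in the .grad buffers between zero_grad() calls, which is why the per-batch loss must be pre-scaled. Without the (batch_idx + 1) % gc guard, the optimizer would step on every batch and the accumulation would have no effect.
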
vildesboe commented 8 months ago

Hi! I'm also working on reproducing this HIPT paper. Would you be interested in some discussion?

clemsgrs commented 8 months ago

Sure, happy to chat. I've made my own version of the code here: https://github.com/clemsgrs/hipt

You can contact me at: clement (dot) grisi (at) radboudumc (dot) nl