Closed rish-16 closed 6 months ago
That's odd. Are you using the same version of Lightning as what's in the `fm.yml`? I wonder if Lightning changed something so that it calls `__len__` before it calls `__iter__`.
Hi, closing this for now, but please reopen if you are still running into problems.
Hey Jason and Team, thanks for the amazing repo!!
I tried to retrain on SCOPe on my setup (2 RTX3090s) and am running into this issue attached below that's causing the training to stop and crash. I also tried it with 1 GPU and it still crashed the same way.
To reproduce:
python train_se3_flows.py
(I reorganised the files a bit to make it cleaner/more manageable.)
I've narrowed down the issue to this line: https://github.com/microsoft/protein-frame-flow/blob/main/data/pdb_dataloader.py#L245
My guess is that the `self._create_batches()` method in L245 isn't really being called in the `__iter__(...)` method; I tried printing the `sample_order` variable and nothing was printed (so that line isn't run at all). Do you think it's a PyTorch / Lightning issue?
I've been trying to find workarounds for a while but nothing has worked yet. Appreciate any leads on this :)
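For anyone hitting the same crash, here is a minimal sketch of the failure mode suggested in this thread: a sampler that only builds its batches inside `__iter__` will break if something asks for its length first. The class `LazyBatchSampler` and its parameters are hypothetical and written in plain Python for illustration; this is not the code in `pdb_dataloader.py`, and whether Lightning really calls `__len__` first in this setup is an assumption that has not been confirmed here.

```python
import random


class LazyBatchSampler:
    """Hypothetical stand-in for a batch sampler (not the repo's implementation)."""

    def __init__(self, num_items, batch_size, seed=0):
        self.num_items = num_items
        self.batch_size = batch_size
        self.seed = seed
        self.epoch = 0
        self.sample_order = None  # filled in by _create_batches()

    def _create_batches(self):
        # Shuffle indices deterministically per epoch, then chunk them into batches.
        rng = random.Random(self.seed + self.epoch)
        indices = list(range(self.num_items))
        rng.shuffle(indices)
        self.sample_order = [
            indices[i:i + self.batch_size]
            for i in range(0, self.num_items, self.batch_size)
        ]

    def __len__(self):
        # Guard: build the batches if a caller asks for the length before iterating.
        if self.sample_order is None:
            self._create_batches()
        return len(self.sample_order)

    def __iter__(self):
        # Rebuild the batches at the start of every pass, then advance the epoch.
        self._create_batches()
        self.epoch += 1
        return iter(self.sample_order)


sampler = LazyBatchSampler(num_items=10, batch_size=4)
num_batches = len(sampler)  # works even though __iter__ has not run yet
batches = list(sampler)
```

Without the guard in `__len__`, the `len(sampler)` call would fail on `len(None)`, which is one quick way to check whether call order is the culprit.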