Closed aneesurhashmi closed 1 year ago
Hello, thanks for the kind words. The issue appears to come from the indexing of the val_l1_loss array: the IndexError is raised when the code accesses an element that is out of bounds along the specified axis.
Regarding the use of range(0, args.num_epoch+1) in the train_syndiff function, the +1 is there so that the training loop includes the final epoch. Note that this makes epoch take the values 0 through args.num_epoch, which is args.num_epoch + 1 iterations in total.
If you're hitting indexing errors in the last epoch, the most likely cause is that val_l1_loss is initialized with fewer entries along axis 1 than the loop has epochs, so the final epoch index falls out of bounds. Increasing the second dimension (axis=1) by one should solve the problem. Alternatively, removing the last epoch from the loop is a temporary workaround that avoids the indexing error.
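To make the mismatch concrete, here is a minimal sketch, not the actual code from train.py. The array shape, the leading dimension of 2, the helper name run_validation_loop, and the values of num_epoch and num_iterations are all assumptions chosen for illustration; only the loop bounds and the val_l1_loss[0, epoch, iteration] indexing come from the thread.

```python
import numpy as np

num_epoch = 1        # hypothetical value, consistent with "size 1" in the traceback
num_iterations = 3   # hypothetical number of validation batches


def run_validation_loop(epoch_slots):
    """Fill a loss array of shape (2, epoch_slots, num_iterations).

    The loop bounds mirror range(0, args.num_epoch+1) from the thread;
    the array shape is an assumption, not taken from train.py.
    """
    val_l1_loss = np.zeros((2, epoch_slots, num_iterations))
    for epoch in range(0, num_epoch + 1):        # yields num_epoch + 1 values
        for iteration in range(num_iterations):
            val_l1_loss[0, epoch, iteration] = 0.5   # placeholder for the L1 loss
    return val_l1_loss


# Axis 1 sized num_epoch: the last epoch index is out of bounds.
try:
    run_validation_loop(num_epoch)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# Axis 1 sized num_epoch + 1: every epoch index fits.
fixed = run_validation_loop(num_epoch + 1)
```

Either sizing axis 1 as num_epoch + 1 or looping over range(0, num_epoch) keeps the loop and the array in agreement; which one is right depends on whether the extra epoch is intended.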
Hi, thank you for this amazing work. The train_syndiff function runs the training loop over range(0, args.num_epoch+1), which causes an indexing error in the last epoch. I removed the last epoch to check whether this was the issue, and it seems to work. Can you please confirm the reason for using epoch+1 here?
Thank you
Traceback (most recent call last):
  File "/home/anees.hashmi/Desktop/SynDiff/train.py", line 867, in <module>
    init_processes(0, size, train_syndiff, args)
  File "/home/anees.hashmi/Desktop/SynDiff/train.py", line 730, in init_processes
    fn(rank, gpu, args)
  File "/home/anees.hashmi/Desktop/SynDiff/train.py", line 694, in train_syndiff
    val_l1_loss[0,epoch,iteration]=abs(fake_sample1 -real_data).mean()
IndexError: index 1 is out of bounds for axis 1 with size 1