Open jhaggle opened 7 months ago
@jhaggle
I have the same issue. When I set class_weight, I get the same error because the default ignore_index (255) is outside the valid index range of class_weight (because of the class_weight[cls] lookup). When this section is completely replaced with the original code, there is no error:
    if (avg_factor is None) and avg_non_ignore and reduction == 'mean':
        avg_factor = label.numel() - (label == ignore_index).sum().item()
    if weight is not None:
        weight = weight.float()
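To make the mechanism described above concrete, here is a minimal pure-Python sketch. It is not the mmsegmentation implementation; the function names and the tiny label list are hypothetical, and only the class_weight values and the ignore_index of 255 come from this thread. It shows why a class_weight[cls] lookup fails when a label equals ignore_index, and how skipping ignored pixels avoids the lookup. On a GPU the same out-of-range index would typically surface as a device-side assert rather than a Python IndexError.

```python
IGNORE_INDEX = 255  # default ignore_index mentioned in this thread


def per_pixel_weight_naive(labels, class_weight):
    # Looks up class_weight[cls] for every label, including ignored ones.
    return [class_weight[cls] for cls in labels]


def per_pixel_weight_masked(labels, class_weight, ignore_index=IGNORE_INDEX):
    # Ignored pixels get weight 0.0 and never index into class_weight.
    return [0.0 if cls == ignore_index else class_weight[cls] for cls in labels]


# Eight class weights, as in the original post.
class_weight = [0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.7, 0.1]
# Hypothetical flattened label map containing one ignored pixel.
labels = [0, 2, 255, 7]

try:
    per_pixel_weight_naive(labels, class_weight)
    naive_failed = False
except IndexError:
    # 255 is not a valid index into a list of length 8.
    naive_failed = True

masked = per_pixel_weight_masked(labels, class_weight)
print(naive_failed, masked)
```

The masked variant only illustrates the idea of excluding ignored pixels before the weight lookup; whether a given mmsegmentation version does this depends on the loss code in use.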
I didn't understand. What should I do to avoid this CUDA Error? @talebolano
I have searched related issues but cannot get the expected help.
I follow this tutorial:
https://github.com/open-mmlab/mmsegmentation/blob/main/demo/MMSegmentation_Tutorial.ipynb
If I follow the tutorial completely unchanged it works fine.
I then try to add Class Balanced Loss as in this tutorial:
https://mmsegmentation.readthedocs.io/en/latest/advanced_guides/training_tricks.html#class-balanced-loss
I therefore add this line to the cell in the tutorial where the config is modified:
cfg.model.decode_head.loss_decode.update(dict(class_weight=[0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.7, 0.1]))
However, this results in this CUDA error:
If I add
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
to the code, as the CUDA error suggests, I instead get this:
The number of classes I have set should be right. If I try to change it to something else, I instead get this error:
How can I avoid this error? And what is causing it?
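One way to narrow this down before training is to check, on the CPU, that the label values and the class_weight list are consistent with the configured number of classes. The helper below is a hypothetical sketch, not part of mmsegmentation. It assumes you have already collected the distinct label values from your annotation masks (for example with numpy.unique) and that ignore_index is 255.

```python
def check_labels_and_weights(label_values, class_weight, num_classes, ignore_index=255):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []

    # class_weight needs exactly one entry per class.
    if len(class_weight) != num_classes:
        problems.append(
            f"class_weight has {len(class_weight)} entries, expected {num_classes}"
        )

    # Every label must be a valid class index or the ignore_index.
    out_of_range = sorted(
        {v for v in label_values if v != ignore_index and not 0 <= v < num_classes}
    )
    if out_of_range:
        problems.append(f"label values outside [0, {num_classes}): {out_of_range}")

    return problems


class_weight = [0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.7, 0.1]

# Hypothetical label values; replace with the values found in your own masks.
print(check_labels_and_weights([0, 1, 7, 255], class_weight, num_classes=8))
print(check_labels_and_weights([0, 8], class_weight, num_classes=8))
```

If the check reports nothing but ignored pixels (255) are present in the masks, the remaining suspect is a weight lookup that does not skip ignore_index.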