sunshangquan / Histoformer

[ECCV 2024] Histoformer: Restoring Images in Adverse Weather Conditions via Histogram Transformer

"NaN or Inf found in input tensor" on customized training dataset #7

Closed EricLe-dev closed 2 months ago

EricLe-dev commented 2 months ago

Hi @sunshangquan,

Thank you so much for the fantastic work! I'm training your model on a customized dataset. However, during training I get "NaN or Inf found in input tensor", and l_pix and l_pear become NaN. I suspect something in my data, but it is very hard to pin down; I compared it with the training data you used and, at least in terms of size, they are quite similar. Also, I saw that you already implemented check_inf_nan in your code but have not used it anywhere, so I guess you already knew about this.

def check_inf_nan(self, x):
    # Replace NaNs with 0 and infinities with a large finite value (1e7),
    # modifying x in place and returning it.
    x[x.isnan()] = 0
    x[x.isinf()] = 1e7
    return x

Since this is how you handle NaN and Inf, does it affect the training result? Could you please give me some guidance on this? Thank you so much!

sunshangquan commented 2 months ago

Hi @EricLe-dev , thank you for your interest in our work. Yes, when I implemented the Pearson Correlation Loss I also ran into the loss becoming NaN or Inf, and check_inf_nan was my initial attempt at a fix. I eventually solved the problem differently: adding +1e-6 at L132 to avoid a zero denominator, and multiplying by ~pearson.isnan()*~pearson.isinf() at L181 to keep NaN or Inf out of the loss term.

Since the second fix is essentially equivalent to check_inf_nan, I suspect check_inf_nan alone will not solve your problem, but you could try it anyway. If the issue persists, you may simply remove the Pearson Correlation Loss by deleting L125-126. I hope this helps.
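For reference, here is a minimal sketch of what the two fixes look like in a Pearson-correlation-style loss. This is illustrative only, not the repository's exact code; the function name, tensor shapes, and the final (1 - pearson) reduction are assumptions:

import torch

def pearson_loss(pred, target, eps=1e-6):
    # Flatten to (batch, N) and center each sample.
    pred = pred.flatten(1)
    target = target.flatten(1)
    pred = pred - pred.mean(dim=1, keepdim=True)
    target = target - target.mean(dim=1, keepdim=True)
    # +eps guards against a zero denominator (the L132 fix).
    denom = pred.norm(dim=1) * target.norm(dim=1) + eps
    pearson = (pred * target).sum(dim=1) / denom
    # Drop any residual NaN/Inf entries before reducing (the L181 fix).
    pearson = pearson[~pearson.isnan() * ~pearson.isinf()]
    # Maximizing correlation = minimizing (1 - correlation).
    return (1 - pearson).mean()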

EricLe-dev commented 2 months ago

Thank you for the very quick and detailed reply. From what you explained, you mostly hit the problem with l_pear; in my case it seemed to happen with l_pix as well. This is what I did, and it seems to work, at least the "NaN or Inf found in input tensor" message no longer appears.

At L186, I added a check:

if torch.isnan(self.output).any() or torch.isinf(self.output).any():
    # Sanitize the network output before it enters the losses.
    self.output = self.check_inf_nan(self.output)

I did exactly the same at L189 and L195.

Your idea of handling NaN and Inf in check_inf_nan is clever. I just don't know whether mapping NaN to 0 and Inf to 1e7 affects training. Also, can you please explain why you chose 1e7? Empirically, that number seems very large.

sunshangquan commented 2 months ago

Glad you have made it work! I expect it is fine because clipping a small fraction of values does not affect the majority of values. Besides, the gradient-clipping function clip_grad_norm_ also has an argument error_if_nonfinite=False, so I suppose your solution will work as well. If not, you could simply remove the Pearson Correlation Loss.
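For reference, clip_grad_norm_ is typically called like this (a generic sketch; the model and max_norm below are placeholders, not our actual training settings):

import torch

model = torch.nn.Linear(8, 8)          # placeholder for the network
loss = model(torch.randn(4, 8)).sum()  # placeholder for l_pix + l_pear
loss.backward()
# error_if_nonfinite=False (the default) returns the total norm even if it is
# NaN/Inf instead of raising, so one bad batch does not crash training.
total_norm = torch.nn.utils.clip_grad_norm_(
    model.parameters(), max_norm=0.01, error_if_nonfinite=False
)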

I just chose a random big number like 1e7 to replace Inf. Feel free to change it.

EricLe-dev commented 2 months ago

@sunshangquan Thanks to your guidance, I have been able to train the model on my customized data. I tested the trained model on 800x800 px images (resized from 6000x6000 px) and on exactly the same images at 6000x6000 (inference on patches, then joining them back together to form the original high-resolution image). I found that inference on the 800x800 px images is much better than inference on patches joined back into the high-resolution image.
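For context, my patch-then-join inference is roughly the following (an illustrative sketch with simple averaging in the overlaps, not my exact script; the tile and overlap values are placeholders, and I assume the model preserves spatial size):

import torch

@torch.no_grad()
def tiled_inference(model, img, tile=800, overlap=128):
    # img: (1, C, H, W). Tiles are blended by averaging in overlap regions.
    _, _, H, W = img.shape
    out = torch.zeros_like(img)
    weight = torch.zeros_like(img)
    stride = tile - overlap
    ys = list(range(0, max(H - tile, 0) + 1, stride))
    xs = list(range(0, max(W - tile, 0) + 1, stride))
    # Ensure the last tiles reach the image borders.
    if ys[-1] + tile < H:
        ys.append(H - tile)
    if xs[-1] + tile < W:
        xs.append(W - tile)
    for y in ys:
        for x in xs:
            patch = img[..., y:y + tile, x:x + tile]
            out[..., y:y + tile, x:x + tile] += model(patch)
            weight[..., y:y + tile, x:x + tile] += 1
    return out / weight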

I know that inferring on smaller patches limits how much of the image the model can observe, but during training you already use random crops of 128-326 px, and I also tried feeding the original (not resized) images to Dataset_PairedImage. I would not expect the difference to be this large. Can you please give me some guidance or an explanation for this? Thank you so much!

sunshangquan commented 2 months ago

Hi @EricLe-dev , glad to hear it. From my experience, it depends: whether to infer on cropped patches or on a resized whole image is task-dependent.

For example, for tasks that involve global contrast changes, like dehazing (heavy rain-accumulation removal), enhancement, or retouching, inference on the resized whole image is far better, because human eyes are sensitive to contrast differences between neighboring patches.

For tasks that involve only local pattern changes, like super-resolution, deblurring, or deraining, it is often fine to infer on cropped patches for better local detail; human eyes may not be sensitive to the tiny differences between neighboring patches.

Another factor is the model itself. Some models are better at global consistency, while others may treat the same pattern differently depending on whether it lies at the center, an edge, or a corner of the input (this is my personal guess).