microsoft / Swin3D

A shift-window based transformer for 3D sparse tasks
MIT License

model.eval() causing NaN values #17

Open hpc100 opened 1 year ago

hpc100 commented 1 year ago

Thanks for sharing your work! @Yukichiii @yuxiaoguo I tried to test your code on Semantic3D: in the validation step, I get NaN values in the output.


Do you have any idea where the problem could come from (layer norm, ...)?
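One generic way to narrow this down is to check each layer's output in order and stop at the first one that contains NaN or inf, because a NaN then propagates through every later layer. The sketch below assumes NumPy and a hypothetical three-stage toy pipeline; first_nonfinite_stage, normalize_rows and the stage list are illustrative names, not Swin3D code, and the all-zero row simply stands in for any input that triggers a 0 / 0.

```python
import numpy as np


def first_nonfinite_stage(x, stages):
    """Feed x through (name, fn) stages in order; return the name of the first
    stage whose output contains NaN or inf, or None if every output is finite."""
    for name, fn in stages:
        x = fn(x)
        if not np.all(np.isfinite(x)):
            return name
    return None


def normalize_rows(h):
    # Scale each row to unit length; an all-zero row gives 0 / 0 = NaN.
    return h / np.linalg.norm(h, axis=1, keepdims=True)


# Hypothetical toy pipeline, not Swin3D's architecture.
W = np.full((3, 4), 0.5)
stages = [
    ("normalize", normalize_rows),
    ("linear", lambda h: h @ W),
    ("relu", lambda h: np.maximum(h, 0.0)),
]

points = np.array([[1.0, 2.0, 2.0], [0.0, 0.0, 0.0]])  # second row is degenerate

with np.errstate(invalid="ignore"):  # silence the 0 / 0 RuntimeWarning
    print(first_nonfinite_stage(points, stages))  # normalize
print(first_nonfinite_stage(points[:1], stages))  # None
```

In PyTorch the same check is commonly attached per module with register_forward_hook, testing each output with torch.isfinite. Checking for non-finite values rather than only NaN also catches an inf that turns into NaN a few layers later.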

jaswanthbjk commented 1 year ago

Found any solution?

hpc100 commented 1 year ago

No. @jaswanthbjk, do you have the same problem?

jaswanthbjk commented 1 year ago

Yes, I had the same problem.

But not anymore when I included normals in the features.

hpc100 commented 1 year ago

@jaswanthbj So you didn't get NaN values with XYZ + RGB + normals, or was that with XYZ + normals? I tried both and still get NaN values when the model is set to eval mode. Which point clouds do you use for validation? (I use domfountain_station1_xyz_intensity and untermaederbrunnen_station3_xyz_intensity.) Have you tried intensity features?

jaswanthbjk commented 1 year ago

@hpc100

Sorry for the confusing reply.

I am still getting NaNs in eval mode, but not during training with RGB + normals + XYZ, which is super weird to me.

hpc100 commented 1 year ago

Found any solution? @jaswanthbj Have you tried evaluating the model on the CPU or on another GPU?

jaswanthbjk commented 1 year ago

No. However I run it, it results in NaN values.

The dataloader is very different between training and validation; digging into that might help solve the issue.
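A quick way to act on this is to pull one batch from each loader and compare the input features channel by channel. The sketch assumes NumPy and finite (N, C) training features; feature_problems, the slack threshold and the example values (colors scaled to [0, 1] in training but left in 0..255 in validation, plus a NaN normal) are hypothetical illustrations, not a description of what Swin3D's loaders actually do.

```python
import numpy as np


def feature_problems(train, val, names, slack=0.5):
    """Compare (N, C) feature arrays sampled from the training and validation
    loaders and return warnings for channels that look inconsistent."""
    problems = []
    for c, name in enumerate(names):
        t, v = train[:, c], val[:, c]
        finite = np.isfinite(v)
        if not finite.all():
            bad = np.count_nonzero(~finite)
            problems.append(f"{name}: {bad} non-finite value(s) in validation data")
        v = v[finite]  # compare ranges on the finite values only
        if v.size == 0:
            continue
        lo, hi = float(t.min()), float(t.max())
        margin = slack * (hi - lo)  # tolerated overshoot beyond the training range
        if v.min() < lo - margin or v.max() > hi + margin:
            problems.append(
                f"{name}: validation range [{v.min():g}, {v.max():g}] "
                f"vs training [{lo:g}, {hi:g}]"
            )
    return problems


# Hypothetical batches: a red color channel and the z component of the normal.
train = np.array([[0.1, 0.0], [0.9, 1.0]])
val = np.array([[25.0, np.nan], [230.0, 0.5]])  # unscaled colors and a NaN normal

for p in feature_problems(train, val, ["r", "nz"]):
    print(p)
# r: validation range [25, 230] vs training [0.1, 0.9]
# nz: 1 non-finite value(s) in validation data
```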

Yukichiii commented 1 year ago

The NaN values may be caused by half precision. Could you please try running the forward pass in full precision? You can set fp16_mode=0 and use_amp=False.
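To illustrate how half precision can produce NaN even when every input is finite: float16 cannot represent values above 65504, so a large dot product overflows to inf, and the max subtraction in a standard softmax then computes inf - inf = NaN. This is a generic NumPy sketch with made-up activations, not Swin3D's attention code, and it does not show what fp16_mode or use_amp change internally.

```python
import numpy as np


def softmax(scores):
    # Max-subtracted ("stable") softmax over the last axis.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def attention_probs(q, k):
    # Dot-product attention weights, computed in the dtype of q and k.
    return softmax(q @ k.T)


print(float(np.finfo(np.float16).max))  # 65504.0

# One query and two keys with 64 channels of 40.0: each score is 64 * 40 * 40 = 102400.
q = np.full((1, 64), 40.0)
k = np.full((2, 64), 40.0)

with np.errstate(over="ignore", invalid="ignore"):  # silence overflow / inf - inf warnings
    p16 = attention_probs(q.astype(np.float16), k.astype(np.float16))
p32 = attention_probs(q.astype(np.float32), k.astype(np.float32))

print(p16)  # [[nan nan]]: scores overflow to inf, and inf - inf = nan
print(p32)  # [[0.5 0.5]]
```

The same inputs in float32 give a well-defined result, which is consistent with the suggestion to run the forward pass in full precision.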