Open Gilgamesh666666 opened 3 years ago
That's weird. `prepool` is just a simple MLP, which shouldn't lead to NaNs (unless the weights or inputs are NaNs). I'm not sure what might be the cause, since the provided code does train stably on the ModelNet40 dataset.
Are your point clouds of similar spatial extents and density? The sampling and grouping layer, i.e. `sample_and_group_multi()`, needs to sample a reasonable number of points to compute the features.
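A quick way to check the two questions above (NaN inputs, and spatial extent or density) is to summarize each point cloud before training. The following is only a minimal sketch in plain Python: `cloud_summary` and its `radius` argument are hypothetical names, not part of RPMNet, and `radius` should be replaced by whatever grouping radius your configuration uses.

```python
import math

def cloud_summary(points, radius):
    """Summarize one point cloud given as a list of (x, y, z) tuples.

    `radius` is a hypothetical ball-query radius (an assumption, not an
    RPMNet default); substitute the value your grouping layer uses.
    """
    finite = all(math.isfinite(c) for p in points for c in p)
    if not finite or not points:
        return {"finite": finite, "extent": None, "min_neighbors": None}

    # Axis-aligned bounding-box size along x, y and z.
    extent = tuple(
        max(p[i] for p in points) - min(p[i] for p in points) for i in range(3)
    )

    # For every point, count the other points within `radius`.
    # Brute force, O(n^2): run it on a subsample for large clouds.
    counts = []
    for i, p in enumerate(points):
        n = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        counts.append(n)
    return {"finite": True, "extent": extent, "min_neighbors": min(counts)}

pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (10.0, 0.0, 0.0)]
summary = cloud_summary(pts, radius=2.5)
print(summary)
```

In this toy cloud the last point has no neighbor within the radius, so `min_neighbors` is 0. Clouds whose extents differ greatly from the training data, or that contain isolated points like this one, would be the first thing to look at.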
Hi @Gilgamesh666666, I encountered the same `nan` problem. Maybe reducing the learning rate solves the problem.
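As a rough illustration of that suggestion, the sketch below combines a lower learning rate with gradient-norm clipping and skips any batch whose loss is already non-finite. The model, the loss, the learning rate `1e-4`, and `max_norm=1.0` are all placeholder assumptions, not values taken from RPMNet.

```python
import torch
from torch import nn

# Stand-in model; replace with the real network.
model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
# Assumed value: pick something lower than the rate that diverged.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(inputs, targets, max_norm=1.0):
    """Run one update; skip the batch if the loss is not finite."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    if not torch.isfinite(loss):
        return None  # do not let a bad batch corrupt the weights
    loss.backward()
    # Rescale gradients so their global L2 norm is at most max_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()

torch.manual_seed(0)
x, y = torch.randn(8, 3), torch.randn(8, 1)
first_loss = train_step(x, y)
skipped = train_step(torch.full((8, 3), float("nan")), y)
```

Note that clipping only rescales gradients. If a gradient already contains NaN, the clipped gradient is still NaN, which is one possible reason clipping alone might not help.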
Hi yewzijian:
I am trying to train the model on my own dataset, but the model outputs NaN after training for several epochs. The NaNs always come from this line (https://github.com/yewzijian/RPMNet/blob/c37e68730ac3493f2954c67c16208e98d21547e2/src/models/feature_nets.py#L197): the `0.weight` of the `self.prepool` module becomes a NaN tensor after several epochs. I clipped the gradients, but it doesn't seem to help. I would appreciate any advice you can give.
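To narrow down when a weight such as `0.weight` first goes bad, one option is to scan a module's parameters and gradients after each step. This is only a sketch: `nonfinite_entries` is a hypothetical helper, and the small `nn.Sequential` below is a stand-in, not the actual `prepool` definition.

```python
import torch
from torch import nn

def nonfinite_entries(module):
    """Return names of parameters whose values or gradients contain NaN/Inf."""
    bad = []
    for name, param in module.named_parameters():
        if not torch.isfinite(param).all():
            bad.append(name)
        elif param.grad is not None and not torch.isfinite(param.grad).all():
            bad.append(name + ".grad")
    return bad

# Stand-in MLP; the real prepool layer types and sizes are not assumed here.
prepool = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 8))
with torch.no_grad():
    prepool[0].weight[0, 0] = float("nan")  # simulate the reported failure

print(nonfinite_entries(prepool))  # ['0.weight']
```

Calling such a check right after `loss.backward()` shows whether the gradient is already non-finite before the optimizer step. `torch.autograd.set_detect_anomaly(True)` can additionally report the backward operation that produced the NaN, at the cost of slower training.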