Thanks for sharing a fairly complete repo of NFs. What assumptions does the code make in the 2D case that do not hold in N dimensions? I have a 100-dimensional dataset and the loss is nan all the time.
I guess this is related to ActNorm, which becomes nan. There is also a relation between the batch size and the dimension of the data. Do you have any comments on that?
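To make the ActNorm suspicion concrete, here is a minimal pure-Python sketch. It assumes the layer uses data-dependent initialization, i.e. it sets the per-feature scale to 1/std of the first batch; I have not verified this against the repo's code. All function names and the `eps` guard are hypothetical. The sketch shows two ways the statistics can break: a feature that is constant within the batch has std 0, and a batch with a single sample has an undefined unbiased std.

```python
import math

def feature_std(batch, unbiased=True):
    """Per-feature standard deviation of a batch given as a list of rows."""
    n, d = len(batch), len(batch[0])
    denom = n - 1 if unbiased else n
    stds = []
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        sq = sum((v - mean) ** 2 for v in col)
        # One sample with the unbiased estimator is 0/0, reported here as nan.
        stds.append(math.sqrt(sq / denom) if denom > 0 else float("nan"))
    return stds

def risky_features(batch, tol=1e-8):
    """Indices whose std is nan or (almost) zero, so 1/std would blow up."""
    return [j for j, s in enumerate(feature_std(batch))
            if math.isnan(s) or s < tol]

def guarded_log_scale(batch, eps=1e-6):
    """Hypothetical guarded init: log(1 / (std + eps)) per feature."""
    return [-math.log(s + eps) for s in feature_std(batch)]

# Feature 1 is constant within this batch, so its std is exactly 0.
batch = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
print(risky_features(batch))         # features that would break 1/std
print(risky_features([[1.0, 2.0]]))  # batch size 1: every feature is risky
print(guarded_log_scale(batch))      # finite thanks to eps
```

With 100 features, the chance that at least one of them is constant or nearly constant within a small batch is higher than in 2D, which might be the link between batch size and dimension.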