Closed jambo6 closed 2 years ago
Hello,
I'm interested in trying this model, but I'm finding that with the standard defaults I get an immediate solution blowup. For example, this happens if I take the implementation in health_care/network.py and run the code below, which uses the dt and alpha defaults for the health care task.
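The snippet in question, reconstructed from the corrected version quoted in the maintainer's reply further down (the import path is an assumption):

```python
import torch
from health_care.network import UnICORNN  # assumed import path; UnICORNN is defined in health_care/network.py

data = torch.randn(10, 50, 3)
model = UnICORNN(3, 10, 1, 0.011, 9.0, 1)  # dt=0.011 and alpha=9.0 are the health-care defaults
print(model(data))
```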
Hi, thanks for opening this issue. What kind of dataset are you considering, i.e., what sequence length and type of data? Using alpha=9.0 is an extreme case, which happened to work well for the healthcare RR task. If you try another dataset, I would start with the default value alpha=1.0 and dt=1/seq_length; this should not blow up. Also, you might want to consider using at least 2 layers (otherwise there is very little expressivity).
The example above is just using random data to explain the issue.
If I try

```python
import torch
from health_care.network import UnICORNN  # import path assumed; UnICORNN lives in health_care/network.py

data = torch.randn(10, 50, 3)
model = UnICORNN(3, 10, 1, 1/50, 1, 2)  # dt=1/50, alpha=1.0, 2 layers
print(model(data))
```
which I believe follows what you suggested above, I get the outputs
```
tensor([[ 6.0533e-19],
        [ 7.1286e-13],
        [        nan],
        [-5.4613e-35],
        [ 2.6288e-21],
        [-1.0877e-04],
        [ 6.0554e-19],
        [-1.3889e-18],
        [ 7.7548e-11],
        [-8.3878e-20],
        [ 2.8452e-38],
```
So I have NaNs knocking about for some reason.
Okay, I found the issue: you forgot to load the model (and data) onto the GPU, i.e., your first example should be

```python
data = torch.randn(10, 50, 3).cuda()
model = UnICORNN(3, 10, 1, 0.011, 9.0, 1).cuda()
print(model(data))
```
Doing the same in your second example also gives very reasonable outputs. Please note that the provided code only works on GPUs (the recurrent part is implemented directly in plain CUDA).
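For concreteness, the second example with the same fix applied would look roughly like this (a sketch; the import path is an assumption based on the health_care/network.py reference):

```python
import torch
from health_care.network import UnICORNN  # assumed import path

data = torch.randn(10, 50, 3).cuda()           # move the data to the GPU
model = UnICORNN(3, 10, 1, 1/50, 1, 2).cuda()  # move the model to the GPU
print(model(data))
```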
One more note: please make sure you feed the input data to the model in the shape [seq_length x batch_size x inp_dim]; otherwise you will not get reasonable results in your experiments. If you have questions about fine-tuning UnICORNN on your own datasets, please let me know -- I'm always happy to help.
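If your data is stored batch-first, as is common elsewhere, a single permute puts it into the expected layout. A minimal sketch (the shapes here are illustrative):

```python
import torch

batch_first = torch.randn(50, 10, 3)      # [batch_size, seq_length, inp_dim]
seq_first = batch_first.permute(1, 0, 2)  # [seq_length, batch_size, inp_dim]
print(seq_first.shape)                    # torch.Size([10, 50, 3])
```

Note that under this convention, torch.randn(10, 50, 3) in the examples above is read as seq_length 10 with a batch of 50.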
Feel free to close it if that solves your issue.
Hi, that does indeed work now, thanks!