tk-rusch / unicornn

Official code for UnICORNN (ICML 2021)

Extremely fast blowup of the output #3

Closed jambo6 closed 2 years ago

jambo6 commented 2 years ago

Hello,

I'm interested in trying this model, but I'm finding that with the standard defaults the output blows up immediately.

For example, if I consider the implementation in health_care/network.py and run the following code

import torch
from network import UnICORNN  # the implementation in health_care/network.py

data = torch.randn(10, 50, 3)
model = UnICORNN(3, 10, 1, 0.011, 9.0, 1)
print(model(data))

where dt=0.011 and alpha=9.0 are the defaults for the health-care task, I get

[ 3.2111e+18],
[-7.2745e+00],
[-2.6744e+23],
[ 4.5934e+32],
[ 2.4417e+28],
[ 4.3936e+27],
[-1.5807e+27],
...
tk-rusch commented 2 years ago

Hi, thanks for opening this issue. What kind of dataset are you considering, i.e., what sequence length and what type of data? Using alpha=9.0 is an extreme case, which happened to work well for the healthcare RR task. For a new dataset, I would start with the default values alpha=1.0 and dt=1/seq_length. This should not blow up. Also, you might want to use at least 2 layers; otherwise there is very little expressivity.

jambo6 commented 2 years ago

The example above is just using random data to explain the issue.

If I try

data = torch.randn(10, 50, 3)
model = UnICORNN(3, 10, 1, 1/50, 1, 2)  # dt = 1/seq_length, alpha = 1, two layers
print(model(data))

which I believe follows your suggestions above, I get the outputs

tensor([[ 6.0533e-19],
        [ 7.1286e-13],
        [        nan],
        [-5.4613e-35],
        [ 2.6288e-21],
        [-1.0877e-04],
        [ 6.0554e-19],
        [-1.3889e-18],
        [ 7.7548e-11],
        [-8.3878e-20],
        [ 2.8452e-38],

So I have NaNs knocking about for some reason.

tk-rusch commented 2 years ago

Okay, I found the issue: you forgot to move the model (and the data) to the GPU, i.e., your first example should be

data = torch.randn(10, 50, 3).cuda()
model = UnICORNN(3, 10, 1, 0.011, 9.0, 1).cuda()
print(model(data))

Doing the same in your second example gives very reasonable outputs. Please note that the provided code only works on GPUs, as the recurrent part is implemented directly in plain CUDA.

One more note: please make sure you feed the input data to the model in the shape [seq_length x batch_size x inp_dim]; otherwise you will not get reasonable results in your experiments. If you have questions about fine-tuning UnICORNN on your own datasets, please let me know -- I'm always happy to help.
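Putting both notes together, a minimal sketch of a correct call could look like the following. The constructor argument order here is assumed from the snippets above, i.e. (input_dim, hidden_dim, output_dim, dt, alpha, n_layers), and the import path is likewise an assumption:

import torch
from network import UnICORNN  # assumed import path, e.g. run from within health_care/

seq_length, batch_size, inp_dim = 50, 10, 3

# the recurrence is implemented in plain CUDA, so data and model must live on the GPU
data = torch.randn(seq_length, batch_size, inp_dim).cuda()  # [seq_length x batch_size x inp_dim]
model = UnICORNN(inp_dim, 10, 1, 1.0 / seq_length, 1.0, 2).cuda()  # dt = 1/seq_length, alpha = 1, 2 layers
print(model(data))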

tk-rusch commented 2 years ago

Feel free to close it if that solves your issue.

jambo6 commented 2 years ago

Hi, that does indeed work now, thanks!