Open csssuf opened 9 years ago
Will take a look...
Added a -opencl 1 option to sample.lua in https://github.com/hughperkins/char-rnn/commit/728d8cbcc04f0e1fa99d6b885faf9500b6905426 , which you can get from https://github.com/hughperkins/char-rnn prior to merge. Note that I'm currently getting nans out, but that might just be because I haven't trained for very long? (edit: when I say nans, I mean I get an error about multinomial summing to <= 0, but that's because it sums to nan)
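To illustrate why a NaN total surfaces as a "sums to <= 0" error, here is a minimal sketch in plain Python (not Torch/Lua). The function name check_multinomial_weights and the error text are assumptions made up for this illustration; the only fact relied on is that any comparison with NaN is false, so a "sum > 0" validity check fails even though the sum is not actually <= 0.

```python
import math


def check_multinomial_weights(weights):
    """Hypothetical validity check, similar in spirit to what a sampler might do."""
    total = sum(weights)
    # Every ordered comparison with NaN is False, so a NaN total fails `total > 0`
    # and gets reported as if the probabilities summed to <= 0.
    if not total > 0:
        raise ValueError("invalid multinomial distribution (sum of probabilities <= 0)")
    return total


healthy = [0.2, 0.5, 0.3]
broken = [0.2, float("nan"), 0.3]  # a single NaN poisons the whole sum

print(check_multinomial_weights(healthy))
print(math.isnan(sum(broken)))
try:
    check_multinomial_weights(broken)
except ValueError as err:
    print("error:", err)
```

So the error message points at the sum, while the underlying problem is NaN values produced earlier in the network.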
I'm seeing that as well. Thanks for the fix!
Ok. I will dig a bit...
(By the way, do you get nans for train_loss during training? or only during sampling?)
I do also get nans for train_loss during training.
ah. ok. that's different from me. But I do have access to an AMD device, which gives nans. Anyway, I will dig a bit...
Seems there are two issues:
(for the nans during training, on AMD, it's a different issue, which I need to address)
(Note: you have to update to the latest version of cltorch, i.e. commit 48ca96fac or above. I guess you can just type something like luarocks install cltorch to upgrade it?)
Seems to be working for me, and i'm no longer seeing the NaNs during training, either. Thanks again!
"i'm no longer seeing the NaNs during training, either."
oh! Interesting! :-)
Hmmm, right :-) No more nans on the AMD device here either :-)
Can we leave this open for now, in case other people encounter the same issue?
thanks :-)
I think this can be closed now, since the change has been merged for a while now.
It seems to be impossible to sample a checkpoint that has been trained with OpenCL, since sample.lua assumes either CPU-trained data or CUDA-trained data. Attempting to sample an OpenCL-trained checkpoint by explicitly setting gpuid falls back to CPU mode, since I do not have the CUDA packages installed, as I am using an AMD card.
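The fallback behaviour described here can be sketched as follows. This is plain Python for illustration only, not the actual sample.lua code: the function choose_backend and its parameters are hypothetical, and the selection logic is an assumption modelled on the description in this thread (gpuid selects a GPU, an opencl flag selects OpenCL instead of CUDA, and missing packages cause a fall back to CPU).

```python
def choose_backend(gpuid, opencl, installed):
    """Hypothetical backend selection: returns 'cpu', 'cuda' or 'opencl'."""
    if gpuid < 0:
        return "cpu"  # GPU use not requested
    # Assumed package requirements per backend
    needed = {"cltorch", "clnn"} if opencl else {"cutorch", "cunn"}
    if needed <= set(installed):
        return "opencl" if opencl else "cuda"
    return "cpu"  # required packages missing, fall back to CPU


amd_machine = {"cltorch", "clnn"}  # no CUDA packages installed

# Without an OpenCL switch, setting gpuid alone ends up on the CPU.
print(choose_backend(0, False, amd_machine))
# With the OpenCL switch, the same machine can use its GPU.
print(choose_backend(0, True, amd_machine))
```

Under these assumptions, the first call reproduces the reported fallback, and the second shows what an explicit OpenCL option makes possible.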