Status: Open. turmeric-blend opened this issue 3 years ago.
Thanks for letting me know. Perhaps I broke something while refactoring; I'll check and get back to you.
@turmeric-blend which parameters did you give the model? I remember that when I set beta_end too high I got NaNs...
My beta_end=0.07; otherwise I ran everything exactly as in the notebook example.
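For context on what beta_end controls, here is a minimal sketch of a DDPM-style linear noise schedule. It assumes a linear schedule over diff_steps=100 starting at beta=1e-4; both values are assumptions for illustration, not settings confirmed in this thread. It shows how the cumulative product alpha_bar at the last diffusion step shrinks as beta_end grows, which makes reverse-process coefficients such as 1/sqrt(alpha_bar) larger. Whether that is what produces the NaNs here is not established.

```python
import numpy as np


def linear_alpha_bar(beta_end: float, diff_steps: int = 100, beta_start: float = 1e-4) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    beta_start=1e-4 and diff_steps=100 are assumed defaults for illustration.
    """
    betas = np.linspace(beta_start, beta_end, diff_steps, dtype=np.float64)
    alphas = 1.0 - betas
    return np.cumprod(alphas)


# Larger beta_end -> smaller alpha_bar at the final step -> larger 1/sqrt(alpha_bar).
for beta_end in (0.05, 0.07, 0.1, 0.2):
    alpha_bar = linear_alpha_bar(beta_end)
    print(
        f"beta_end={beta_end}: alpha_bar_N={alpha_bar[-1]:.2e}, "
        f"1/sqrt(alpha_bar_N)={1.0 / np.sqrt(alpha_bar[-1]):.1f}"
    )
```

Running this for the values people report using (e.g. 0.07 vs 0.1) makes it easy to see how much more aggressive the noise becomes at the end of the chain.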
Thanks! I just re-ran it on my machine and it all worked... very strange... 🤔
I am running from the downloaded zip folder (I didn't install via pip install pytorchts). Not sure if this affects anything.
Hmm, not sure... perhaps run pip install . inside the downloaded zip folder and then try again?
I feel like it's related to the random seed, since we are using different machines...
Also, which version of PyTorch do you use? I am using PyTorch 1.7.1 here.
pytorch 1.7.1+cu110
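To compare the two environments side by side, a small sketch that reports installed package versions using the standard-library importlib.metadata (Python 3.8+). The helper name report_versions and the package list are assumptions for illustration. A copy of pytorchts used straight from the zip folder without pip will show up as "not installed", and CUDA wheels of PyTorch typically report a suffix such as +cu110 in their version string.

```python
import platform
from importlib import metadata


def report_versions(packages=("torch", "gluonts", "pytorchts", "numpy", "pandas")):
    """Return a {name: version} dict; the package list is a hypothetical default."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed via pip (e.g. run from an unpacked source folder).
            versions[name] = "not installed"
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version}")
```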
I tried with pip install . and the NaN still occurs. Maybe you could update the results with a fixed seed for PyTorch, MXNet, NumPy, random, etc., and I will see if I can reproduce them?
Sure, I can try. I just re-ran it with the parameters from the paper and checked the notebook in; I also fixed the "cuda" device name...
My CUDA setting is:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
but this is just because I don't have a GPU on 1.
OK, so setting a seed, e.g. via:
import numpy as np
import torch
np.random.seed(123456)
torch.manual_seed(123456)
also works for me and I get no NaN in training... Can you perhaps try training with fewer num_workers, e.g. 4, 2, or even 0?
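Extending the two seed calls above to the other generators mentioned earlier in the thread, a minimal sketch of a single seeding helper. The name seed_everything is hypothetical, and MXNet is deliberately left out here (assumed not needed for the PyTorch model itself). Even with all of these set, results can still differ across machines, GPUs and num_workers settings, so this narrows the gap rather than guaranteeing identical runs.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 123456) -> None:
    """Seed Python, NumPy and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU generator and all CUDA devices
    torch.cuda.manual_seed_all(seed)  # explicit; safe to call without a GPU
    # Trade some GPU speed for more deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(123456)
```

Calling it once at the top of the notebook, before the estimator and data loaders are created, keeps the order of random draws the same between runs.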
Nothing works :/ I reinstalled everything in a clean env and it still doesn't work.
I installed gluonts via pip install git+https://github.com/awslabs/gluon-ts.git@master#egg=gluonts
If I try to install PyTorchTS via pip install pytorchts, I get this error:
ERROR: Packages installed from PyPI cannot depend on packages which are not also hosted on PyPI. pytorchts depends on gluonts@ git+https://github.com/awslabs/gluon-ts.git@master#egg=gluonts
Sorry to hear that... I will try to reproduce it in a clean env as well!
I get the same NaN results, and I found that the model outputs are NaN. It is very strange.
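To narrow down where the NaNs appear in the outputs, a small sketch that summarises non-finite values in an array of forecast sample paths. It assumes the samples are a NumPy array shaped (num_samples, prediction_length, target_dim); for gluonts sample forecasts this would be the forecast's samples array, but the shape is an assumption. The helper name nan_report is hypothetical.

```python
import numpy as np


def nan_report(samples: np.ndarray) -> dict:
    """Summarise non-finite values in a (num_samples, prediction_length, target_dim) array."""
    bad = ~np.isfinite(samples)
    any_bad = bool(bad.any())
    return {
        "fraction_non_finite": float(bad.mean()),
        # First forecast step where any sample/dimension is NaN or inf.
        "first_bad_step": int(np.argmax(bad.any(axis=(0, 2)))) if any_bad else None,
        # Number of sample paths containing at least one non-finite value.
        "bad_sample_paths": int(bad.any(axis=(1, 2)).sum()),
    }
```

If every path is NaN from step 0, the problem is more likely in the trained weights (i.e. the loss already diverged) than in the sampling loop itself.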
Can you try using the 0.7.0 branch?
I am unable to reproduce the results from the TimeGrad notebook: the training loss diverges to NaN.
predictor = estimator.train(dataset_train, num_workers=8)
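A generic way to catch a diverging loss as soon as it happens is to check it before each backward pass and clip gradients. The sketch below is plain PyTorch on a toy model, not PyTorchTS's own trainer; the train_step helper, the max_grad_norm=1.0 value and the use of gradient clipping as a remedy are all assumptions, not something confirmed in this thread.

```python
import torch
from torch import nn


def train_step(model, optimizer, loss_fn, batch, step, max_grad_norm=1.0):
    """One optimisation step that fails fast on a non-finite loss (hypothetical helper)."""
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    if not torch.isfinite(loss):
        # Stop immediately so the offending step/batch can be inspected.
        raise FloatingPointError(f"non-finite loss {loss.item()} at step {step}")
    loss.backward()
    # Cap the gradient norm so a single bad batch cannot blow up the weights.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()


# Toy usage: fit y = x1 + x2 + x3 with a linear layer.
torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(16, 3)
y = x.sum(dim=1, keepdim=True)


def mse(m, b):
    return nn.functional.mse_loss(m(b[0]), b[1])


losses = [train_step(model, optimizer, mse, (x, y), s) for s in range(20)]
```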