Open kenadianu opened 2 weeks ago

Trying to test the prediction with the minimal code from
https://github.com/marcopeix/time-series-analysis/blob/master/lag_llama.ipynb and https://medium.com/@odhitom09/lag-llama-an-open-source-base-model-for-predicting-time-series-data-2e897fddf005
The script stops when the forecasts list is to be created, with no errors reported on the console. Can you help?
By following the code (with print() statements added in the sources), I got as deep as \lag-llama-main\lag_llama\model\module.py.
Is there a lag-llama log file (or PyTorch, GluonTS, etc.) I can look into for more info and dig further? Or can you point me to where to look next in the code? Thank you.
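For context, the step where the script stops is the standard GluonTS prediction step used by both notebooks. A minimal sketch of it follows; the variable names are illustrative stand-ins for objects built in earlier notebook cells, not code from this thread:

```python
# Sketch of the step where the script stops silently. `backtest_dataset`
# and `predictor` stand in for the objects built earlier in the notebook.
from gluonts.evaluation import make_evaluation_predictions

forecast_it, ts_it = make_evaluation_predictions(
    dataset=backtest_dataset,  # test split of the time series data
    predictor=predictor,       # Lag-Llama predictor from the notebook
    num_samples=100,
)
forecasts = list(forecast_it)  # <- execution stops here, with no error
```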
I think the line you're referring to is here: https://github.com/time-series-foundation-models/lag-llama/blob/main/lag_llama/model/module.py#L552. Can you check whether you can find the error there? Can you also post the exception you encounter?
If you cannot find the error, please provide a reproducible Colab notebook.
Yes, it is the line you mention.
I am using Windows 7 while Colab uses Debian, so the premature stopping that happens on my side most likely will not reproduce there.
The console output comes from running run.py as follows:
```console
fawcett10@fawcett10-PC MINGW64 /g/noi5/LagLlama-timeSeries.github/lag-llama-main
$ C:/Users/.../AppData/Local/Programs/Python/Python310/python.exe run.py --experiment_name pretraining_lag_llama --results_dir G:/noi5/LagLlama-timeSeries.github/lag-llama-main/experiments/results
```
Just in case you have another suggestion, I am attaching a screenshot showing the console output at the stopping moment, along with the code excerpt around the line in question. If not, we may consider closing this issue. Thank you.
Thanks for attaching the screenshot. Can you try running it on CPU? It could be an issue with using GPU on your side.
Thank you for your suggestion to use CPU instead of GPU. I had already tried that, though; it generates the trace messages shown in the screenshots above.
Specifically, the GPU was changed to CPU in these three files (a sketch of the kind of change follows the list):
- \lag-llama-main\lag_llama\gluon\estimator.py
- \lag-llama-main\run.py
- \AppData\Local\Programs\Python\Python311\Lib\site-packages\gluonts\torch\model\predictor.py

If there are other places where I can try to change GPU to CPU, please let me know. Thank you.
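The exact edits are only visible in the attached screenshots; the kind of change involved is sketched below with standard PyTorch calls (a hypothetical illustration, not the repository's actual lines):

```python
# Hypothetical sketch of forcing CPU execution in place of GPU.
# The real edits in estimator.py, run.py, and predictor.py are in the
# screenshots and may differ in detail.
import torch

device = torch.device("cpu")  # instead of torch.device("cuda")

# When loading a checkpoint saved on GPU, map its tensors to the CPU:
# state = torch.load("checkpoint.pt", map_location=device)
```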
Sorry, I'm not sure. We haven't tested the code locally on Windows 7 systems.
Would you be able to use a different Debian/Linux system by any chance, or is making it work on Windows 7 crucial in your case?
Also, do other PyTorch models work well on your system?
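A quick way to answer that last question is a bare PyTorch smoke test with no Lag-Llama code involved; a minimal sketch (not from the thread):

```python
# Minimal CPU smoke test: one forward and backward pass in plain PyTorch.
# If this hangs or crashes, the problem is the PyTorch install itself,
# not Lag-Llama.
import torch

x = torch.randn(4, 8)
w = torch.randn(8, 2, requires_grad=True)
loss = (x @ w).sum()
loss.backward()
print(torch.__version__, loss.item())
```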
Thank you, Arjun, for the great suggestion! Indeed, the problem was with PyTorch. Lag-Llama is now working on my Windows 7 with PyTorch v2.2.1, whereas with v2.3.0 it did not.
Details: I was trying to reproduce the experiment in https://theaveragecoder.medium.com/training-and-testing-a-basic-neural-network-using-pytorch-4010300fda45. That project uses the torchvision package, which was not installed on my computer. From the multiple available versions I chose the one released on Feb 22, 2024: version 0.17.1. In turn, torchvision 0.17.1 requires PyTorch 2.2.1, while the version installed by default from requirements.txt was 2.3.0. The torchvision installation process automatically downgraded PyTorch to 2.2.1.
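For anyone hitting the same silent stop, the combination that worked here can be pinned explicitly (e.g. `pip install torch==2.2.1 torchvision==0.17.1`) and then verified; a minimal check assuming those versions:

```python
# Verify the version combination reported working in this thread:
# torch 2.2.1 + torchvision 0.17.1. Other combinations are untested here.
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 2.2.1 (possibly with a suffix like +cpu)
print("torchvision:", torchvision.__version__)  # expected: 0.17.1
```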
Snapshots from the console while running Lag-Llama's run.py: