time-series-foundation-models / lag-llama

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
Apache License 2.0

forecasts = list(forecast_it) not performed #79

Open kenadianu opened 2 weeks ago

kenadianu commented 2 weeks ago

Trying to test the prediction with the minimal code from

https://github.com/marcopeix/time-series-analysis/blob/master/lag_llama.ipynb
https://medium.com/@odhitom09/lag-llama-an-open-source-base-model-for-predicting-time-series-data-2e897fddf005

The script stops when the forecasts list is being built, with no errors reported on the console. Can you help?

[... code as in the links above...]
print("trace 40") # my tracing

# generate zero-shot predictions
forecast_it, ts_it = make_evaluation_predictions(
    dataset=backtest_dataset,
    predictor=predictor,
)

print("trace 47: forecast_it=",forecast_it) #  <generator object PyTorchPredictor.predict at 0x000000001646EDF0>
print("trace 48: type(forecast_it)=",type(forecast_it)) # <class 'generator'>

try:
  print("trace 62")
  forecasts = list(forecast_it)  # runs for some time then stops, nothing displayed on the console ...
  print("trace 64  not reached")
except Exception as e:  # a bare except here would swallow the error silently
  print("An exception occurred:", e)
else:
  print("Nothing went wrong")

By following the code (with print() statements added in the sources), I went as deep as \lag-llama-main\lag_llama\model\module.py:

print("class LagLlamaModel, forward: self.transformer.wte=",self.transformer.wte)   # =  Linear(in_features=92, out_features=144, bias=True)

# forward the LLaMA model itself
x = self.transformer.wte(  # dies here
            transformer_input
)  
print("not reached")

Is there a lag-llama log file (or PyTorch, GluonTS, etc.) I can look into for more info and dig further? Or can you point me to where to look further in the code? Thank you.
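(For reference: as far as I know there is no plain-text log file by default. GluonTS, PyTorch, and PyTorch Lightning all emit messages through Python's standard logging module, so a sketch like the following near the top of the script should surface them:)

import logging

# route library log records (gluonts, lightning, torch) to the console
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)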

ashok-arjun commented 2 weeks ago

I think the line you're referring to is here: https://github.com/time-series-foundation-models/lag-llama/blob/main/lag_llama/model/module.py#L552. Can you check whether you can find the error? Can you also post the exception you encounter?

Please provide a reproducible Colab notebook if you cannot find the error.

kenadianu commented 2 weeks ago

Yes, it is the line you mention.

I am using Windows 7 while Colab uses Debian, so the premature stop that happens on my side will most likely not reproduce there.

The console output comes from running run.py as follows:

fawcett10@fawcett10-PC MINGW64 /g/noi5/LagLlama-timeSeries.github/lag-llama-main
$ C:/Users/.../AppData/Local/Programs/Python/Python310/python.exe run.py --experiment_name pretraining_lag_llama --results_dir G:/noi5/LagLlama-timeSeries.github/lag-llama-main/experiments/results

Just in case you have another suggestion, I am attaching a screenshot showing the console output at the stopping moment, along with the code excerpt around the line in question. If not, we may consider closing this issue. Thank you.

[Screenshot: transformer.wte console output]

ashok-arjun commented 1 week ago

Thanks for attaching the screenshot. Can you try running it on CPU? It could be an issue with using GPU on your side.
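A quick generic check of what PyTorch actually sees on your machine (nothing lag-llama-specific here):

import torch

print(torch.__version__)          # installed PyTorch build
print(torch.cuda.is_available())  # False means everything already runs on CPU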

kenadianu commented 1 week ago

Thank you for your suggestion to use the CPU instead of the GPU. It was already tried, though, generating the trace messages shown in the screenshots above.

Specifically, the GPU was changed to CPU in these three files (the relevant instructions shown below):

\lag-llama-main\lag_llama\gluon\estimator.py
\lag-llama-main\run.py
\AppData\Local\Programs\Python\Python311\Lib\site-packages\gluonts\torch\model\predictor.py

If there are other places where I can try to change the GPU to CPU, please let me know. Thank you.

[Screenshot: gpu-to-cpu edits]
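An alternative to editing individual files is hiding the GPU from PyTorch altogether via the standard CUDA_VISIBLE_DEVICES environment variable, sketched below (it must be set before torch is imported):

import os

# hide all CUDA devices; torch.cuda.is_available() then returns False everywhere
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch
assert not torch.cuda.is_available()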

ashok-arjun commented 1 week ago

Sorry, I'm not sure. We haven't tested the code locally on Windows 7 systems.

Would you be able to use a different Debian/Linux system by any chance, or is making it work on Windows 7 crucial in your case?

Also, do other PyTorch models work well on your system?
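For example, a minimal smoke test of the very operation the trace died in, a plain Linear forward pass (the layer sizes below are copied from the wte printout above):

import torch
import torch.nn as nn

# same shape as the layer where the trace stopped: Linear(in_features=92, out_features=144)
layer = nn.Linear(92, 144)
x = torch.randn(4, 92)  # batch of 4 dummy inputs
print(layer(x).shape)   # expected: torch.Size([4, 144])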

kenadianu commented 1 week ago

Thank you Arjun for the great suggestion! Indeed, the problem was with PyTorch. Lag-Llama is now working on my Windows 7 machine with PyTorch v2.2.1, while with v2.3.0 it did not.

Details: I was trying to reproduce the experiment in https://theaveragecoder.medium.com/training-and-testing-a-basic-neural-network-using-pytorch-4010300fda45. That project uses the torchvision package, which was not installed on my computer. From the multiple available versions I chose the one released on Feb 22, 2024, version 0.17.1. In turn, torchvision 0.17.1 required PyTorch version 2.2.1, while the version installed by default with requirements.txt was 2.3.0. The torchvision installation process automatically downgraded PyTorch to version 2.2.1.
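For anyone hitting the same silent hang, pinning that working pair explicitly with pip should reproduce the fix (versions as above):

$ python -m pip install torch==2.2.1 torchvision==0.17.1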

Snapshots from the console while running Lag-Llama's run.py:

[Screenshot: run.py console output, Epoch #0]