time-series-foundation-models / lag-llama

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

Problem with test-set inference on a single machine with multiple GPUs #105

Open mayii2001 opened 22 hours ago

mayii2001 commented 22 hours ago

Hi, thanks for your great work! Your code is built on PyTorch Lightning. When I deployed the model on a single machine with multiple GPUs, Lightning spawned several global processes, which is necessary for training acceleration but causes a problem when testing. For example, I loaded a test set of length 1k, but the predictions returned by make_evaluation_predictions() came back with quadruple that length; it seems each DDP process runs the full test set, so with four GPUs every result is duplicated four times. I think this is the main reason my inference is very slow, which did not happen on the validation set.

The Lightning documentation recommends using Trainer(devices=1) for testing. I tried initializing a new trainer as below, but it raised TypeError: model must be a LightningModule or torch._dynamo.OptimizedModule, got LagLlamaLightningModule. I don't know how to fix this.

```python
from lightning.pytorch import Trainer  # or pytorch_lightning, depending on the install
from lag_llama.gluon.estimator import LagLlamaEstimator

model = LagLlamaEstimator()  # constructor arguments omitted
single_device_trainer = Trainer(devices=1, max_epochs=1)
# Fails with: TypeError: model must be a LightningModule or
# torch._dynamo.OptimizedModule, got LagLlamaLightningModule
pre_results = single_device_trainer.test(model=model.network, dataloaders=test_loader)
```
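One plausible explanation for this TypeError, offered here as an editorial note rather than something confirmed in the thread: Lightning 2.x ships the same classes under two namespaces, lightning.pytorch and pytorch_lightning, and a Trainer imported from one rejects a LightningModule subclassed from the other, because the isinstance check does not cross namespaces. If that is what is happening here, building the Trainer from the same package that LagLlamaLightningModule inherits from should let trainer.test() accept the module. A minimal sketch, where test_loader is the dataloader from the snippet above and create_lightning_module() is the standard GluonTS PyTorchLightningEstimator hook:

```python
import pytorch_lightning as pl  # pick the namespace the module actually subclasses
from lag_llama.gluon.estimator import LagLlamaEstimator

estimator = LagLlamaEstimator()  # constructor arguments omitted, as in the snippet above
module = estimator.create_lightning_module()

# Inspect the MRO to see whether the module derives from
# pytorch_lightning.LightningModule or lightning.pytorch.LightningModule,
# then import Trainer from that same package.
print(type(module).__mro__)

trainer = pl.Trainer(devices=1, max_epochs=1)  # single device: no DDP, no duplicated test results
results = trainer.test(model=module, dataloaders=test_loader)
```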
ashok-arjun commented 8 hours ago

Hi, I'm not sure about Lightning internals. Predictions with our model are slow in general, and become even slower as the prediction length increases.
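For completeness, the repository's demo notebook runs inference through GluonTS's predictor interface rather than a Lightning Trainer, which keeps everything in a single process and avoids the multi-GPU duplication described above. A sketch along those lines, assuming a downloaded lag-llama.ckpt and a GluonTS-format test_dataset (both assumptions here); the estimator keyword arguments mirror the demo and are read from the hyperparameters stored in the checkpoint:

```python
import torch
from gluonts.evaluation import make_evaluation_predictions
from lag_llama.gluon.estimator import LagLlamaEstimator

# Read the model hyperparameters stored in the checkpoint, as the repo's demo does.
ckpt = torch.load("lag-llama.ckpt", map_location="cuda:0")
estimator_args = ckpt["hyper_parameters"]["model_kwargs"]

estimator = LagLlamaEstimator(
    ckpt_path="lag-llama.ckpt",
    prediction_length=24,  # your forecast horizon
    context_length=32,     # your context length
    input_size=estimator_args["input_size"],
    n_layer=estimator_args["n_layer"],
    n_embd_per_head=estimator_args["n_embd_per_head"],
    n_head=estimator_args["n_head"],
    scaling=estimator_args["scaling"],
    time_feat=estimator_args["time_feat"],
)

# The predictor path stays in a single process; no Trainer involved.
predictor = estimator.create_predictor(
    estimator.create_transformation(),
    estimator.create_lightning_module(),
)

# num_samples trades forecast fidelity against inference time.
forecast_it, ts_it = make_evaluation_predictions(
    dataset=test_dataset, predictor=predictor, num_samples=20
)
forecasts = list(forecast_it)
```

Since sampling cost grows with both the forecast horizon and the number of sample paths, lowering num_samples is the most direct way to trade probabilistic-forecast fidelity for speed, which bears on the slowness noted above.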