time-series-foundation-models / lag-llama

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
Apache License 2.0

forecasts = list(forecast_it) is very slow #63

Open xuyilin0121 opened 1 month ago

xuyilin0121 commented 1 month ago

Hi there,

I ran into an interesting problem when testing the model on my own data. The original dataset has 5835 rows, i.e., 5835 time series, each with 39 timesteps. I understand it is backtested, so I set the context length to 32 and the prediction length to 7. Everything goes well until the last step, forecasts = list(forecast_it). From what I observe, 100 time series take at least 1 minute to convert, so I expect 5835 time series to take hours. The interesting thing is that I previously had a DeepAR model, which also uses the GluonTS package, and it needs at most about 10 minutes to convert the same dataset. I tried researching this but could not find anything helpful, so I am raising this issue to ask whether there is any difference between the DeepAR and Lag-Llama results that makes the conversion to a list so slow.
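
For reference, this is roughly the pipeline I am running (a simplified sketch; dataset and predictor stand in for my actual objects):

    from gluonts.evaluation import make_evaluation_predictions

    forecast_it, ts_it = make_evaluation_predictions(
        dataset=dataset,        # 5835 series, 39 timesteps each
        predictor=predictor,    # Lag-Llama predictor, context length 32, prediction length 7
        num_samples=100,
    )

    # This is the step that is extremely slow for me:
    forecasts = list(forecast_it)
    tss = list(ts_it)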

Thanks for your help!

ashok-arjun commented 1 month ago

Hi,

It's probably because Lag-Llama takes more time on average to forward-pass a single batch, as it's a bigger model than the DeepAR model you are using. Maybe you can try benchmarking the time for one series, with the same context and prediction length; then you'd know if this is the case.
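
For example, something along these lines (a rough sketch; single_series_dataset and predictor are placeholders for your own objects):

    import time

    from gluonts.evaluation import make_evaluation_predictions

    # Time the full pipeline for a dataset containing a single series,
    # with the same context/prediction length, to isolate the per-series cost.
    start = time.perf_counter()
    forecast_it, ts_it = make_evaluation_predictions(
        dataset=single_series_dataset,  # a dataset holding just one series
        predictor=predictor,            # your Lag-Llama predictor
        num_samples=100,
    )
    forecasts = list(forecast_it)       # the forward passes happen here
    print(f"One series took {time.perf_counter() - start:.1f} s")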

What's the batch size you're using for Lag-Llama?

xuyilin0121 commented 1 month ago

Hi,

Thanks so much for your reply!

I tried converting a single series with tqdm enabled; the result is in the screenshot below:

[screenshot: tqdm progress output]

I am using batch_size = 64, but when I set it to 1, training is faster.

Moreover, I have to say I am not very technical, but it would make more sense to me if the model spent the time on training or prediction rather than on converting the "generator object PyTorchPredictor.predict", since I don't see any difference between the list results of DeepAR and Lag-Llama. Could you kindly give me some more explanation on this? Thanks!!!

simona-0 commented 3 weeks ago

Hi @xuyilin0121, I came across the same problem when I tried to fine-tune lag-llama using a training dataset of ~700 observations (each with ~500 timestamps). The batch size I used is 64. The list() conversion took hours. Have you found any way to deal with this issue? Thanks!

CoCoNuTeK commented 2 weeks ago

Same here. Even with num_samples=5 it takes at least 30 seconds to convert, with a batch size of 4 and sequences of length 1024 with a prediction length of 512. If I set it to 100 it would take hours. So is there any new fix for this? The model won't be usable if I have to wait hours for one batch to finish... The interesting thing is that the inference is very fast but the conversion to list is super long. I am doing inference on GPU.

ashok-arjun commented 2 weeks ago

@CoCoNuTeK Thanks for describing the problem. Can you explain what you mean by inference is very fast but the conversion to list is super long?

@xuyilin0121 @simona-0 I am not sure either. I'll be happy to take a look at this myself next week and fix it. I'll keep this thread updated.

CoCoNuTeK commented 2 weeks ago

So this code

        log_info("Starting inference...")
        check_memory_usage()

        forecast_it, ts_it = make_evaluation_predictions(
            dataset=batch,
            predictor=predictor,
            num_samples=num_samples
        )
        log_info("Inference completed, converting to list...")

is almost instant. However, this part

        forecasts = list(forecast_it)
        tss = list(ts_it)

takes way too long. With num_samples=5 it took around 30 seconds, but I am pretty sure it does not scale linearly; with 100 samples it did not even finish. My setup is an NVIDIA Tesla T4 GPU with CUDA configured, and:

        SEQ_LEN = 512      # Context length for the model
        PRED_LEN = 512     # Horizon length for the model
        BATCH_SIZE = 32
        NUM_SAMPLES = 100

ashok-arjun commented 1 week ago

Yes, the first block of code is supposed to just "create" the generators. The second part is what actually runs the inference.
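
If you want to see the progress while it runs, you can wrap the generator in tqdm before consuming it (a small sketch; forecast_it and ts_it are the generators returned by make_evaluation_predictions):

    from tqdm import tqdm

    # The forward passes happen while the generator is consumed;
    # tqdm shows one tick per series as the forecasts are produced.
    forecasts = list(tqdm(forecast_it, desc="sampling forecasts"))
    tss = list(ts_it)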

We recently added support for deterministic point forecasting, where only the mean of the forecast is returned; the forward pass is much faster since it uses just one sample as the previous prediction. This is enabled by setting use_single_pass_sampling when creating the Estimator. Can you try this and check whether it works for your use case?
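
Roughly like this (a sketch, not the exact setup; the checkpoint path is a placeholder and the omitted arguments are whatever you already pass to the estimator):

    from lag_llama.gluon.estimator import LagLlamaEstimator

    estimator = LagLlamaEstimator(
        ckpt_path="lag-llama.ckpt",       # placeholder: your checkpoint path
        prediction_length=PRED_LEN,       # e.g. the values you already use
        context_length=SEQ_LEN,
        use_single_pass_sampling=True,    # deterministic point forecasts, one pass per step
        # ... your other existing estimator arguments ...
    )
    predictor = estimator.create_predictor(
        estimator.create_transformation(),
        estimator.create_lightning_module(),
    )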

@CoCoNuTeK @xuyilin0121 @simona-0