
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
Apache License 2.0

loss calculation #50

Open sudongwang-upc opened 2 months ago

sudongwang-upc commented 2 months ago

During training, the loss is computed from the probability density of all targets in both the context and the prediction window, rather than just the prediction window. Why?

ashok-arjun commented 2 months ago

In decoder-only models, the "predict" split has no special meaning during training: every time step contributes to the loss. This holds for all decoder-only models, GPT included.

In the code, the context and prediction lengths are kept separate so that, at inference time, the model can produce a forecast of a specific prediction length given a fixed-length context.
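The distinction can be sketched as follows. This is a minimal NumPy illustration with made-up one-step-ahead Gaussian outputs, not the actual Lag-Llama code: `gaussian_nll`, the array names, and the lengths are all hypothetical.

```python
import numpy as np

def gaussian_nll(target, mean, std):
    # Per-step negative log-likelihood under a Gaussian predictive distribution
    return 0.5 * np.log(2 * np.pi * std**2) + (target - mean) ** 2 / (2 * std**2)

context_len, pred_len = 32, 8
total_len = context_len + pred_len

rng = np.random.default_rng(0)
target = rng.normal(size=total_len)       # observed series (context + prediction window)
mean = rng.normal(size=total_len)         # hypothetical one-step-ahead predictive means
std = np.ones(total_len)                  # hypothetical predictive std devs

# Decoder-only training: average the NLL over ALL positions, context included,
# since every step has a next-step target to predict.
loss_all = gaussian_nll(target, mean, std).mean()

# Loss restricted to the prediction window only (what the question expected).
loss_pred = gaussian_nll(target[-pred_len:], mean[-pred_len:], std[-pred_len:]).mean()
```

Training on all positions gives the model many more supervised next-step targets per sequence, which is why decoder-only forecasters do it; the context/predict split only matters at inference time.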

sudongwang-upc commented 2 months ago

> In decoder-only models, the "predict" split has no special meaning during training: every time step contributes to the loss. This holds for all decoder-only models, GPT included.
>
> In the code, the context and prediction lengths are kept separate so that, at inference time, the model can produce a forecast of a specific prediction length given a fixed-length context.

Thank you very much for your reply. I got it!