time-series-foundation-models / lag-llama

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
Apache License 2.0

Sub-second data with various frequencies #72

Open lhtfb opened 3 weeks ago

lhtfb commented 3 weeks ago

Hi! Thanks a lot for your work! I have a small question about the lag frequencies defined here: https://ts.gluon.ai/v0.11.x/_modules/gluonts/time_feature/lag.html#get_lags_for_frequency

As I understand from this code, the smallest supported frequency is 1 second. Is it possible to go into the millisecond range by any chance?

Additionally, what should I do if my data is not sampled at a constant interval, e.g. the gaps between observations are 0.3s, 0.5s, 1.2s, 0.5s, 0.8s, etc.? Resampling the data to a constant interval might significantly reduce its quality.

ashok-arjun commented 1 week ago

Hi @lhtfb! Apologies for the delayed response.

  1. Yes, it is possible to construct lags for higher frequencies (millisecond/nanosecond), but since GluonTS does not support them, you would have to write a custom wrapper for those ranges. Note that, to construct lags for second-frequency data, GluonTS uses the following code (from the link you provided):
    elif offset_name == "S":
        lags = (
            _make_lags_for_second(offset.n)
            + _make_lags_for_minute(offset.n / 60)
            + _make_lags_for_hour(offset.n / (60 * 60))
        )

This constructs three groups of lags: lags for the "second" seasonality (60 seconds, 120 seconds, etc.), lags for the "minute" seasonality (60 minutes, 120 minutes, etc.), and lags for the "hour" seasonality (24 hours, 48 hours, etc.). The minute and hour groups are returned as lag indices into the second-frequency data. What you really need are the second-frequency lags, but GluonTS also gives you long-term lags based on other seasonalities that might exist in your data. You may or may not want to keep this approach when writing the wrapper for the millisecond frequency.
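
To make the description above concrete, here is a minimal self-contained sketch of the three groups of windows for second-frequency data. It is not the GluonTS implementation: `make_lags`, `second_frequency_windows` and `flatten_and_filter` are hypothetical stand-ins, integer arithmetic replaces the float divisions in the quoted branch, and the cycle counts, window widths and the `lag_ub=1200` / `min_lag=7` bounds are assumptions chosen for illustration.

```python
from typing import List


def make_lags(middle: int, delta: int) -> List[int]:
    """Window of lag indices centred on `middle`, `delta` steps either side."""
    return list(range(middle - delta, middle + delta + 1))


def second_frequency_windows(n: int = 1) -> List[List[int]]:
    """Lag windows for data sampled every `n` seconds (illustrative only)."""
    windows: List[List[int]] = []
    # "second" seasonality: 60 s, 120 s, 180 s back
    windows += [make_lags(k * 60 // n, 2) for k in range(1, 4)]
    # "minute" seasonality: 60, 120, 180 minutes back, as second-step indices
    windows += [make_lags(k * 3600 // n, 2) for k in range(1, 4)]
    # "hour" seasonality: 24 h, 48 h, ... back, as second-step indices
    windows += [make_lags(k * 86400 // n, 1) for k in range(1, 8)]
    return windows


def flatten_and_filter(
    windows: List[List[int]], lag_ub: int = 1200, min_lag: int = 7
) -> List[int]:
    """Flatten the windows, drop duplicates, keep min_lag < lag <= lag_ub."""
    return sorted({lag for w in windows for lag in w if min_lag < lag <= lag_ub})


if __name__ == "__main__":
    print(flatten_and_filter(second_frequency_windows(1)))
```

With an upper bound of 1200 steps, only the windows around 60, 120 and 180 survive for 1-second data; the minute and hour groups start at index 3598 and only appear if the bound is raised.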

You might want to write something like this (I have not tested this):

    def _make_lags_for_millisecond(multiple, num_cycles=3):
        # Lags around 1 s, 2 s, ... back, expressed in steps of `multiple` milliseconds.
        # You may change the second argument of _make_lags (the window half-width) if you'd like.
        return [
            _make_lags(k * 1000 // multiple, 1) for k in range(1, num_cycles + 1)
        ]

    # New branch, alongside the existing `elif offset_name == "S":` branch:
    elif offset_name == "L":
        # "L" represents millisecond
        lags = (
            _make_lags_for_millisecond(offset.n)
            + _make_lags_for_second(offset.n / 1000)  # optional, as noted above
            + _make_lags_for_minute(offset.n / (1000 * 60))  # optional, as noted above
        )
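
As a rough check of what the millisecond helper suggested above would return, the following standalone sketch runs the same arithmetic outside GluonTS. `make_lags` is a hypothetical stand-in for `_make_lags`, `millisecond_lags` is an illustrative wrapper, and the `7 < lag <= lag_ub` filter with `lag_ub=1200` is an assumption about how the windows are trimmed afterwards.

```python
from typing import List


def make_lags(middle: int, delta: int) -> List[int]:
    """Stand-in for _make_lags: indices middle - delta .. middle + delta."""
    return list(range(middle - delta, middle + delta + 1))


def make_lags_for_millisecond(multiple: int, num_cycles: int = 3) -> List[List[int]]:
    """Windows around 1 s, 2 s, ... back, in steps of `multiple` milliseconds."""
    return [make_lags(k * 1000 // multiple, 1) for k in range(1, num_cycles + 1)]


def millisecond_lags(multiple: int, lag_ub: int = 1200) -> List[int]:
    """Flatten the windows and keep lags with 7 < lag <= lag_ub (assumed bounds)."""
    windows = make_lags_for_millisecond(multiple)
    return sorted({lag for w in windows for lag in w if 7 < lag <= lag_ub})


if __name__ == "__main__":
    print(millisecond_lags(1))               # data sampled every 1 ms
    print(millisecond_lags(100))             # data sampled every 100 ms
    print(millisecond_lags(1, lag_ub=5000))  # 1 ms data with a larger bound
```

Under these assumptions, 1 ms data keeps only the window around lag 1000 unless the upper bound is raised, because the windows around 2000 and 3000 exceed 1200. For 100 ms data the windows sit around lags 10, 20 and 30, so all of them are kept.
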
  2. Even when your data is irregular, you can use Lag-Llama as is and check the performance. The lags will no longer be as meaningful, since they assume regular sampling. If that does not work, we do not currently support computing lags for irregular data :(
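
To illustrate why lags lose meaning on irregular data, here is a small sketch using the intervals from the question. The helper `lag_spans` is hypothetical; it only measures how much wall-clock time a fixed lag index covers at each position.

```python
from itertools import accumulate
from typing import List

# Gaps between consecutive observations, in seconds (taken from the question)
intervals = [0.3, 0.5, 1.2, 0.5, 0.8]
# Observation times, starting at t = 0
timestamps = [0.0] + list(accumulate(intervals))


def lag_spans(times: List[float], lag: int) -> List[float]:
    """Elapsed time between each observation and the one `lag` steps earlier."""
    return [round(times[i] - times[i - lag], 6) for i in range(lag, len(times))]


if __name__ == "__main__":
    for lag in (1, 2):
        spans = lag_spans(timestamps, lag)
        print(f"lag {lag}: min {min(spans)} s, max {max(spans)} s")
```

On a regular grid every span for a given lag would be identical. Here lag 2 covers anywhere from 0.8 s to 1.7 s, so a lag index no longer corresponds to a fixed time offset.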