state-spaces / s4

Structured state space sequence models
Apache License 2.0

Usage of bandlimit parameter in S4D #123

NikolaZubic closed 11 months ago

NikolaZubic commented 11 months ago

How should the bandlimit parameter be used in the S4D model? https://github.com/HazyResearch/state-spaces/blob/main/models/s4/s4.py#L1091

Let's say we are training on data where the temporal resolution is high but the spatial resolution is not.

What is the best strategy here? Say we have, for example, 500k steps of training on 20 Hz data. The options I see are:

1. Train without bandlimiting, and then during inference, if we have 40 Hz data, change rate and bandlimit accordingly. That is, bandlimit changes only during inference.
2. Set bandlimit to 0.2 during training and leave it at that.
3. Train, for example, 80% of the steps on 20 Hz data with bandlimit set to 0.2, and then 20% of the steps on 40 Hz or 80 Hz data. Then, during inference, do we turn the bandlimit parameter off or keep it, with rate of course always scaled accordingly?

albertfgu commented 11 months ago

To be honest, it's been a while and I forget the exact way this parameter should be used. I should have documented it better.

From looking at the code and paper, I think you set $\alpha$ (the bandlimit value) to a constant at both train and inference time. The cutoff is scaled correctly automatically if you pass in inputs at a different resolution. Note that the rate parameter does have to be adjusted correctly: if the "base" frequency is 20 Hz, then when you run at 40 Hz you should pass in rate=0.5. The value of $\alpha$ should theoretically be 1.0, but empirically a smaller value is often best; IIRC around $\alpha=0.5$ or $\alpha=1.0$ is good for S4D, while an even lower $\alpha$ of $0.1$ or $0.2$ or so is better for S4-DPLR.
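To make the rate arithmetic concrete, here is a minimal sketch in plain Python. The helper names `rate_for` and `bandlimit_mask` are hypothetical and are not part of the s4 codebase. The masking rule (keep a mode only if its frequency, in cycles per base-resolution step, is below $\alpha$ times the Nyquist frequency of 0.5) is my assumption about how a bandlimit cutoff of this kind works, not a copy of what s4.py does.

```python
import math


def rate_for(base_hz: float, target_hz: float) -> float:
    """Rate factor relative to the base (training) sampling frequency.

    Follows the example in the reply: base 20 Hz, running at 40 Hz -> 0.5.
    """
    if base_hz <= 0 or target_hz <= 0:
        raise ValueError("sampling frequencies must be positive")
    return base_hz / target_hz


def bandlimit_mask(dt: float, imag_parts: list, alpha: float) -> list:
    """Hypothetical bandlimit rule (an assumption, not the s4.py code).

    Each mode with imaginary part w oscillates at dt * |w| / (2*pi) cycles
    per base-resolution step. A mode is kept only if that frequency is
    below alpha * 0.5, i.e. alpha times the Nyquist frequency.
    """
    freqs = [dt * abs(w) / (2 * math.pi) for w in imag_parts]
    return [f < 0.5 * alpha for f in freqs]


# Three illustrative modes at 0.1, 0.4 and 0.8 cycles per step (dt = 1.0).
modes = [0.2 * math.pi, 0.8 * math.pi, 1.6 * math.pi]
keep_full = bandlimit_mask(1.0, modes, alpha=1.0)  # cutoff at 0.5
keep_half = bandlimit_mask(1.0, modes, alpha=0.5)  # cutoff at 0.25
```

With these made-up modes, $\alpha=1.0$ drops only the mode above Nyquist, while $\alpha=0.5$ also drops the 0.4 cycles-per-step mode. Under this assumption, the mask is defined at the base resolution, so only rate changes when the input resolution changes.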

Training in multiple stages while changing resolution is of course also possible, and in that case you wouldn't need to set a bandlimit at all if your testing resolutions are all seen during training.
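As a rough illustration of the multi-stage option, the sketch below lays out the split from the question (500k steps, 80% at 20 Hz, 20% at 40 Hz) together with the rate each stage would use. `stage_plan` is a hypothetical helper, not something from the s4 repository, and it assumes 20 Hz is treated as the base frequency.

```python
def stage_plan(total_steps: int, base_hz: float, stages: list) -> list:
    """Split training into stages at different sampling frequencies.

    stages is a list of (percent_of_steps, sampling_hz) pairs whose
    percentages must sum to 100. Each stage gets its step count and the
    rate factor relative to base_hz (base_hz / sampling_hz).
    """
    if sum(pct for pct, _ in stages) != 100:
        raise ValueError("stage percentages must sum to 100")
    plan = []
    for pct, hz in stages:
        plan.append({
            "steps": total_steps * pct // 100,
            "hz": hz,
            "rate": base_hz / hz,
        })
    return plan


# The split proposed in the question: 80% at 20 Hz, then 20% at 40 Hz.
plan = stage_plan(500_000, 20, [(80, 20), (20, 40)])
for stage in plan:
    print(stage)
```

Here the first stage runs 400,000 steps at rate 1.0 and the second runs 100,000 steps at rate 0.5. If every test resolution appears in such a plan, the reply above suggests the bandlimit can be left unset.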

NikolaZubic commented 11 months ago

Thank you for the fast reply.