jchia opened this issue 1 year ago
Is it the case that models/s4/s4.py is provided mainly for convenience and ease-of-use from having just one source file, and that it is not meant to have all the features available from src/models/sequence/kernels/ssm.py?
Yes, the standalone files are meant for convenience. The models inside this repo's training infrastructure are very modular, which means they are split across a large number of files and would be inconvenient to copy into external repositories.
In terms of the development process, is it the case that models/s4/s4.py is downstream of src/models/sequence/kernels so that changes go to the latter first and then get manually ported to the former?
That's right.
In terms of software behavior (the values that are calculated mathematically, ignoring floating-point error and random seed differences), is models/s4/s4.py meant to do the same thing as src/models/sequence/kernels/ssm.py for the use cases that it covers?
They should do exactly the same thing. It's conceivable that the standalone version has bugs in edge cases, since it is much less tested.
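Behavioral equivalence of this kind ("same values up to floating-point error and seed differences") is usually checked by feeding both code paths identical seeded inputs and comparing outputs with a tolerance rather than exact equality. Below is a minimal sketch, assuming two code paths that compute a diagonal SSM convolution kernel K[l] = sum_n C[n] * A[n]**l * B[n]. The functions `kernel_reference`, `kernel_vectorized` and `implementations_agree` are hypothetical stand-ins, not code from models/s4/s4.py or src/models/sequence/kernels/ssm.py.

```python
import numpy as np

# Hypothetical stand-ins for two implementations of the same computation;
# neither is the repo's actual code.
def kernel_reference(A, B, C, L):
    """K[l] = sum_n C[n] * A[n]**l * B[n], computed with an explicit loop."""
    K = np.zeros(L, dtype=np.complex128)
    state = B.astype(np.complex128)  # holds A**l * B elementwise
    for l in range(L):
        K[l] = np.sum(C * state)
        state = A * state
    return K.real

def kernel_vectorized(A, B, C, L):
    """Same kernel via a Vandermonde matrix V[n, l] = A[n]**l."""
    V = A[:, None] ** np.arange(L)[None, :]
    return ((C * B) @ V).real

def implementations_agree(seed=0, N=16, L=64, rtol=1e-6, atol=1e-8):
    """Run both paths on identical seeded inputs and compare within a tolerance."""
    rng = np.random.default_rng(seed)
    # Stable diagonal state matrix: |A[n]| <= 0.9.
    A = 0.9 * rng.uniform(0.5, 1.0, N) * np.exp(1j * rng.uniform(0, np.pi, N))
    B = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    C = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    k_ref = kernel_reference(A, B, C, L)
    k_vec = kernel_vectorized(A, B, C, L)
    return np.allclose(k_ref, k_vec, rtol=rtol, atol=atol), k_ref, k_vec
```

The tolerance absorbs floating-point differences from the different order of operations; exact equality would fail spuriously even when both paths are mathematically identical.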
Which version (standalone vs non-standalone) of the S4 implementation is generally used for the experiments in the papers?
All experiments use the original version, not the standalone one. There are several READMEs in this repo documenting the full training pipeline and model structure (e.g. https://github.com/HazyResearch/state-spaces/tree/main/src/models/sequence).
I understand that models/s4/s4.py is a standalone file that can be taken on its own to use the S4 models, not counting the CUDA kernel module. I have some questions about its intent and nature relative to the code in src/models/sequence/kernels/.
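A single-file module can treat a compiled kernel module as optional by attempting the import and falling back to a pure implementation when it is missing. Below is a minimal sketch of that general pattern, assuming the CUDA module is an optional accelerator rather than a hard dependency. The module name `hypothetical_s4_cuda_ext` and its `vandermonde` entry point are invented for illustration and do not correspond to the repo's actual extension.

```python
import importlib

# Hypothetical extension name; not the real CUDA module from the repo.
_EXT_NAME = "hypothetical_s4_cuda_ext"

try:
    _ext = importlib.import_module(_EXT_NAME)
    HAS_CUDA_EXT = True
except ImportError:  # also catches ModuleNotFoundError
    _ext = None
    HAS_CUDA_EXT = False

def _vandermonde_python(a, length):
    """Pure-Python fallback: for each x in a, the row [x**0, x**1, ..., x**(length-1)]."""
    return [[x ** l for l in range(length)] for x in a]

def vandermonde(a, length):
    """Use the compiled kernel when it imported successfully, else the fallback."""
    if HAS_CUDA_EXT:
        # Assumed entry point on the hypothetical extension.
        return _ext.vandermonde(a, length)
    return _vandermonde_python(a, length)
```

Keeping the dispatch in one function means callers never need to know whether the extension was found; only speed, not the computed values, should differ between the two paths.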