state-spaces / s4

Structured state space sequence models
Apache License 2.0

Passing a video to S4ND #146

Open HadiHammoud44 opened 5 months ago

HadiHammoud44 commented 5 months ago

Hello, this may be a naive question, but I'm confused by the arguments passed to the S4ND model (dim, d_model, d_state, channels, out_channels, d_output, ...) and by the input shape it expects.

Suppose I have a video of shape (batch_size, nb_frames, nb_channels, height, width) = (1, 30, 3, 128, 128). How should I reshape it before passing it to the S4ND model, and which model arguments need to be adjusted accordingly?

Also, if the desired output is a sequence of labels of length nb_frames (one label per frame of the video), which argument should I adjust: d_output, out_channels, or something else?

I would greatly appreciate a prompt response.
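To make the shapes in question concrete, here is a minimal sketch of the two usual ways to rearrange a (batch_size, nb_frames, nb_channels, height, width) array, plus the shape a per-frame output would have. It does not call S4ND: which layout the model expects is exactly what is being asked, so both candidates are shown. NumPy is used only to keep the sketch dependency-light (`torch.Tensor.permute` accepts the same axis order), and the spatial averaging at the end is a hypothetical stand-in for a model, not anything taken from the repository.

```python
import numpy as np

# Dummy video in the layout from the question:
# (batch_size, nb_frames, nb_channels, height, width)
batch_size, nb_frames, nb_channels, height, width = 1, 30, 3, 128, 128
video = np.zeros((batch_size, nb_frames, nb_channels, height, width), dtype=np.float32)

# Candidate 1, channels-first: (batch, channels, frames, height, width).
# The three trailing axes (frames, height, width) are the sequence dimensions.
channels_first = video.transpose(0, 2, 1, 3, 4)

# Candidate 2, channels-last: (batch, frames, height, width, channels).
channels_last = video.transpose(0, 1, 3, 4, 2)

# Hypothetical stand-in for a per-frame head (an assumption, not the S4ND API):
# reduce over the spatial axes only and keep the time axis, so that every
# frame ends up with its own feature vector.
per_frame = channels_last.mean(axis=(2, 3))

print(channels_first.shape)  # (1, 3, 30, 128, 128)
print(channels_last.shape)   # (1, 30, 128, 128, 3)
print(per_frame.shape)       # (1, 30, 3)
```

Whatever the model's arguments turn out to be, an output with one label per frame needs to keep the nb_frames axis, i.e. have shape (batch_size, nb_frames, number_of_classes), rather than being pooled over time.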