ethio-artifical closed this issue 5 months ago
We use a 3D ResNet as our backbone and always set the temporal kernel size to 1, so it always performs 2D convolutions to extract spatial features.
What is the point of using a 3D ResNet as the backbone? Why don't you just use a 2D ResNet?
We previously tested temporal kernel sizes > 1 and saw worse results, but kept the ResNet in its 3D form for convenience. We now set the temporal kernel size to 1 so that it behaves exactly like a 2D backbone.
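To illustrate the point above, here is a minimal sketch (not the repository's code) showing that a 3D convolution with temporal kernel size 1 gives the same result as a 2D convolution applied to each frame. The layer sizes, the toy input shape, and the variable names are assumptions chosen only for the demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 3D convolution with temporal kernel size 1: kernel = (T=1, H=3, W=3).
# Channel counts here are illustrative, not taken from the repository.
conv3d = nn.Conv3d(3, 8, kernel_size=(1, 3, 3), padding=(0, 1, 1), bias=False)

# 2D convolution that reuses the same weights with the temporal axis removed.
conv2d = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    # conv3d.weight has shape (8, 3, 1, 3, 3); squeeze(2) gives (8, 3, 3, 3).
    conv2d.weight.copy_(conv3d.weight.squeeze(2))

# Toy clip laid out as (batch, channels, frames, height, width).
video = torch.randn(2, 3, 5, 16, 16)

with torch.no_grad():
    out3d = conv3d(video)

    # Fold the frames into the batch axis and run the 2D convolution per frame.
    b, c, t, h, w = video.shape
    frames = video.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
    out2d = conv2d(frames).reshape(b, t, 8, h, w).permute(0, 2, 1, 3, 4)

# The two outputs should agree up to floating-point tolerance.
same = torch.allclose(out3d, out2d, atol=1e-4)
print(same)
```

Since no information is mixed across frames, printing the model shows 3D layers even though the computation is frame-wise 2D convolution.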
Thanks a lot, my friend, for the effort you put into my question and for your fast reply. Many thanks from the bottom of my heart.
I got it, thank you a lot. I want to try out GRU and BiGRU; how do I do this in your code?
You may directly replace the temporal_model in slr_network.py with a GRU or BiGRU, as you want.
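As a rough sketch of what such a replacement could look like: the class name GRUTemporalModel, its constructor arguments, the (T, B, C) input layout, and the returned dictionary keys are all assumptions made for illustration. Please check them against the interface that temporal_model actually uses in slr_network.py before swapping it in.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class GRUTemporalModel(nn.Module):
    """Hypothetical GRU/BiGRU temporal module (not part of the repository)."""

    def __init__(self, input_size, hidden_size, num_layers=2,
                 bidirectional=True, dropout=0.3):
        super().__init__()
        num_directions = 2 if bidirectional else 1
        # Split the hidden size across directions so the output feature
        # dimension stays equal to hidden_size in both modes.
        self.rnn = nn.GRU(
            input_size=input_size,
            hidden_size=hidden_size // num_directions,
            num_layers=num_layers,
            dropout=dropout,
            bidirectional=bidirectional,
        )

    def forward(self, feats, lengths):
        # feats: (T, B, C) padded features; lengths: (B,) valid lengths.
        packed = pack_padded_sequence(feats, lengths.cpu(), enforce_sorted=False)
        packed_out, hidden = self.rnn(packed)
        out, _ = pad_packed_sequence(packed_out)  # (max(lengths), B, hidden_size)
        # The dictionary keys are an assumed output format.
        return {"predictions": out, "hidden": hidden}


# Usage: bidirectional=True gives a BiGRU, bidirectional=False a plain GRU.
feats = torch.randn(6, 2, 32)        # 6 time steps, batch of 2, 32 features
lengths = torch.tensor([6, 4])       # second sequence has 2 padded steps
bigru = GRUTemporalModel(32, 64, bidirectional=True)
gru = GRUTemporalModel(32, 64, bidirectional=False)
print(bigru(feats, lengths)["predictions"].shape)
print(gru(feats, lengths)["predictions"].shape)
```

Halving the per-direction hidden size in the bidirectional case keeps the output width unchanged, so a downstream classifier would not need a different input size when switching between GRU and BiGRU.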
Hello, how are you? First, thank you for the fast reply on the issue, thanks a lot. You get me back on track every time.
My question today is about the architecture and design. In the paper you say that you "first employs a feature extractor (2D CNN) to capture frame-wise features, and then adopts a 1D CNN and a BiLSTM to perform short-term and long-term temporal modeling, respectively, followed by a classifier to predict sentences."
But when I print the model, I see a 3D CNN as the feature extractor. As I understand it, the paper and the code differ; correct me if I am wrong. Can you explain the architecture you use, and how you use or combine the 3D CNN with the 2D CNN?