darshvirbelandis opened this issue 2 years ago
Here, kernel_size=3 means kernel_size=(3, 3, 3). You may update the corresponding code. The same applies to pooling layers, etc., if you need to.
Also, I do not think there is a pretrained 3D model in the wild right now. If you have one, you have to make sure its architecture is exactly the same as in this repo; it will be a bit overwhelming if you did not train your model with this repo.
The input shape should be B x C x D x H x W, where D means depth. Per sample, that is Channel x Time (or NumFrames) x Height x Width.
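As a minimal sketch of that layout (the model construction mirrors this repo's README; the batch size, class count, and the 16 x 456 x 456 clip size are illustrative assumptions):

```python
import torch
from efficientnet_pytorch_3d import EfficientNet3D  # import path as in the repo README

# Assumed setup: binary classifier over 16-frame RGB clips of 456x456 pixels.
model = EfficientNet3D.from_name("efficientnet-b0", override_params={"num_classes": 2}, in_channels=3)

x = torch.randn(1, 3, 16, 456, 456)  # B x C x D x H x W
out = model(x)
print(out.shape)  # torch.Size([1, 2])
```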
I am attempting to load my model in the following format
Question 1:
How do I input 16 frames of size 456x456 into the EfficientNet model?
I am trying to classify 16-frame snippets from video clips.
I have 16 images that I want to send into EfficientNet3D; is this possible?
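For concreteness, here is a minimal sketch of how 16 decoded frames could be stacked into the 5-D layout described above; the random frames are stand-ins for real video data:

```python
import torch

# 16 RGB frames, each of shape (C, H, W) = (3, 456, 456).
frames = [torch.rand(3, 456, 456) for _ in range(16)]

clip = torch.stack(frames, dim=1)  # (C, D, H, W) = (3, 16, 456, 456)
batch = clip.unsqueeze(0)          # (B, C, D, H, W) = (1, 3, 16, 456, 456)
```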
A similar comment was made by @shijianjian here : https://github.com/shijianjian/EfficientNet-PyTorch-3D/issues/11#issuecomment-985999483
"Say, change from Conv3D(kernel_size=(3, 3, 3)) to Conv3D(kernel_size=(1, 3, 3))will probably work for your case."
I am very lost here because I don't understand where to actually change this code.
I can't even find this specific code in the model file: Conv3D(kernel_size=(3, 3, 3))
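In case it helps locate those layers, here is a small sketch that uses standard PyTorch introspection to print every 3-D convolution and its kernel size (not a confirmed fix, just a way to see what would need changing):

```python
import torch.nn as nn
from efficientnet_pytorch_3d import EfficientNet3D

model = EfficientNet3D.from_name("efficientnet-b0", override_params={"num_classes": 2}, in_channels=3)

# Print every 3-D convolution so you can see which ones use kernel_size=(3, 3, 3)
# and would be candidates for (1, 3, 3) per the suggestion above.
for name, module in model.named_modules():
    if isinstance(module, nn.Conv3d):
        print(name, module.kernel_size)
```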
Also, since I installed this EfficientNet3D via pip (efficientnet-pytorch), I am having trouble understanding how to modify the actual model code, given that it is a pip-installed package.
If I were to manually load the efficientnet-pytorch model with PyTorch, where and how would I be able to load the model weights?
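For reference, the generic PyTorch pattern for loading weights from a local checkpoint looks like the sketch below; the file name is a placeholder, and the state dict must come from a model with this repo's 3-D architecture (2-D ImageNet weights will not load because the parameter shapes differ):

```python
import torch
from efficientnet_pytorch_3d import EfficientNet3D

model = EfficientNet3D.from_name("efficientnet-b0", override_params={"num_classes": 2}, in_channels=3)

# "my_3d_weights.pth" is a placeholder for a checkpoint trained with this architecture.
state_dict = torch.load("my_3d_weights.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```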
Please help me in any way; this is a wonderful project and I am grateful for the contribution. I just need a bit of support on loading the model.
Question 2:
How can I use this 3D-EfficientNet model as a backbone feature extractor? I would need to export features at a certain layer instead of getting a final classification.
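One common way to do this is sketched below. It assumes the fork kept the upstream EfficientNet-PyTorch extract_features method; if it did not, registering a forward hook on the layer you want gives the same result:

```python
import torch
from efficientnet_pytorch_3d import EfficientNet3D

model = EfficientNet3D.from_name("efficientnet-b0", override_params={"num_classes": 2}, in_channels=3)
model.eval()

x = torch.randn(1, 3, 16, 456, 456)  # B x C x D x H x W

with torch.no_grad():
    # Features from the last convolutional stage, before pooling and the classifier head
    # (assumption: extract_features exists here as in the 2-D upstream repo).
    feats = model.extract_features(x)
print(feats.shape)
```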
Thanks so much !!