shijianjian / EfficientNet-PyTorch-3D

A PyTorch implementation of EfficientNet
Apache License 2.0

Input shape dimensions = C x T x H x W ? #16

Open darshvirbelandis opened 2 years ago

darshvirbelandis commented 2 years ago

Channel x Time (or NumFrames) x Height x Width

I am attempting to load my model and feed it inputs in this format.

Question 1:

How do I input 16 frames of size 456x456 into the EfficientNet3D model?

I am trying to classify 16-frame snippets from video clips.


    # Load the 3D EfficientNet model.
    from efficientnet_pytorch_3d import EfficientNet3D
    from torchsummary import summary  # needed for summary() below

    model_EfficientNet3D = EfficientNet3D.from_name("efficientnet-b7", in_channels=3)

    # Print a layer-by-layer summary for a (C, D, H, W) = (3, 16, 456, 456) input.
    summary(model_EfficientNet3D, input_size=(3, 16, 456, 456))

I have 16 images (frames) that I want to send into EfficientNet3D; is this possible?
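For reference, a minimal sketch of such a forward pass, treating the 16 frames as the depth axis (the batch size and random data are just for illustration, and b7 at this resolution is very memory-hungry):

    import torch
    from efficientnet_pytorch_3d import EfficientNet3D

    # Build the model exactly as in the snippet above.
    model = EfficientNet3D.from_name("efficientnet-b7", in_channels=3)
    model.eval()

    # One clip of 16 RGB frames at 456x456, laid out as (B, C, D, H, W),
    # where the depth axis D carries the 16 frames.
    dummy_clip = torch.randn(1, 3, 16, 456, 456)

    with torch.no_grad():
        logits = model(dummy_clip)

    print(logits.shape)  # (batch, num_classes)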

A similar comment was made by @shijianjian here: https://github.com/shijianjian/EfficientNet-PyTorch-3D/issues/11#issuecomment-985999483

"Say, change from Conv3D(kernel_size=(3, 3, 3)) to Conv3D(kernel_size=(1, 3, 3))will probably work for your case."

I am very lost here because I don't understand where to actually make this change.

I can't even find this specific code in the model file: Conv3D(kernel_size=(3, 3, 3))

Also, since I installed this EfficientNet3D via pip (efficientnet-pytorch), I am having trouble understanding how to modify the actual model code, given that it's a pip-installed package.

If I were to manually load the efficientnet-pytorch model with PyTorch, where and how would I load the model weights?

Please help me in any way you can; this is a wonderful project and I am grateful for the contribution. I just need a bit of support on loading the model.

Question 2:

How can I use this 3D-EfficientNet model as a backbone feature extractor? I would need to export features at a certain layer instead of getting a final classification.
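For reference, one generic way to do this with plain PyTorch is a forward hook on the layer whose features you want. A minimal sketch follows; the tapped layer name "_conv_head" is a placeholder you would replace after inspecting model.named_modules():

    import torch
    from efficientnet_pytorch_3d import EfficientNet3D

    model = EfficientNet3D.from_name("efficientnet-b7", in_channels=3)
    model.eval()

    # Uncomment to list the available layer names, then pick one to tap:
    # for name, module in model.named_modules():
    #     print(name, module.__class__.__name__)

    features = {}

    def save_output(name):
        def hook(module, inputs, output):
            # Store the layer's output so it can be used as a feature map.
            features[name] = output.detach()
        return hook

    # "_conv_head" is a placeholder: substitute whichever layer name the
    # named_modules() listing above shows for the block you want to tap.
    tap_name = "_conv_head"
    dict(model.named_modules())[tap_name].register_forward_hook(save_output(tap_name))

    with torch.no_grad():
        model(torch.randn(1, 3, 16, 456, 456))

    print(features[tap_name].shape)  # feature map from the tapped layer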

Thanks so much !!

shijianjian commented 2 years ago

https://github.com/shijianjian/EfficientNet-PyTorch-3D/blob/3e79bcd06216b2e831bf3300fff9636cce2cd0d1/efficientnet_pytorch_3d/model.py#L129

Here, kernel_size=3 means kernel_size=(3, 3, 3). You may update the corresponding code, and the same goes for the pooling layers, etc., if you need to.
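A quick way to see which layers those kernels end up in, once the model is built, is to scan the modules (a minimal sketch):

    import torch.nn as nn
    from efficientnet_pytorch_3d import EfficientNet3D

    model = EfficientNet3D.from_name("efficientnet-b7", in_channels=3)

    # List every 3D convolution and its kernel size, so you can see which
    # layers would change if kernel_size=3 were edited to (1, 3, 3) in model.py.
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv3d):
            print(f"{name}: kernel_size={module.kernel_size}")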

Plus, I do not think there is a pretrained 3D model in the wild right now. If you have one, you have to make sure its architecture is the same as in this repo; it will be a bit overwhelming if you did not train your model with this repo.
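If you do have such weights, loading them would look roughly like this (a minimal sketch; the checkpoint filename is hypothetical, and load_state_dict will fail if the architecture differs):

    import torch
    from efficientnet_pytorch_3d import EfficientNet3D

    model = EfficientNet3D.from_name("efficientnet-b7", in_channels=3)

    # "my_3d_weights.pth" is a hypothetical checkpoint saved from a model
    # built with this same architecture; load_state_dict raises an error
    # if the layer names or tensor shapes do not match.
    state_dict = torch.load("my_3d_weights.pth", map_location="cpu")
    model.load_state_dict(state_dict)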

shijianjian commented 2 years ago

The input shape should be B x C x D x H x W, where D means depth (the frame/time axis in your case).
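For example, if a video loader yields clips as (B, T, C, H, W), a common layout, a permute gives this shape (a minimal sketch; the loader layout is an assumption):

    import torch

    # Hypothetical clip batch from a video dataloader: 2 clips, 16 frames each,
    # 3 channels, 456x456 pixels, laid out as (B, T, C, H, W).
    clip_btchw = torch.randn(2, 16, 3, 456, 456)

    # Reorder to (B, C, D, H, W), where D is the frame/depth axis.
    clip_bcdhw = clip_btchw.permute(0, 2, 1, 3, 4).contiguous()
    print(clip_bcdhw.shape)  # torch.Size([2, 3, 16, 456, 456])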