I want to be able to use larger input image resolutions. However, when I use an input size of 480*480, it takes almost 10 minutes to process a tiny 10-second clip.
It seems that as I increase the image size, the model's inference run time grows exponentially.
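For reference, here is a rough back-of-the-envelope estimate of how much extra work a larger frame means. It is only a sketch: it assumes the cost of the convolutional layers is proportional to the number of pixels per frame, and it ignores memory effects, which could make the real slowdown worse. The helper name `relative_conv_cost` is made up for illustration.

```python
def relative_conv_cost(side, base_side=112):
    """Estimated per-frame cost relative to a base_side x base_side input.

    Assumption: convolution cost is proportional to the pixel count,
    so it scales with the square of the side length.
    """
    return (side * side) / (base_side * base_side)


# Compare a few candidate square resolutions against the 112*112 baseline.
for side in (112, 160, 224, 320, 480):
    ratio = relative_conv_cost(side)
    print(f"{side}*{side}: about {ratio:.1f}x the cost of 112*112")
```

Under that assumption the cost grows with the square of the side length, so 480*480 would be roughly 18 times the work of 112*112 per frame, with intermediate sizes such as 224*224 at about 4 times.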
Crucial motion information is lost when I downscale my images to 112*112, and this is affecting the precision of the model on my test sets.
Is there an alternative model, or a method that would let me use larger image resolutions with the 3D-ResNet model?
Is it practical to use a 3D-CNN with 480*480 input images for video classification tasks?