Closed yuvfried closed 2 months ago
Hello, the output shape (batch_size, 768, 5, 8, 8) is expected from the last layer of the swinViT model. Our model is based on Project MONAI's SwinUNETR model, which has that output shape (see attached Figure 1 from the Project MONAI GitHub).
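If a flat (batch_size, 768) embedding is needed, one common option is to average the feature map over its three spatial dimensions. The snippet below is only a minimal sketch of that idea, not a procedure confirmed for this repository: pool_embedding is a hypothetical helper name, and a random tensor stands in for the real swinViT output.

```python
import torch


def pool_embedding(features: torch.Tensor) -> torch.Tensor:
    """Average a (B, C, D, H, W) feature map over D, H, W to get (B, C)."""
    if features.dim() != 5:
        raise ValueError(f"expected a 5D tensor, got {features.dim()}D")
    # Global average pooling over the three spatial axes.
    return features.mean(dim=(2, 3, 4))


# Assumption: random stand-in with the shape reported in this thread,
# not actual output from the pretrained model.
features = torch.randn(2, 768, 5, 8, 8)
embedding = pool_embedding(features)
print(embedding.shape)  # torch.Size([2, 768])
```

Average pooling is just one choice here. Max pooling or flattening the spatial grid would also give a fixed-size vector, and which works best for a downstream task would need to be checked empirically.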
Hi,
Regarding the pretrained backbone model described in your paper, I’ve noticed that while the repository contains code for both SSL pretraining and fine-tuning phases, there is no code available for loading the pretrained backbone weights out of the box.
I attempted to use the provided code for resuming a stopped training session to load the pretrained weights. However, when I perform inference on a tensor of 4 stacked BraTS sequences, the output tensor shape is (batch_size, 768, 5, 8, 8) instead of the expected (batch_size, 768). I used the last layer of the swinViT part of the model for this output.
Could you please provide guidance on the correct procedure for loading the pretrained backbone and extracting embeddings from BraTS sequences (before the decoding part for segmentation)?