szc19990412 / TransMIL

TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification

computational cost of training per Epoch #23

Open deep-matter opened 1 year ago

deep-matter commented 1 year ago

I would like to ask how long training takes per epoch. I used your built model and modified the PPEG module by adding an FFT to reduce the dimension of the convolution operation. The only issue I noticed is that the Trainer takes a long time to finish a single epoch. Is that related to the shape of the input (2154, 1024), or did I miss something?
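For reference, a minimal FNet-style sketch of what mixing tokens with an FFT in place of PPEG's depthwise convolutions could look like. This is a hypothetical illustration, not the actual modification from this thread; the class name and the residual connection are assumptions.

```python
import torch
import torch.nn as nn

class FFTMix(nn.Module):
    """Hypothetical FFT token mixer, sketched as a drop-in for PPEG.

    Mixes the (H, W) grid of tile tokens globally via a 2D FFT
    (FNet-style), which is parameter-free and handles variable grid sizes.
    """
    def forward(self, x, H, W):
        # x: (B, 1 + H*W, C) with a leading class token, as in TransMIL
        B, _, C = x.shape
        cls_token, feat = x[:, :1], x[:, 1:]
        grid = feat.transpose(1, 2).reshape(B, C, H, W)
        mixed = torch.fft.fft2(grid, dim=(-2, -1)).real  # keep the real part
        mixed = mixed.flatten(2).transpose(1, 2)
        feat = feat + mixed  # residual, mirroring PPEG's skip connection
        return torch.cat((cls_token, feat), dim=1)
```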

hans0809 commented 1 year ago

I trained on my own dataset (200 train + 30 val); each slide was cut into 500~1500 tiles and then embedded into 2048-dim vectors. It takes about 1 min per epoch. I wonder how many slides are in your dataset (train and val) and how long it takes per epoch. By the way, my results were pretty poor and my training procedure was not stable; I don't know what might be wrong.

szc19990412 commented 1 year ago

> I would like to ask how long training takes per epoch. I used your built model and modified the PPEG module by adding an FFT to reduce the dimension of the convolution operation. The only issue I noticed is that the Trainer takes a long time to finish a single epoch. Is that related to the shape of the input (2154, 1024), or did I miss something?

Because we set the batch size to one, the number of training steps in one epoch equals the number of training slides. Meanwhile, because we preprocess all the WSIs into features, training is very quick: on an RTX 3090 it takes roughly 0.5 min per epoch with 400 slides.
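To illustrate the setup described above, here is a minimal sketch of a bag-level dataset over precomputed features with batch_size=1. The class name, file layout, and the train_paths/train_labels variables are assumptions for illustration, not the repo's actual code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FeatureBagDataset(Dataset):
    """One item = one slide's precomputed feature bag, e.g. a .pt tensor
    of shape (num_tiles, feat_dim). Paths/labels are placeholders."""
    def __init__(self, feature_paths, labels):
        self.feature_paths = feature_paths
        self.labels = labels

    def __len__(self):
        return len(self.feature_paths)

    def __getitem__(self, idx):
        feats = torch.load(self.feature_paths[idx])  # (num_tiles, feat_dim)
        return feats, self.labels[idx]

# batch_size=1 because bags have different numbers of tiles, so with
# 400 training slides one epoch is exactly 400 steps.
loader = DataLoader(FeatureBagDataset(train_paths, train_labels),
                    batch_size=1, shuffle=True)
```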

szc19990412 commented 1 year ago

> I trained on my own dataset (200 train + 30 val); each slide was cut into 500~1500 tiles and then embedded into 2048-dim vectors. It takes about 1 min per epoch. I wonder how many slides are in your dataset (train and val) and how long it takes per epoch. By the way, my results were pretty poor and my training procedure was not stable; I don't know what might be wrong.

1. Each slide has 500~1500 tiles; did you process the WSIs at 20x magnification or higher?
2. You could also test the performance of other MIL methods on your dataset, such as ABMIL or CLAM, if the task is challenging or the dataset is limited.

hans0809 commented 1 year ago

For each slide, I applied a 4x downsample and then cut it into 224x224 patches. I tried DTFD-MIL ("DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification") and got better performance.
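As a hypothetical sketch of that tiling step (the file name, level choice, and background filtering here are placeholders, not the commenter's actual code):

```python
import openslide

slide = openslide.OpenSlide("example.svs")
level = slide.get_best_level_for_downsample(4)  # pyramid level closest to 4x
ds = slide.level_downsamples[level]
w, h = slide.level_dimensions[level]
tile = 224
for y in range(0, h - tile + 1, tile):
    for x in range(0, w - tile + 1, tile):
        # read_region expects level-0 coordinates, so scale x, y back up
        patch = slide.read_region((int(x * ds), int(y * ds)), level,
                                  (tile, tile)).convert("RGB")
        # ...skip background patches (e.g. by mean saturation), save the rest
```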

I printed softmax(pred_logits) for both TransMIL and DTFD-MIL; the latter is more discriminative. [Screenshots: GT labels (zoy2KH.png), TransMIL predictions (zoykUP.png), DTFD-MIL predictions (zoyegg.png)]

Maybe something is wrong with my implementation...

szc19990412 commented 1 year ago

This result seems strange, as it appears the model is overfitting. Because DTFD is built on the smaller ABMIL model, you might experiment with Transformer aggregation at a lower feature dimension, for example reducing from 2048 to 128 or 256. Furthermore, have you used our PyTorch Lightning framework and the Ranger optimizer?
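A minimal sketch of that dimension reduction: a learned linear projection applied to each bag before the Transformer aggregator. The class name and exact dimensions are illustrative assumptions (TransMIL itself uses a similar Linear+ReLU input layer, though with different sizes).

```python
import torch.nn as nn

class BagProjector(nn.Module):
    """Project per-tile features to a lower dimension before aggregation."""
    def __init__(self, in_dim=2048, out_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, bag):       # bag: (batch, num_tiles, in_dim)
        return self.proj(bag)     # -> (batch, num_tiles, out_dim)
```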

JD910 commented 1 year ago

Nice work.

Any recommendations for preprocessing the WSIs into features (since the quality of the features may directly influence classification)?
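One common recipe (an assumption for illustration, not necessarily the authors' recommendation) is to embed each tile with an ImageNet-pretrained ResNet50 and keep the 2048-dim pooled features:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ImageNet-pretrained ResNet50 with the classifier head removed,
# so it outputs the 2048-dim global-average-pooled features.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed_tiles(pil_tiles):
    """pil_tiles: list of 224x224 PIL images from one slide."""
    batch = torch.stack([preprocess(t) for t in pil_tiles])
    return backbone(batch)  # (num_tiles, 2048)
```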