szc19990412 / TransMIL

TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification

Model input clarification #15

Closed EmanuelOverflow closed 2 years ago

EmanuelOverflow commented 2 years ago

Hi, I'm using TransMIL on a custom TCGA dataset. I followed the CLAM guide to extract features, with some modifications to get the number of features described in the TransMIL paper.

The Transformer architecture requires a 3D tensor of shape BxNxC. I assumed that for MIL, B is the bag size, N is the spatial length of the features, which I got by flattening the HxW (16x16) output of the ResNet50, and C is the 1024-dimensional feature vector associated with the input image. With this setup I can only work with a small batch of bag images, around 256. How did you manage the large number of instances? I tried your settings with apex and mixed precision, but I got errors on the input datatype and again CUDA out of memory. I'm using 4 NVIDIA V100 GPUs with 32 GB.

My doubt is about the length N: is it possible that N is the number of instances and the batch size is 1, as set in the configuration file you provided? That would mean the extracted features are averaged after the ResNet extraction, as in CLAM.

EmanuelOverflow commented 2 years ago

OK, I misunderstood how the dataset has to be processed: N corresponds to the number of instances, so the ResNet50 output to extract is the one after the average-pooling aggregation at layer3.
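
For anyone who lands here with the same confusion, here is a minimal sketch of the shape handling described above. It is not the repository's code. Random NumPy arrays stand in for the per-patch ResNet50 layer3 feature maps (1024 channels, 16x16 spatial), and the function name `bag_from_feature_maps` and the patch count are hypothetical. Each patch's map is averaged over its spatial dimensions to give one 1024-dimensional vector per instance, and the instances are stacked into a single bag with batch size 1.

```python
import numpy as np


def bag_from_feature_maps(feature_maps):
    """Average-pool per-patch feature maps into one bag of shape (1, N, C).

    feature_maps: array of shape (N, C, H, W), one map per patch (instance).
    """
    # Global average pooling over the spatial axes: (N, C, H, W) -> (N, C)
    pooled = feature_maps.mean(axis=(2, 3))
    # Add the batch axis: one slide (bag) per batch -> (1, N, C)
    return pooled[np.newaxis, ...]


# Stand-in data: 8 patches (hypothetical count), 1024 channels, 16x16 maps.
rng = np.random.default_rng(0)
n_patches = 8
maps = rng.standard_normal((n_patches, 1024, 16, 16)).astype(np.float32)

bag = bag_from_feature_maps(maps)
print(bag.shape)  # (1, 8, 1024)
```

With this layout the sequence length seen by the Transformer is the number of patches in the slide, not the 256 spatial positions of a single patch.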