Exact way of doing tile aggregation by averaging output probability

Hi Nicolas,

In your paper, you have mentioned that after running the trained model on the test data points, you do tile aggregation by averaging output probability or To assess the accuracy on the test set, the per-tile classification results were aggregated on a per-slide basis either by averaging the probabilities obtained on each tile or by counting the percentage of tiles positively classified, thus generating a per-slide classification.

I have the following outputs when I run the saved finetuned Inception V3 on binary labels for classification task which was already pre-trained on Inception V3, on a test image:

test_input = inputs.to(device)
test_label = labels.to(device)
test_output = saved_model_ft(test_input)
probabilities = torch.nn.functional.softmax(test_output[0], dim=0)
print('probabilities: ', probabilities)
probability = torch.max(torch.nn.functional.softmax(test_output[0], dim=0))
print('probability: ', probability)
_, test_pred = torch.max(test_output, 1)
print('test output: ', test_output)
print('test pred: ', test_pred)

and result is:

probabilities:  tensor([0.8992, 0.1008], device='cuda:2')
probability:  tensor(0.8992, device='cuda:2')
test output:  tensor([[ 1.0471, -1.1416]], device='cuda:2')
test pred:  tensor([0], device='cuda:2')

By averaging the output probabilities, do you mean that you get the max probability 0.89 here and multiply it by the label, 0, here as well and do same for each tile in a given WSI? Or what do you exactly mean? Would you please be able to provide a formula for this or walk me through it?

I am ending up having all of my test preds as 0 so I am unsure how the tile aggregation by averaging output probability in done in your case. Even if I do weighted averaging based on the probabilities, still, I end up with all 0s.

so, please assume we have the following results for two tiles in test set that both belong to the same WSI and that WSI presumably has two tiles.

if I have this for one test data point

probabilities:  tensor([0.8992, 0.1008], device='cuda:2')
probability:  tensor(0.8992, device='cuda:2')
test output:  tensor([[ 1.0471, -1.1416]], device='cuda:2')
test pred:  tensor([0], device='cuda:2')

and this for another test datapoint:

probabilities:  tensor([0.7603, 0.2397], device='cuda:2')
probability:  tensor(0.7603, device='cuda:2')
test output:  tensor([[ 0.4782, -0.6760]], device='cuda:2')
test pred:  tensor([0], device='cuda:2')

For example, after 100 epochs of training for EGFR LUAD images, this is what I get as a prediction on WSI level tiles. I have created a dictionary of WSIs where each WSI itself is another dictionary for prediction for all tiles, and probabilities for all tiles. As you see, barely any tile values get 1. And only for one WSI, 2 of the tiles, become 1 (EGFR).

So for the EGFR finetuning of Inception V3, how many epochs should I use? How do you do the tile aggregation by averaging the output probability here? using (0.8992 0 + 0.7603 0) / 2 ?

If this is the case, my concern is since majority of data is class 0, this would result with 0 as the label as well.

Thanks for any lead.

ncoudray / DeepPATH

Exact way of doing tile aggregation by averaging output probability #104