histocartography / zoommil

ZoomMIL is a multiple instance learning (MIL) method that learns to perform multi-level zooming for efficient Whole-Slide Image (WSI) classification.
MIT License

Attention maps #6

Open Mairafatoretto opened 1 year ago

Mairafatoretto commented 1 year ago

Did you release the code for generating the attention maps? I would like to better understand how they were generated.

kevthan commented 1 year ago

No, we only released the code for the main functionalities: pre-processing, training, and testing.

Mairafatoretto commented 1 year ago

Hello Kevin. Could you explain how you made Figure 4? Your model has two gated attention (GA) modules, so it's not clear to me how you computed the attention score for each patch.

kevthan commented 1 year ago

Hi, basically you can modify the forward pass of the model so that it also returns the attention scores A_1_aux as well as the indices of the selected patches. Then you can infer the corresponding patch coordinates so that they can be highlighted in the WSI.

It's not super straightforward. I will try to provide a script for this in the coming days, but it may take some time.
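
As a rough illustration of the idea, a forward pass could be modified along these lines (this is only a sketch; the attribute names `attn_1` and `classifier` are placeholders, not the actual ZoomMIL module names):

```python
import torch

def forward_with_attention(model, feats_low, k=16):
    """Sketch: a generic gated-attention MIL forward that, besides the bag prediction,
    also returns the per-patch attention scores (A_1_aux, as in the discussion above)
    and the indices of the k highest-scoring patches (select_1)."""
    # Per-patch attention scores at the lowest magnification, shape (N, 1)
    A_1_aux = model.attn_1(feats_low)
    A_1_aux = torch.softmax(A_1_aux, dim=0)

    # Indices of the k patches with the highest attention
    select_1 = torch.topk(A_1_aux.squeeze(-1), k=k).indices

    # Attention-weighted bag embedding and slide-level prediction
    bag = torch.sum(A_1_aux * feats_low, dim=0, keepdim=True)
    logits = model.classifier(bag)
    return logits, A_1_aux, select_1
```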

Mairafatoretto commented 1 year ago

Hmm, I understand the idea, Kevin, but which variable holds the indices of the patches? And how will I know the coordinates, if they are only available in the preprocessing and not in the model?

kevthan commented 1 year ago

For the indices you can use, e.g., select_1. A preprocessed .h5 file should also contain the fields 1.25x_coords, 2.5x_coords etc. from which you can retrieve the patch coordinates.
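
For example, the coordinates could be read back from the preprocessed file like this (a minimal sketch; the file name and exact dataset layout are assumptions, only the field names follow the ones mentioned above):

```python
import h5py
import numpy as np

# Read the patch coordinates stored during preprocessing.
with h5py.File("slide_001.h5", "r") as f:
    coords_low = np.asarray(f["1.25x_coords"])  # (N, 2) column/row index per patch
    coords_mid = np.asarray(f["2.5x_coords"])

# select_1 would come from the modified forward pass above; a dummy example here.
select_1 = np.array([3, 17, 42])
selected_coords = coords_low[select_1]          # coordinates of the selected patches
```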

Mairafatoretto commented 1 year ago

But do you only plot the attention for the top k? And does select_1 follow the same order as the preprocessed .h5 file?

kevthan commented 1 year ago

You're right, what I described is for plotting only the top k. If you want to plot all attention values, you don't need the selected coordinates. You can use the attention A_1_aux and the coordinates from the .h5 file. The order should match.

When you use the actual WSI to overlay it with the attention scores, it's important to load it at the right magnification and pad it in the same way as during preprocessing (similarly to here).
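
One way this overlay could look, as a sketch only: it assumes the slide has already been loaded at the right magnification and padded as in preprocessing (pad_image_with_factor), that `coords` are the (N, 2) column/row indices from the .h5 file, and that `scores` are the matching attention values (e.g. A_1_aux).

```python
import numpy as np
from PIL import Image
from matplotlib import cm

PATCH = 256  # patch size used during preprocessing

def overlay_attention(wsi_rgb, coords, scores, alpha=0.5):
    """Blend a per-patch attention heatmap into a padded WSI image (illustrative only)."""
    scores = np.asarray(scores, dtype=np.float32)
    scores = (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)  # to [0, 1]
    out = wsi_rgb.astype(np.float32).copy()
    for (col, row), s in zip(coords, scores):
        y, x = row * PATCH, col * PATCH
        color = np.array(cm.jet(float(s))[:3]) * 255       # colormap value for this patch
        out[y:y + PATCH, x:x + PATCH] = (1 - alpha) * out[y:y + PATCH, x:x + PATCH] + alpha * color
    return Image.fromarray(out.astype(np.uint8))
```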

Mairafatoretto commented 1 year ago

I understood this part, Kevin. However, the output of the pad_image_with_factor function is a NumPy array, and I'm having trouble getting back to the original image format (in my case .svs) to plot the heatmap.

kevthan commented 1 year ago

If it's just for visualization, you don't have to go back to .svs. You can use Image.fromarray() from the PIL library to convert the numpy array to an image that you can save.
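
A minimal sketch of that conversion step (the array here is just a placeholder standing in for the pad_image_with_factor output):

```python
import numpy as np
from PIL import Image

padded = np.zeros((1024, 1024, 3), dtype=np.uint8)  # placeholder for the padded WSI array
Image.fromarray(padded).save("padded_wsi.png")       # save as a regular image, no .svs needed
```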

Mairafatoretto commented 1 year ago

Hi Kevin, I'm sorry, but it's still not clear to me what you are saving in the coordinates. Are they columns and rows? Are they x and y? If they are columns and rows, how do I know the total number of columns and rows each image has?

kevthan commented 1 year ago

Hi, the coordinates represent the column and row indices of the patches. If you multiply the indices with the patch size (256), you get the absolute coordinates y and x. What do you need the total number of columns/rows for?
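
In other words, something like the following sketch (assuming the first value is the column index and the second the row index; check the preprocessing code for the exact order):

```python
import numpy as np

PATCH = 256
coords = np.array([[3, 7], [12, 1]])  # (column, row) patch indices as stored in the .h5
x = coords[:, 0] * PATCH              # absolute horizontal pixel position
y = coords[:, 1] * PATCH              # absolute vertical pixel position
```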

Mairafatoretto commented 1 year ago

Hi Kevin, strange, because in the second dimension I have 4 values for each patch, and the last two are only binary (0, 1).

Another strange point is that this multiplication by 256 only works for 20x images and not for 40x. I'm trying to use DeepZoom's get_tile_coordinates to get back the initial coordinates, but apparently it doesn't return the exact coordinate either.

Mairafatoretto commented 1 year ago

Kevin, when you extract the patches with the highest attention scores, the program performs a torch.einsum. This makes my score vector very small: it differs from the number of patches and has the same size for all images. What is the purpose of applying this transformation to x2?

kevthan commented 1 year ago

This is the actual patch selection process (formulated as a matrix multiplication, for which we use torch.einsum()).
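
As an illustration of that formulation (a sketch in the spirit of the einsum mentioned above, not the exact ZoomMIL code):

```python
import torch

N, D, k = 100, 512, 8
x2 = torch.randn(N, D)                      # patch features at the next magnification
scores = torch.rand(N)                      # attention scores (e.g. A_1_aux, squeezed)

topk_idx = torch.topk(scores, k=k).indices  # indices of the k highest-scoring patches
S = torch.zeros(k, N)
S[torch.arange(k), topk_idx] = 1.0          # one-hot selection matrix, shape (k, N)

# The matrix product keeps only the k selected patches, which is why the result has the
# same size (k, D) for every slide, independent of the number of patches N.
x2_selected = torch.einsum("kn,nd->kd", S, x2)
```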