reginabarzilaygroup / Sybil

Deep Learning for Lung Cancer Risk Prediction using LDCT

Get coordinates or visualize the most activated voxels from Sybil’s attention scores #26

Closed: rphellan closed this issue 7 months ago

rphellan commented 8 months ago

This is a great tool for estimating lung cancer risk. The appendix of the paper describing Sybil mentions that the authors measured Sybil's ability to localize a cancerous nodule on an LDCT. To do so, the most activated voxels from Sybil's attention scores were selected. Would it be possible to indicate which section of the code identifies the most activated voxels in the input image, or was an external tool used?

pgmikhael commented 7 months ago

Hi,

Thanks for reaching out!

The localization results were meant for analysis rather than real-time use, so this repo doesn't include a way to directly obtain those attention scores; its main purpose is deploying and integrating the model in clinical workflows. However, it is possible to extract the attentions with a little bit of coding.

Specifically, the output of the SybilNet model contains the attention scores. The output is a dictionary with the keys image_attention_1 and volume_attention_1, which correspond to (1) an attention learned over the 2D space of each slice in the CT and (2) an attention learned over the slices (the z-axis, i.e., how much to weigh each slice), respectively. Together, they can be used to measure attention over the full 3D volume. Of note, these scores are returned as log-values in the code, so you would exponentiate them (torch.exp) to get values between 0 and 1. In the CLI Sybil model (i.e., from sybil import Sybil), the output dictionary can also be extracted from this line.
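
For concreteness, here is a minimal sketch of how the two attentions could be combined into a single 3D map. The combine_attentions helper and the tensor shapes are assumptions for illustration, not the exact internals; only the dictionary keys and the torch.exp step come from the code itself.

```python
import torch

def combine_attentions(output):
    # Hypothetical helper: merge Sybil's 2D and slice-level attentions into
    # one 3D attention map. `output` is the dict returned by SybilNet's
    # forward pass; the shapes below are assumed for illustration.

    # Scores are stored as log-values, so exponentiate to get weights in [0, 1]
    image_attn = torch.exp(output["image_attention_1"])    # assumed (B, n_slices, H * W)
    volume_attn = torch.exp(output["volume_attention_1"])  # assumed (B, n_slices)

    # Scale each slice's 2D attention map by that slice's z-axis attention
    return image_attn * volume_attn.unsqueeze(-1)          # (B, n_slices, H * W)

# The most activated voxel is then the flat argmax of the combined map, e.g.
# flat_idx = combine_attentions(output).flatten(start_dim=1).argmax(dim=1)
# which can be unraveled back into (slice, row, col) coordinates.
```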

Hope this helps!

rphellan commented 7 months ago

Thank you, @pgmikhael. I tried this other idea too:

This line seems to store the result of the ResNet part of Sybil in the "activ" key of the output dictionary (I noticed it has 512 channels/features). This other line seems to store what I interpreted as the weight given to each channel/feature in the "hidden" key of the output (I observed that output["hidden"] is a vector with 512 values).

Do you think it would be valid to simply compute a weighted sum of "activ" and "hidden" to find the most activated voxels?

pgmikhael commented 7 months ago

Hi,

These represent the volume itself in a projected space rather than a weighting of it. The activ tensor holds learned features from the ResNet, and hidden holds pooled features of the entire volume. So I think you would still want to use the image and volume attentions if you want to look at how the different parts of the CT are weighted.

rphellan commented 7 months ago

Thank you for the explanation.