mahmoodlab / MMP

Multimodal prototyping for cancer survival prediction - ICML 2024

some questions #4

Closed: aletolia closed this issue 1 month ago

aletolia commented 3 months ago

Thank you for your excellent work! I have some questions regarding this model that I would like to discuss with you:

  1. Regarding the all_dump.h5 file: Based on the code, this file contains the time, censorship, and corresponding attention scores for both the training and test sets. After reading it, I found that it is actually a nested dictionary type. When I extracted the data, I discovered that the patient IDs in the training set did not match the corresponding time and censorship values, whereas the test set matched correctly. (That is, the patient IDs extracted from this file were assigned incorrect survival times and censorship values for some reason. In fact, I encountered the same issue when using the previous version of PORPOISE. To obtain correctly paired risk scores, I had to put all the data into the test set to get correctly matched data.)

  2. This question is related to the previous one. I want to know how you view the evaluation of a prognostic model's quality using the c-index and Kaplan-Meier curves. In my opinion, the c-index reflects the model's ability to predict survival outcomes, while the K-M curve shows the trend of patient deaths/censorship over time given the same preset survival probability. This matches what I observed: I replicated MMP on the TCGA-UVM dataset and found that patients with higher risk scores often had death outcomes, while those with lower scores were either alive or censored. However, because follow-up start times and lengths differ, some patients had short follow-up periods and were ultimately censored, and the model assigned them lower risk scores. Conversely, some patients had long follow-up periods and eventually died, and the model assigned them relatively high risk scores. The result was very high p-values for the K-M curve (p-values for each fold exceeding 1).

  3. Regarding mmp_visualization.ipynb: I successfully visualized the heatmap by modifying the code to out, qqs = panther_encoder.representation(feats.unsqueeze(dim=0)).values(), but the results differ significantly from those presented in the paper. The patches overlaid on the heatmap are semi-transparent, so when the tissue color is similar to the patch color it is difficult to tell whether a region is genuinely overlaid or is just tissue. Additionally, if the default parameters are used for visualization, the patch size changes to 448x448 instead of the original 224x224.

  4. Also in mmp_visualization.ipynb: The line path2omic = cross_attn_path2omic.loc[omic].sort_values(ascending=False) throws an error if the by parameter is missing, but I need to sort each column's data separately. How did you solve this issue?
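
A minimal sketch of the per-column sort described in point 4, using made-up data. In pandas, DataFrame.sort_values requires the by argument while Series.sort_values does not, so the error suggests that cross_attn_path2omic.loc[omic] returned a DataFrame here; that is an inference, not something confirmed in this thread. The frame, index labels, and column names below are hypothetical and are not the notebook's actual variables.

```python
import pandas as pd

# Hypothetical stand-in for a slice that comes back as a DataFrame
# (rows = pathways, columns = prototypes); values are made up.
scores = pd.DataFrame(
    {"proto_0": [0.1, 0.7, 0.2], "proto_1": [0.5, 0.3, 0.9]},
    index=["path_a", "path_b", "path_c"],
)

# Each column is a Series, so it can be sorted on its own without `by`.
sorted_cols = {
    col: scores[col].sort_values(ascending=False) for col in scores.columns
}

for col, series in sorted_cols.items():
    print(col, series.index.tolist())
```

Keeping one sorted Series per column preserves the row labels, so each column keeps its own ranking of pathways.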

Richarizardd commented 1 month ago

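Hi @aletolia - Q3 and Q4 should be solved in issue #3. On Q3, I would just add the following comments:

  • If the seed / random state / sklearn version used to obtain the global clusters is not the same, then I would expect the cluster maps to vary slightly.
  • By results "differing significantly", do you mean that the color palette and color transparency are not the same? This should now be resolved in issue #3 (visualization). I will note that one limitation of picking a high number of clusters is that it becomes very difficult to discern distinct clusters. The transparency of the color overlay + mixture plot may also make it difficult to visually match the mixture proportion values to the cluster maps.
  • You may have to change the patch size for many TCGA slides. Most (but not all) TCGA slides start at 40X for level 0 (and skip 20X), so to extract patch features at 20X with 256x256 patches, you have to patch at twice the desired patch size at 40X and then resize down during feature extraction. This then affects heatmap visualizations: when assigning predictions to patches in the WSI, you have to assign them to non-overlapping 512x512 patches.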
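
On the patch-size change raised in Q3: when level 0 of a slide is at 40X and features are wanted at 20X, the region read at level 0 has to be twice the target patch size and is then resized down. A minimal sketch of that arithmetic follows. The function names are hypothetical and not part of the MMP codebase, and the slide dimensions are made up.

```python
def level0_patch_size(target_patch_size: int, target_mag: int, level0_mag: int) -> int:
    """Side length (in level-0 pixels) to read so that, after resizing down,
    the patch is `target_patch_size` pixels wide at `target_mag`."""
    if level0_mag % target_mag != 0:
        raise ValueError("level-0 magnification must be an integer multiple of the target")
    return target_patch_size * (level0_mag // target_mag)


def grid_coords(width: int, height: int, patch_size: int) -> list[tuple[int, int]]:
    """Top-left (x, y) corners of non-overlapping patches that fit fully inside the slide."""
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, patch_size)
        for x in range(0, width - patch_size + 1, patch_size)
    ]


# 256x256 patches at 20X on a slide whose level 0 is 40X.
size = level0_patch_size(256, target_mag=20, level0_mag=40)
coords = grid_coords(width=2048, height=1024, patch_size=size)
print(size, len(coords))  # 512 8
```

With a 224-pixel target the same arithmetic gives 448, which would be consistent with the size change reported in Q3, although that has not been checked against the notebook.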

For Q1, I did not see this as being an issue, and will let @andrewsong90 comment further. For which slides do you find the censorship / survival times to be incorrect? See my response in issue #5.

For Q2, I would say your interpretation is correct. I think about the C-index from its literal definition (an AUC-like metric ranking the concordance of survival times with the predicted risks), with the caveat that it is positively biased towards highly censored datasets (depending on how you deal with ties and censored data). The log-rank test should not be inherently biased towards higher censorship. I have not done survival modeling in TCGA-UVM, but to leave a more satisfying answer, I refer to Liu et al., An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics, Cell 2018, which is an excellent resource for understanding the nuances of processing TCGA data. High-quality study designs involving TCGA that extract better insights are welcome, and I am interested to see the future work coming :^)
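
A minimal sketch of the literal C-index definition, assuming one common convention for censoring and ties; other conventions exist and can give different values. Here events uses 1 for an observed death and 0 for censored, which is an assumption of this example and may be the opposite of a "censorship" column. The toy data are made up.

```python
import math


def harrell_c_index(times, events, risks):
    """Harrell's C-index under one common convention.

    A pair (i, j) is comparable only if the patient with the shorter time
    had an observed event (events[i] == 1). Pairs with tied times are
    skipped; tied risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else math.nan


times = [2, 4, 6, 8]
events = [1, 1, 0, 1]          # third patient is censored at t=6
risks = [0.9, 0.7, 0.2, 0.1]   # higher risk should mean shorter survival
print(harrell_c_index(times, events, risks))  # 1.0
```

Note that a patient censored early contributes no comparable pairs as the earlier member, so heavy censoring shrinks the set of pairs the metric is computed on.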

aletolia commented 1 month ago

Thank you for your patient explanation, it has been a great help to me! :)