mahmoodlab / MMP

Multimodal prototyping for cancer survival prediction - ICML 2024

Regarding the csv files #5

Closed superli6 closed 3 months ago

superli6 commented 3 months ago

Thank you for your work. May I ask how you obtained these CSV files? The data I downloaded from TCGA seems to be different, and I couldn't find the data for some of the labels in the CSV files.

Richarizardd commented 3 months ago

Hi @superli6

The CSVs for TCGA were derived from the supplement of Liu et al., "An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics," Cell 2018. The clinical metadata for TCGA will vary slightly depending on the data source and how it was preprocessed (e.g., GDC Data Portal, Broad GDAC Firehose, PanCancer Atlas from cBioPortal), but the differences are mostly rounding errors. The supplement from Liu et al. 2018 is especially nice because it organizes all survival endpoint data for each cancer type and outlines how suitable each TCGA cohort + endpoint is for survival analysis.
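If you want to check how far your own download deviates from another clinical table, here is a minimal sketch of one way to do it. It is not the pipeline used for this repo. The column names (`case_id`, `os_days`, `os_event`), the one-day tolerance, and the inline rows are all made up for illustration, so swap in the headers and files you actually have.

```python
import csv
import io

# Hypothetical excerpts from two clinical tables (made-up IDs and values).
SOURCE_A = """case_id,os_days,os_event
TCGA-XX-0001,365,1
TCGA-XX-0002,1200,0
TCGA-XX-0003,730,1
"""
SOURCE_B = """case_id,os_days,os_event
TCGA-XX-0001,365,1
TCGA-XX-0002,1201,0
TCGA-XX-0003,912,1
"""


def load(text):
    """Index the rows of a CSV string by case ID."""
    return {row["case_id"]: row for row in csv.DictReader(io.StringIO(text))}


def compare_sources(a_text, b_text, tol_days=1.0):
    """Return the shared case IDs whose survival time differs by more than
    tol_days, or whose event indicator differs, between the two tables."""
    a, b = load(a_text), load(b_text)
    mismatches = []
    for case_id in sorted(a.keys() & b.keys()):
        dt = abs(float(a[case_id]["os_days"]) - float(b[case_id]["os_days"]))
        event_differs = a[case_id]["os_event"] != b[case_id]["os_event"]
        if dt > tol_days or event_differs:
            mismatches.append(case_id)
    return mismatches


mismatches = compare_sources(SOURCE_A, SOURCE_B)
print(mismatches)
```

Cases that differ by no more than the tolerance are treated as rounding-level differences. Anything left in `mismatches` is worth inspecting by hand.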

superli6 commented 3 months ago

Thank you very much.

augustinefung commented 1 month ago

Could you please share more details on how the 868 cases from TCGA-BRCA were selected from the 1097 cases? I read through the paper you cited above (Liu et al., 2018), but could not find numbers that add up to 868. In Table 3, disease-free survival adds up to 1078 for BRCA. Were any other criteria used to select these cases? Thanks!
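For anyone trying to narrow this down, here is a minimal sketch of how one might list the cases that appear in a full cohort but not in a given CSV. It says nothing about which criteria were actually applied. It only relies on the first 12 characters of a TCGA barcode identifying the case. The barcodes and variable names below are made up for illustration.

```python
def case_id(barcode):
    """Truncate a TCGA barcode (case, sample, or slide level) to its
    12-character case-level prefix, e.g. 'TCGA-XX-0001'."""
    return barcode[:12]


# Hypothetical inputs: barcodes from a full cohort listing and from a CSV.
full_cohort = [
    "TCGA-XX-0001-01A",
    "TCGA-XX-0002-01A",
    "TCGA-XX-0003-01A",
    "TCGA-XX-0004-01A",
]
csv_cases = ["TCGA-XX-0001", "TCGA-XX-0003"]

# Set difference on case-level IDs: in the cohort, but absent from the CSV.
missing = sorted({case_id(b) for b in full_cohort} - {case_id(b) for b in csv_cases})
print(len(missing), missing)
```

Joining the missing IDs against the clinical table can then show what the excluded cases have in common.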