mahmoodlab / SurvPath

Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction - CVPR 2024

how is clinical information collected? #9

Closed XiaoXueShengwangrui closed 2 weeks ago

XiaoXueShengwangrui commented 3 weeks ago

Your work is great! I am a novice in survival analysis on pathology images, so I would like to know how the clinical information should be collected.

The information in cbioportal is very complicated and messy. I am not sure whether I only need to extract the survival time.


After obtaining the raw CSV file from cbioportal, does it need further processing to produce datasets_csv/metadata/tcga_blca.csv?

In addition, in the tcga_blca.csv file, what is the difference between survival_months, survival_months_dss, and survival_months_pfi?

Looking forward to your reply! @guillaumejaume @ajv012

ajv012 commented 2 weeks ago

Thanks for your interest in our project!

To answer your first question, we accessed the survival data from UCSC Xena. We used cbioportal to access patient clinical data such as stage, grade, sex, etc. We did not use the survival data from cbioportal.
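For readers who want to combine the two sources, here is a minimal sketch of joining a survival table (as exported from UCSC Xena) with a clinical table (as exported from cbioportal) on a patient identifier. It is not the preprocessing used for this repository: the key name `case_id`, the column `censorship_dss`, the helper `merge_survival_and_clinical`, and the toy values are all assumptions made for illustration.

```python
import pandas as pd


def merge_survival_and_clinical(
    survival: pd.DataFrame, clinical: pd.DataFrame, key: str = "case_id"
) -> pd.DataFrame:
    """Left-join clinical covariates onto the survival table, one row per patient.

    `key` is a hypothetical patient-ID column; real exports may name it differently.
    """
    # Keep a single row per patient in each table before joining.
    survival = survival.drop_duplicates(subset=key)
    clinical = clinical.drop_duplicates(subset=key)
    # validate="one_to_one" raises if either side still has duplicate keys.
    return survival.merge(clinical, on=key, how="left", validate="one_to_one")


# Toy stand-ins for the two exports (values are made up).
survival = pd.DataFrame(
    {
        "case_id": ["TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003"],
        "survival_months_dss": [12.5, 40.1, 7.3],
        "censorship_dss": [0, 1, 0],  # assumed column name
    }
)
clinical = pd.DataFrame(
    {
        "case_id": ["TCGA-AA-0001", "TCGA-AA-0003"],
        "stage": ["II", "IV"],
        "sex": ["F", "M"],
    }
)

merged = merge_survival_and_clinical(survival, clinical)
print(merged)
```

A left join keeps every patient that has survival data, so patients missing from the clinical table end up with empty clinical fields rather than being dropped.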

For your second question, the columns correspond to different survival endpoints. DSS stands for "disease-specific survival" and PFI stands for "progression-free interval". You can read more about them, as well as their benefits and pitfalls, here. In our study we use disease-specific survival, which has also been done here and here.
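To make the difference between the three columns concrete, here is a small sketch of picking one endpoint from a metadata table shaped like tcga_blca.csv. The `survival_months*` column names come from the question above. The rest are assumptions for illustration: the matching `censorship*` columns, the `case_id` key, reading the unsuffixed column as overall survival, and the helper `select_endpoint`.

```python
import pandas as pd

# Suffix used by each endpoint's columns; "os" (unsuffixed) is an assumed reading.
ENDPOINT_SUFFIX = {"os": "", "dss": "_dss", "pfi": "_pfi"}


def select_endpoint(df: pd.DataFrame, endpoint: str = "dss") -> pd.DataFrame:
    """Return case_id, time, and censoring flag for one survival endpoint."""
    suffix = ENDPOINT_SUFFIX[endpoint]
    time_col = f"survival_months{suffix}"
    event_col = f"censorship{suffix}"  # assumed naming, not confirmed by the thread
    out = df[["case_id", time_col, event_col]].rename(
        columns={time_col: "time", event_col: "censored"}
    )
    # Patients without a label for this endpoint cannot be used for training.
    return out.dropna(subset=["time", "censored"])


# Toy table (values are made up); patient B has no DSS label.
df = pd.DataFrame(
    {
        "case_id": ["A", "B", "C"],
        "survival_months": [10.0, 20.0, 30.0],
        "censorship": [0, 1, 0],
        "survival_months_dss": [10.0, None, 30.0],
        "censorship_dss": [0, None, 1],
        "survival_months_pfi": [5.0, 15.0, 25.0],
        "censorship_pfi": [0, 0, 1],
    }
)

dss = select_endpoint(df, "dss")
print(dss)
```

Because each endpoint can be missing for different patients, the cohort size may change depending on which endpoint is selected.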

adiv5 commented 1 week ago

The UCSC Xena Browser already provides the phenotype data along with the survival information, right? I wanted to know why we need to go to cBioPortal for the clinical data.

Also, could you clarify which dataset from the UCSC Xena Browser was used for the RNA-seq data?

Was it "GDC TCGA *" or "TCGA "?