hms-dbmi / OncoThreads

OncoThreads longitudinal cancer genomics visualization project.
http://oncothreads.gehlenborglab.org
MIT License

use cases #262

Closed wangqianwen0418 closed 3 years ago

wangqianwen0418 commented 3 years ago

Related issue: https://github.com/hms-dbmi/OncoThreads/issues/251 (Synthea COVID-19 dataset)

wangqianwen0418 commented 3 years ago

CAR-T cell data:

image

There are too many missing values. Even if we apply some method for handling missing values (issue https://github.com/hms-dbmi/OncoThreads/issues/261), the conclusions of the analysis would not be solid.
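
To make "too many missing values" concrete, one option is to compute the fraction of missing entries per feature before deciding whether imputation is worth attempting. The following is only a minimal Python sketch: the feature names (`crp`, `ferritin`) and the toy rows are hypothetical placeholders, not values from the dataset discussed here, and a missing value is assumed to be encoded as `None` or as an absent key.

```python
from typing import Dict, List, Optional

# One record per sample; a value of None (or an absent key) is treated as missing.
Row = Dict[str, Optional[float]]


def missing_fraction(rows: List[Row], features: List[str]) -> Dict[str, float]:
    """Return, for each feature, the fraction of rows in which it is missing."""
    if not rows:
        # No rows at all: report 0.0 rather than dividing by zero.
        return {f: 0.0 for f in features}
    result: Dict[str, float] = {}
    for f in features:
        missing = sum(1 for r in rows if r.get(f) is None)
        result[f] = missing / len(rows)
    return result


# Hypothetical toy data, for illustration only.
rows: List[Row] = [
    {"crp": 1.2, "ferritin": None},
    {"crp": None, "ferritin": None},
    {"crp": 0.8, "ferritin": 310.0},
    {"crp": 2.1},  # "ferritin" key absent, counted as missing
]
fractions = missing_fraction(rows, ["crp", "ferritin"])
for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} missing")
```

Features whose missing fraction exceeds some chosen threshold could then be excluded up front instead of imputed; where to set that threshold is a judgment call that this thread does not settle.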

wangqianwen0418 commented 3 years ago

Dataset requirements:

Questions to be answered:

tmazor commented 3 years ago

@wangqianwen0418 I reviewed the datasets mentioned in "Every which way? On predicting tumor evolution using cancer progression models" (https://doi.org/10.1371/journal.pcbi.1007246.s009). The cancer datasets they used are not actually longitudinal; they are mostly TCGA data, so I don't think they will work for us.

wangqianwen0418 commented 3 years ago

Thanks for the feedback

wangqianwen0418 commented 3 years ago

Datasets that I have checked but that may not be suitable:

Datasets that I am still working on:

tmazor commented 3 years ago

Potential datasets I found:

CN-AML: "Evolution of Cytogenetically Normal Acute Myeloid Leukemia During Therapy and Relapse: An Exome Sequencing Study of 50 Patients", Greif et al., Clin Cancer Res 2018. https://clincancerres.aacrjournals.org/content/24/7/1716.long

CLL: https://www.nature.com/articles/s41467-017-02329-y The data are in dbGaP. Can we get access?

TRACERx Lung: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5812436/ The genomic data are in cBioPortal; we still need to figure out the clinical data situation.

wangqianwen0418 commented 3 years ago

image

Here is the initial analysis of the CN-AML dataset. Since the original dataset has many redundant features, I consulted the paper and selected four genes (DNMT3A, FLT3, IDH2, IDH1) as timepoint features, and gender, age, relapse days, AML type, and ELN risk as patient-level features. @tmazor, does this pre-processing make sense to you? Are there any other features that you think should be added?
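
For concreteness, the selection described above could be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the actual pre-processing script: the record layout (`sample`, `gene` keys), the patient field names (`relapse_days`, `aml_type`, `eln_risk`), and the toy values are all hypothetical. Only the four gene names and the list of patient-level features come from this comment.

```python
from typing import Any, Dict, Iterable, Sequence

# Genes kept as timepoint (per-sample) features, as listed in this comment.
TIMEPOINT_GENES = ("DNMT3A", "FLT3", "IDH2", "IDH1")
# Patient-level features; these field names are hypothetical spellings.
PATIENT_FEATURES = ("gender", "age", "relapse_days", "aml_type", "eln_risk")


def timepoint_features(
    mutations: Iterable[Dict[str, str]],
    genes: Sequence[str] = TIMEPOINT_GENES,
) -> Dict[str, Dict[str, bool]]:
    """Build a mutated / not-mutated flag per selected gene for each sample.

    Samples that have no mutation records at all do not appear in the result.
    """
    table: Dict[str, Dict[str, bool]] = {}
    for m in mutations:
        flags = table.setdefault(m["sample"], {g: False for g in genes})
        if m["gene"] in flags:  # genes outside the selection are ignored
            flags[m["gene"]] = True
    return table


def patient_features(
    patient: Dict[str, Any],
    keep: Sequence[str] = PATIENT_FEATURES,
) -> Dict[str, Any]:
    """Keep only the selected patient-level fields (None if a field is absent)."""
    return {k: patient.get(k) for k in keep}


# Hypothetical toy records, for illustration only.
mutations = [
    {"sample": "P1_diagnosis", "gene": "DNMT3A"},
    {"sample": "P1_diagnosis", "gene": "TP53"},  # not selected, ignored
    {"sample": "P1_relapse", "gene": "FLT3"},
]
patient = {"gender": "F", "age": 61, "relapse_days": 240,
           "aml_type": "de novo", "eln_risk": "intermediate", "center": "X"}

per_sample = timepoint_features(mutations)
per_patient = patient_features(patient)
```

Encoding each gene as a binary flag is just one possible choice; variant allele frequency per gene would be an alternative timepoint feature if the source files provide it.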

wangqianwen0418 commented 3 years ago

@tmazor, please find attached the sample and mutation files for the CN-AML dataset: AML_mutation.txt, AML_sample_freq.txt, AML_timeline.txt, AML_patients.txt. Please let me know if you have any problems with the data. Many thanks!