knowledgesystems / curation-scrum

Used for issue tracking of data curation efforts.

Merging FMI + IMPACT gbm/glioma cases (07/25) #125

Open chenh4 opened 8 years ago

chenh4 commented 8 years ago

cancer_study_identifier: gbm_fmi_mskcc_2016 (changed from mixed_gbm_mellinghoff_impact)

Total samples: 624 (as of 3/31); FMI: 159, IMPACT: 465

Total samples: 639 (as of 4/8); FMI: 159, IMPACT: 480

Total samples: 650 (as of 4/22); FMI: 159, IMPACT: 491

Total samples: 667 (as of 5/06); FMI: 159, IMPACT: 508

Total samples: 682 (as of 5/20); FMI: 159, IMPACT: 523

Total samples: 685 (as of 6/3); FMI: 159, IMPACT: 526

Total samples: 707 (as of 6/17); FMI: 159, IMPACT: 548

_6/22: FMI data updated_ (FMI: 187)

Total samples: 743 (as of 6/24); FMI: 187, IMPACT: 556

Total samples: 763 (as of 7/8); FMI: 187, IMPACT: 576

Total samples: 780 (as of 7/25); FMI: 187, IMPACT: 593
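Each total above should equal FMI + IMPACT for that snapshot. A small Python sketch (an illustration only, not part of the curation workflow) that re-checks that arithmetic and shows how many IMPACT samples each update added; the counts are transcribed from the tallies above and the function names are hypothetical:

```python
# (date, total, FMI, IMPACT) transcribed from the tallies above.
SNAPSHOTS = [
    ("3/31", 624, 159, 465),
    ("4/8", 639, 159, 480),
    ("4/22", 650, 159, 491),
    ("5/06", 667, 159, 508),
    ("5/20", 682, 159, 523),
    ("6/3", 685, 159, 526),
    ("6/17", 707, 159, 548),
    ("6/24", 743, 187, 556),
    ("7/8", 763, 187, 576),
    ("7/25", 780, 187, 593),
]


def inconsistent_snapshots(snapshots):
    """Return the dates whose total does not equal FMI + IMPACT."""
    return [date for date, total, fmi, impact in snapshots if total != fmi + impact]


def impact_growth(snapshots):
    """Return (date, IMPACT samples added since the previous snapshot) pairs."""
    return [(cur[0], cur[3] - prev[3]) for prev, cur in zip(snapshots, snapshots[1:])]
```

For these figures `inconsistent_snapshots(SNAPSHOTS)` returns an empty list, and the first entry of `impact_growth(SNAPSHOTS)` is `("4/8", 15)`.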

In the next update, add these 6 cases back: TRF051908, TRF054735, TRF021885, TRF069507, TRF059647, TRF063580

Survival data is complete.

Update bi-weekly.

chenh4 commented 8 years ago

Just got imported into the Triage portal. Oh yeah! http://dashi-dev.cbio.mskcc.org:28080/triage/study.do?cancer_study_id=gbm_fmi_mskcc_2016#summary

chenh4 commented 8 years ago

https://cbioportal.mskcc.org/study.do?cancer_study_id=gbm_fmi_mskcc_2016#summary

zheins commented 8 years ago

@chenh4, please fully document your workflow for doing the merge. That would help me better understand the things that need to be done to automate this process.

chenh4 commented 8 years ago

@zheins Thanks Zack, yes, I will give you the document soon. I am testing importing the raw files (with minimal adjustment) generated by the merge script.

chenh4 commented 8 years ago

@zheins The merge script is awesome! I think we are ready to have the merged study update automatically!! Here are the steps I followed for the merge.

  1. Make the glioma case list:
     • msk-impact glioma case list: SAMPLE_ID from msk-impact/msk-impact/data_clinical.txt (where CANCER_TYPE == Glioma)
     • FMI case list: all SAMPLE_ID values from /cbio-portal-data/foundation/gbm/mskcc/foundation/data_clinical.txt
  2. Run the merge script:
     • --study1=/cbio-portal-data/foundation/gbm/mskcc/foundation
     • --study2=/msk-impact/msk-impact/
     • --output-directory=/impact/MERGED/gbm/fmi_mskcc/2016
     • --study-id=gbm_fmi_mskcc_2016
     • --cancer-type=gbm
     • --subset=the case list from (1)
  3. Add meta files:
     • meta_clinical.txt
     • meta_CNA.txt
     • meta_fusions.txt
     • meta_mutations_extended.txt
     • meta_timeline.txt
     • gbm_fmi_mskcc_2016_meta_cna_hg19_seg.txt
     • Rename mskimpact_data_cna_hg19.seg to gbm_fmi_mskcc_2016_data_cna_hg19.seg
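Step 1 could be scripted roughly as follows. This is a sketch under assumptions: the data_clinical.txt files are taken to be tab-delimited with '#'-prefixed metadata lines above a single header row (the usual cBioPortal clinical layout), and `--subset` is assumed to accept a file with one SAMPLE_ID per line. The paths come from step 1; the helper names are hypothetical and not part of the merge script.

```python
import csv
from pathlib import Path

# Paths copied from step 1; adjust to wherever the repos are checked out.
IMPACT_CLINICAL = Path("msk-impact/msk-impact/data_clinical.txt")
FMI_CLINICAL = Path("/cbio-portal-data/foundation/gbm/mskcc/foundation/data_clinical.txt")


def parse_clinical(lines):
    """Parse tab-delimited clinical lines into dicts, skipping '#' metadata lines."""
    data_lines = (line for line in lines if not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


def load_clinical(path):
    """Read and parse one data_clinical.txt file."""
    with open(path, newline="") as handle:
        return parse_clinical(handle)


def glioma_case_list(impact_rows, fmi_rows):
    """IMPACT samples with CANCER_TYPE == Glioma, followed by every FMI sample."""
    impact_ids = [r["SAMPLE_ID"] for r in impact_rows if r.get("CANCER_TYPE") == "Glioma"]
    fmi_ids = [r["SAMPLE_ID"] for r in fmi_rows]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(impact_ids + fmi_ids))


def write_subset(sample_ids, out_path):
    """Write one SAMPLE_ID per line (assumed --subset input format)."""
    Path(out_path).write_text("\n".join(sample_ids) + "\n")
```

Usage, with a hypothetical output name: `write_subset(glioma_case_list(load_clinical(IMPACT_CLINICAL), load_clinical(FMI_CLINICAL)), "gbm_subset.txt")`, then point `--subset` at that file.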

From Zack:

You set the study location (the repo) as the output directory of the merge. After it runs, the meta files are already there and we just don't touch them. You should only have to put the meta files there once, the first time; after that, never touch them, and the merge script will just update the data files.
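The one-time setup Zack describes could look like the sketch below. It assumes a hypothetical `template_dir` holding the six meta files listed in step 3, copies only the ones missing from the study directory (so existing meta files are never overwritten), and performs the seg-file rename from step 3. The function names are illustrative, not part of the merge script.

```python
import shutil
from pathlib import Path

# Meta files listed in step 3 of the merge workflow.
META_FILES = [
    "meta_clinical.txt",
    "meta_CNA.txt",
    "meta_fusions.txt",
    "meta_mutations_extended.txt",
    "meta_timeline.txt",
    "gbm_fmi_mskcc_2016_meta_cna_hg19_seg.txt",
]


def missing_meta_files(study_dir):
    """Return the expected meta files that are not yet present in study_dir."""
    study = Path(study_dir)
    return [name for name in META_FILES if not (study / name).exists()]


def seed_meta_files(template_dir, study_dir):
    """Copy only the missing meta files; files already in study_dir are left alone."""
    copied = []
    for name in missing_meta_files(study_dir):
        shutil.copy2(Path(template_dir) / name, Path(study_dir) / name)
        copied.append(name)
    return copied


def rename_seg_file(study_dir):
    """Rename the merged seg file to the study-specific name; True if it now exists."""
    src = Path(study_dir) / "mskimpact_data_cna_hg19.seg"
    dst = Path(study_dir) / "gbm_fmi_mskcc_2016_data_cna_hg19.seg"
    if src.exists():
        # Path.replace overwrites an older renamed copy from a previous merge run.
        src.replace(dst)
    return dst.exists()
```

Calling `seed_meta_files` after every merge is safe under these assumptions: once the meta files are in place it copies nothing.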

chenh4 commented 8 years ago

@zheins Hi Zack, the timeline data doesn't show up for all cases. The data format is the same as before, and I didn't touch the timeline data generated by the merge script. Could you please take a look? Thanks!

chenh4 commented 8 years ago

Looks good on the Triage portal. Will import into the MSK portal tomorrow.

http://dashi-dev.cbio.mskcc.org:28080/triage/study.do?cancer_study_id=gbm_fmi_mskcc_2016#summary

chenh4 commented 8 years ago

Will be on the MSK portal tomorrow.

http://dashi-dev.cbio.mskcc.org:28080/triage/study.do?cancer_study_id=gbm_fmi_mskcc_2016#summary

chenh4 commented 8 years ago

Need to add timeline data to the merged study.

chenh4 commented 8 years ago

https://cbioportal.mskcc.org/study?id=gbm_fmi_mskcc_2016#summary

chenh4 commented 8 years ago
  1. Please see the attached non-glioma FMI file. Based on the detailed cancer type, these cases are non-gliomas. Should we include them in the GBM merged study?
  2. Do you have a case list for the FMI cases that should (or should not) be included in the merged study? Otherwise, please see the attached data_clinical file. We have 187 FMI cases. Should we include them all, or which ones should be removed?
  3. Do you know which special types of glioma among the IMPACT samples shouldn't be included?
  4. Will we receive the last batch of FM samples? Philip mentioned there are still some reports that were requested but never received.

Sent an email to Andy.

chenh4 commented 8 years ago

These three samples (see the attached image) are not found in the msk-impact clinical data.
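A check like this one is easy to repeat on each bi-weekly update. A minimal sketch, assuming the clinical data has already been read into rows keyed by column name (for example with `csv.DictReader`); the function name is hypothetical, and it only reports which IDs are missing, not why:

```python
from typing import Iterable, List, Mapping


def samples_missing_from_clinical(
    sample_ids: Iterable[str],
    clinical_rows: Iterable[Mapping[str, str]],
    id_column: str = "SAMPLE_ID",
) -> List[str]:
    """Return the sample IDs, in input order, that have no row in the clinical data."""
    known = {row[id_column] for row in clinical_rows}
    return [sid for sid in sample_ids if sid not in known]
```

Running it with the case list against the msk-impact clinical rows would surface samples like these three before the import rather than after.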