mskcc / pluto-cwl

CWL workflows for helix filter scripts

need to update samples fillout workflow output #84

Open stevekm opened 2 years ago

stevekm commented 2 years ago

@timosong @svural

timosong commented 2 years ago

Need to generate the data_mutations_uncalled output (https://github.com/mskcc/pluto-cwl/issues/63). Current output should be 4 MAF files:
1) "unfiltered" maf. Has no filters that distinguish between clinical and research samples. This should end up in the analysis folder.
2) "filtered" maf. Germline filters; distinguishes between clinical and research samples. This should end up in the analysis folder.
3) data_mutations_extended.txt. This file has fewer columns than file 2 (filtered). This should end up in the portal folder.
4) data_mutations_uncalled.txt. This file has fewer columns than file 2 (filtered). This should end up in the portal folder.

The number of rows in files #3 and #4 equals the number of rows in file #2.
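A minimal sketch of deriving a portal file (#3 or #4) from the filtered MAF (#2) by dropping columns, assuming each portal file keeps every row of file #2. The issue does not say which columns the portal files keep, so `PORTAL_COLUMNS` below is a hypothetical placeholder, as are the helper names `read_maf`, `subset_columns`, and `write_maf`.

```python
import csv
import io

# Hypothetical column subset; the real list for the portal files is not
# specified in this issue.
PORTAL_COLUMNS = ["Hugo_Symbol", "Tumor_Sample_Barcode", "Variant_Classification"]


def read_maf(text):
    """Parse tab-separated MAF text into (header, rows), skipping '#' comment lines."""
    lines = [line for line in text.splitlines() if line and not line.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


def subset_columns(rows, columns):
    """Keep only the requested columns; each input row yields exactly one output row."""
    return [{col: row.get(col, "") for col in columns} for row in rows]


def write_maf(rows, columns):
    """Serialize rows back to tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because the portal file is produced by a pure column subset, `len(subset_columns(rows, PORTAL_COLUMNS)) == len(rows)` holds by construction, which makes the row-count requirement easy to assert in a test.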

Need to update all portal files to include data for all new DMP samples included in the output.

The following files need to be updated:
1) data_mutations_extended.txt. Need to find out which columns can be filled from DMP data and which can be left blank. The only necessary columns are GENE_PANEL, PATIENT_ID, SAMPLE_ID.
2) case_lists files need to have the new DMP ids appended (tab-delimited, at the end of case_list_ids):
case_lists/cases_all.txt
case_lists/cases_cnaseq.txt
case_lists/cases_cna.txt
case_lists/cases_sequenced.txt
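A minimal sketch of the case list update, assuming each case list file contains a single `case_list_ids:` line whose IDs are tab-separated (the cBioPortal case list layout). The function names `append_case_ids` and `update_case_lists` are hypothetical, not existing pluto-cwl helpers; the four file paths are the ones listed above.

```python
from pathlib import Path

# Case list files named in the issue, relative to the portal folder.
CASE_LIST_FILES = [
    "case_lists/cases_all.txt",
    "case_lists/cases_cnaseq.txt",
    "case_lists/cases_cna.txt",
    "case_lists/cases_sequenced.txt",
]


def append_case_ids(case_list_text, new_ids):
    """Append IDs, tab-delimited, to the end of the case_list_ids line.

    IDs that are already listed are skipped, so rerunning is harmless.
    """
    out_lines = []
    for line in case_list_text.splitlines():
        if line.startswith("case_list_ids:"):
            value = line[len("case_list_ids:"):].strip()
            ids = value.split("\t") if value else []
            seen = set(ids)
            for sample_id in new_ids:
                if sample_id not in seen:
                    ids.append(sample_id)
                    seen.add(sample_id)
            line = "case_list_ids: " + "\t".join(ids)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


def update_case_lists(portal_dir, new_ids):
    """Rewrite each case list file under portal_dir in place."""
    for rel_path in CASE_LIST_FILES:
        path = Path(portal_dir) / rel_path
        path.write_text(append_case_ids(path.read_text(), new_ids))
```

Skipping IDs that are already present keeps the step idempotent, which matters if the workflow is rerun on an existing portal folder.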

New files need to be created:
1) meta_mutations_uncalled.txt
cancer_study_identifier: pilot_msk_melpcm_
data_filename: data_mutations_extended.txt
datatype: MAF
genetic_alteration_type: MUTATION_EXTENDED
profile_description: Mutation data
profile_name: Mutations
show_profile_in_analysis_tab: true
stable_id: mutations
namespaces: ASCN
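A minimal sketch of writing meta_mutations_uncalled.txt as `key: value` lines. The values are copied from the issue; the study identifier is passed in rather than hard-coded because it appears truncated above, and the key is spelled `cancer_study_identifier` as in cBioPortal meta files. `meta_mutations_uncalled` and `format_meta` are hypothetical names, and `data_filename` is left overridable since it may need to point at the uncalled file instead.

```python
def meta_mutations_uncalled(study_id, data_filename="data_mutations_extended.txt"):
    """Return the fields listed in the issue for meta_mutations_uncalled.txt, in order."""
    return {
        "cancer_study_identifier": study_id,
        "data_filename": data_filename,
        "datatype": "MAF",
        "genetic_alteration_type": "MUTATION_EXTENDED",
        "profile_description": "Mutation data",
        "profile_name": "Mutations",
        "show_profile_in_analysis_tab": "true",
        "stable_id": "mutations",
        "namespaces": "ASCN",
    }


def format_meta(fields):
    """Serialize fields as 'key: value' lines; dicts keep insertion order."""
    return "".join(f"{key}: {value}\n" for key, value in fields.items())
```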

Caveats: it would be nice to have DMP copy number data merged in if a fillout was ever performed. This also means that some DMP ids may be present in the copy number data but not in the fillout, and vice versa. So we need to create the union of all DMP ids and add it to the files above (case_lists, data_clinical_sample).
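A minimal sketch of building that union, which also reports the IDs found in only one source so the mismatch is visible. It assumes DMP sample IDs can be recognized by a `P-` prefix (the MSK-IMPACT convention); that filter, the function name `union_dmp_ids`, and the example IDs are assumptions for illustration.

```python
def union_dmp_ids(fillout_ids, cna_ids, prefix="P-"):
    """Return (all DMP ids, ids only in copy number, ids only in fillout), each sorted.

    Assumes DMP sample ids start with `prefix`; other ids are ignored.
    """
    fillout = {i for i in fillout_ids if i.startswith(prefix)}
    cna = {i for i in cna_ids if i.startswith(prefix)}
    return sorted(fillout | cna), sorted(cna - fillout), sorted(fillout - cna)
```

The first element is what would be appended to the case lists and data_clinical_sample; the other two are only diagnostics.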

stevekm commented 2 years ago

@timosong I think the "uncalled" mutations files should be a separate issue. Are they required for fillout data import?

stevekm commented 2 years ago

Case list fixes are implemented.