Get patient info in tsv file

zhoulab / sclc-scripts

scripts for "Significantly mutated genes and regulatory pathways in SCLC—a meta-analysis"

https://doi.org/10.1016/j.cancergen.2017.05.003


Get patient info in tsv file #2

Closed by varsh1090 7 years ago

varsh1090 commented 7 years ago

Using the transformed_SCLC.muTect.vcf file, generate a dictionary that maps each (chr, start, end, ref_allele, alt_allele) entry to its patient ID.
Using that dictionary, add a patient column to the file transformed_SCLC_onco.tsv.

E.g., columns in transformed_SCLC_onco.tsv (screenshot attached).
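A minimal Python sketch of the two steps requested here, under stated assumptions. The layout of transformed_SCLC.muTect.vcf is not shown in this thread, so the sketch assumes a tab-separated file with a header row named `patient`, `chr`, `start`, `end`, `ref_allele`, `alt_allele`. On the TSV side, `Reference_Allele` and `Tumor_Seq_Allele2` are assumed names for the allele columns, the function names are hypothetical, and chromosome labels are assumed to be written the same way in both files.

```python
import csv
import io

# Assumed header names for the transformed VCF; the real layout may differ.
KEY_FIELDS = ("chr", "start", "end", "ref_allele", "alt_allele")


def build_patient_index(vcf_handle):
    """Map (chr, start, end, ref_allele, alt_allele) -> patient ID."""
    index = {}
    for row in csv.DictReader(vcf_handle, delimiter="\t"):
        key = tuple(row[field] for field in KEY_FIELDS)
        # If two patients share the same variant, the later row wins here.
        index[key] = row["patient"]
    return index


def add_patient_column(tsv_handle, index, out_handle):
    """Copy the TSV, appending a Patient column looked up in the index."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    fieldnames = list(reader.fieldnames) + ["Patient"]
    writer = csv.DictWriter(out_handle, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        key = (row["Chromosome"], row["Start_position"], row["End_position"],
               row["Reference_Allele"], row["Tumor_Seq_Allele2"])
        row["Patient"] = index.get(key, "NA")  # "NA" marks unmatched rows
        writer.writerow(row)


# Tiny in-memory example with made-up values.
demo_vcf = io.StringIO(
    "patient\tchr\tstart\tend\tref_allele\talt_allele\n"
    "P1\t1\t100\t100\tA\tT\n"
)
demo_index = build_patient_index(demo_vcf)
```

Keying the dictionary on plain strings avoids int/str mismatches between the two files. If the same variant can occur in several patients, the values would need to be lists or sets of patient IDs instead.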

victorlin commented 7 years ago

@varsh1090 is there another input file?

For clarification (correct me if I'm wrong): each patient has multiple (chr, start, end, ref_allele, alt_allele) entries. I am assuming each such entry maps to one patient, and that the goal is to append a patient column to some input file (the screenshot you attached) by matching on its (Chromosome, Start_position, End_position, Reference_...) columns, plus one more column to match with alt_allele.

victorlin commented 7 years ago

Also, it looks like the value chr1 is just 1 in the screenshot. Will all the entries under Chromosome be the same as chr from transformed_SCLC.muTect.vcf, without the prefix chr?

varsh1090 commented 7 years ago

Ya right, should be the same without the chr.
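Given that answer, the chromosome value from the VCF would need its `chr` prefix removed before it is compared with the Chromosome column. A small sketch (the helper name `strip_chr_prefix` is hypothetical):

```python
def strip_chr_prefix(chrom):
    """Return the chromosome label without a leading 'chr'."""
    return chrom[3:] if chrom.startswith("chr") else chrom


# Labels that already lack the prefix pass through unchanged.
normalized = [strip_chr_prefix(c) for c in ("chr1", "chrX", "2")]
```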



varsh1090 commented 7 years ago

The other input file is in the Dropbox Bioinfoundergrads SCLC folder.

varsh1090 commented 7 years ago

The reference and alt allele are the same, so the final columns we need in the output file are: Gene, Patient, Ref_Allele, Alt_Allele, Category.

Gene - 1st column in transformed_SCLC_onco.tsv
Patient - column from transformed_SCLC.muTect.vcf (matched using chr start stop to transformed_SCLC_onco.tsv)
Ref_Allele - Reference Allele column in transformed_SCLC_onco.tsv
Alt_Allele - Tumor_Seq_Allele2 column in transformed_SCLC_onco.tsv
Category - Variant_Classification column in transformed_SCLC_onco.tsv
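The column mapping above could be sketched in Python roughly as follows. This is only an illustration. `Reference_Allele` is an assumed spelling of the "Reference Allele" column, and `patient_for_row` is a hypothetical callable standing in for whatever lookup supplies the patient ID for a row.

```python
import csv
import io

OUTPUT_COLUMNS = ["Gene", "Patient", "Ref_Allele", "Alt_Allele", "Category"]


def to_output_row(onco_row, gene_column, patient):
    """Pick and rename the columns needed in the output file."""
    return {
        "Gene": onco_row[gene_column],
        "Patient": patient,
        "Ref_Allele": onco_row["Reference_Allele"],  # assumed column name
        "Alt_Allele": onco_row["Tumor_Seq_Allele2"],
        "Category": onco_row["Variant_Classification"],
    }


def write_summary(onco_handle, patient_for_row, out_handle):
    """Write the five-column output; patient_for_row(row) returns the patient ID."""
    reader = csv.DictReader(onco_handle, delimiter="\t")
    gene_column = reader.fieldnames[0]  # Gene is the 1st column of the TSV
    writer = csv.DictWriter(out_handle, fieldnames=OUTPUT_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(to_output_row(row, gene_column, patient_for_row(row)))


# Tiny in-memory example with made-up values.
demo_in = io.StringIO(
    "Hugo_Symbol\tReference_Allele\tTumor_Seq_Allele2\tVariant_Classification\n"
    "TP53\tA\tT\tMissense_Mutation\n"
)
demo_out = io.StringIO()
write_summary(demo_in, lambda row: "P1", demo_out)
```

Taking the gene column by position rather than by name follows the "1st column" wording above and avoids assuming its header.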

varsh1090 commented 7 years ago

The files are in - /ufrc/zhou/share/projects/bioinformatics/SCLCfiles

victorlin commented 7 years ago

Running oncotator with the flags:

--input_format=VCF
--output_format=TCGAMAF
--skip-no-alt

gives a promising output. See the file: /ufrc/zhou/share/projects/bioinformatics/SCLCfiles/sclc-files/oncotator_out_skip_no_alt.maf

The patient ID is under the column Matched_Norm_Sample_Barcode. If this is good, we would not need the vcf_transform.py script.
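If the MAF route works out, the patient ID could be read straight from that column. A hedged sketch follows: it assumes the MAF is tab-separated, that any header comment lines start with `#`, and that the gene column is named `Hugo_Symbol`. The function names are hypothetical.

```python
import csv
import io


def read_maf_rows(maf_handle):
    """Yield MAF rows as dicts, skipping lines that start with '#'."""
    data_lines = (line for line in maf_handle if not line.startswith("#"))
    yield from csv.DictReader(data_lines, delimiter="\t")


def patients_by_gene(maf_handle, gene_column="Hugo_Symbol"):
    """Collect the set of patient IDs seen for each gene."""
    result = {}
    for row in read_maf_rows(maf_handle):
        patient = row["Matched_Norm_Sample_Barcode"]
        result.setdefault(row[gene_column], set()).add(patient)
    return result


# Tiny in-memory example with made-up values.
demo_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tMatched_Norm_Sample_Barcode\n"
    "TP53\tP1\n"
    "TP53\tP2\n"
    "RB1\tP1\n"
)
demo_patients = patients_by_gene(demo_maf)
```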

varsh1090 commented 7 years ago

@victor-lin Did you also use --db-dir /ufrc/zhou/share/projects/Oncogenomics_varsha/oncotator/oncotator_v1_ds_Jan262014?

varsh1090 commented 7 years ago

@victor-lin Could you please add the exact oncotator command here as well (including the path, input and output file names)?