Closed varsh1090 closed 7 years ago
@varsh1090 is there another input file?
For clarification (correct me if I'm wrong): there are multiple entries of (`chr`, `start`, `end`, `ref_allele`, `alt_allele`) for each `patient`. I am assuming each (`chr`, `start`, `end`, `ref_allele`, `alt_allele`) is mapped to a `patient`, and the goal is to append a `patient` column to some input file (the screenshot you attached) based on its (`Chromosome`, `Start_position`, `End_position`, `Reference_...`) and one more column to match with `alt_allele`.
Also, it looks like the value `chr1` is just `1` in the screenshot. Will all the entries under `Chromosome` be the same as `chr` from `transformed_SCLC.muTect.vcf`, without the prefix `chr`?
Ya right, should be the same without the chr.
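Since the two files differ only in the `chr` prefix, the chromosome values can be brought to one convention before matching. A minimal sketch, where the helper name `normalize_chrom` is hypothetical and not part of any script in this thread:

```python
def normalize_chrom(name):
    """Return the chromosome label without a leading 'chr' (e.g. 'chr1' -> '1')."""
    name = name.strip()
    # Strip only a leading 'chr' (any case); labels such as '1' or 'X' pass through unchanged.
    if name.lower().startswith("chr"):
        return name[3:]
    return name
```

This only removes the prefix. It does not reconcile other naming differences (for example `chrM` versus `MT`), which would need separate handling if they occur in these files.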
The other input file is in the dropbox Bioinfoundergrads SCLC folder.
The reference and alt allele are the same, so the final columns we need in the output file are - Gene Ref_Allele Alt_Allele Category
Gene - 1st column in transformed_SCLC_onco.tsv
Patient - column from transformed_SCLC.muTect.vcf (matched using chr start stop to transformed_SCLC_onco.tsv)
Ref_Allele - Reference Allele column in transformed_SCLC_onco.tsv
Alt_Allele - Tumor_Seq_Allele2 column in transformed_SCLC_onco.tsv
Category - Variant_Classification column in transformed_SCLC_onco.tsv
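The column mapping listed above could be written down roughly as follows. This is a sketch under assumptions: the exact header `Reference_Allele` is assumed (the thread only says "Reference Allele column"), `Gene` is taken by position because it is described only as the 1st column, and `to_output_row` is a hypothetical helper name.

```python
# Output column -> source column in transformed_SCLC_onco.tsv.
# "Reference_Allele" is an assumed header name; check it against the real file.
OUTPUT_FROM_ONCO = {
    "Ref_Allele": "Reference_Allele",
    "Alt_Allele": "Tumor_Seq_Allele2",
    "Category": "Variant_Classification",
}

def to_output_row(header, fields, patient):
    """Build one output record from one onco row plus its matched patient ID."""
    row = dict(zip(header, fields))
    # Gene is described only as the first column, so take it by position.
    out = {"Gene": fields[0], "Patient": patient}
    for out_col, src_col in OUTPUT_FROM_ONCO.items():
        out[out_col] = row[src_col]
    return out
```

The `patient` argument is expected to come from the match against `transformed_SCLC.muTect.vcf`; this helper does not perform the match itself.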
The files are in - /ufrc/zhou/share/projects/bioinformatics/SCLCfiles
Running `oncotator` with the flags `--input_format=VCF`, `--output_format=TCGAMAF`, and `--skip-no-alt` gives a promising output. See the file: /ufrc/zhou/share/projects/bioinformatics/SCLCfiles/sclc-files/oncotator_out_skip_no_alt.maf
The patient ID is under the column `Matched_Norm_Sample_Barcode`.
If this is good, we would not need the `vcf_transform.py` script.
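To check whether the MAF output is usable, the patient ID can be pulled out next to each variant's coordinates. A minimal sketch, assuming the MAF is tab-delimited, may begin with `#` comment lines before the header, and uses the `Chromosome`, `Start_position`, and `End_position` headers seen in the screenshot; `read_maf_patients` is a hypothetical name:

```python
import csv

def read_maf_patients(lines):
    """Yield (chrom, start, end, patient_id) tuples from MAF text lines."""
    # Skip leading '#' comment lines so the first remaining line is the header.
    data = (ln for ln in lines if not ln.startswith("#"))
    for row in csv.DictReader(data, delimiter="\t"):
        yield (
            row["Chromosome"],
            row["Start_position"],
            row["End_position"],
            row["Matched_Norm_Sample_Barcode"],
        )
```

An open file handle on `oncotator_out_skip_no_alt.maf` can be passed directly as `lines`, since the function only iterates over text lines.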
@victor-lin Did you also use `--db-dir /ufrc/zhou/share/projects/Oncogenomics_varsha/oncotator/oncotator_v1_ds_Jan262014`?
@victor-lin Could you please add the exact oncotator command here as well (including the path, input and output file names)?
Using the transformed_SCLC.muTect.vcf file, generate a dictionary that maps each (chr, start, end, ref_allele, alt_allele) to its patient ID.
Using the dict above, generate a column for patient in file transformed_SCLC_onco.tsv.
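The two steps above can be sketched as below. Everything about the file layouts is an assumption: both files are taken to be tab-delimited with a header row, the VCF-side header names (`chr`, `start`, `end`, `ref_allele`, `alt_allele`, `patient`) and the onco-side name `Reference_Allele` are guesses to adjust, and the function names are hypothetical. Patient IDs are collected in a list in case the same variant appears for more than one patient.

```python
import csv

# Assumed column names; adjust to the real headers of both files.
VCF_KEY = ("chr", "start", "end", "ref_allele", "alt_allele")
ONCO_KEY = ("Chromosome", "Start_position", "End_position",
            "Reference_Allele", "Tumor_Seq_Allele2")

def build_patient_lookup(vcf_lines):
    """Map (chr, start, end, ref_allele, alt_allele) -> list of patient IDs."""
    lookup = {}
    for row in csv.DictReader(vcf_lines, delimiter="\t"):
        key = tuple(row[col] for col in VCF_KEY)
        lookup.setdefault(key, []).append(row["patient"])
    return lookup

def add_patient_column(onco_lines, lookup):
    """Yield each onco row as a dict with an added 'patient' entry."""
    for row in csv.DictReader(onco_lines, delimiter="\t"):
        chrom = row[ONCO_KEY[0]]
        # The onco file has '1' where the VCF has 'chr1', so restore the prefix.
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom
        key = (chrom,) + tuple(row[col] for col in ONCO_KEY[1:])
        # Unmatched rows get an empty string; several patients are joined with ';'.
        row["patient"] = ";".join(lookup.get(key, []))
        yield row
```

Keys are compared as strings, so the coordinates must be formatted identically in both files. Rows that end up with an empty `patient` value are worth inspecting by hand, since indels in particular may be represented differently after annotation.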
Eg. columns in transformed_SCLC_onco.tsv