uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

non-empty STARFusion GVF but no FUSION peptides #409

Closed lydiayliu closed 2 years ago

lydiayliu commented 2 years ago

Is this possible?

Everything is in here:

/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/pipeline-meta-call-NonCanonicalPeptide-0.0.1/ACH-000088/call-NonCanonicalPeptide-1.0.0/ACH-000088/moPepGen-0.2.0-cfa2187/output

There's definitely something in the file, even though it's all from the same fusion:

##fileformat=VCFv4.2
##mopepgen_version=0.2.0
##parser=parseSTARFusion
##reference_index=/scratch/c3/67123c9a7372ad4759d92fdc4378cf/work/e9/443831b28e6269863efeb5d03d243e/GRCh38-EBI-GENCODE34
##genome_fasta=
##annotation_gtf=
##source=fusion
##CHROM=<Description='Gene ID'>
##INFO=<ID=TRANSCRIPT_ID,Number=1,Type=String,Description="Transcript ID">
##INFO=<ID=GENE_SYMBOL,Number=1,Type=String,Description="Gene Symbol">
##INFO=<ID=GENOMIC_POSITION,Number=1,Type=String,Description="Genomic Position">
##INFO=<ID=ACCEPTER_GENE_ID,Number=1,Type=String,Description="3' Accepter Transcript's Gene ID">
##INFO=<ID=ACCEPTER_TRANSCRIPT_ID,Number=1,Type=String,Description="3' Accepter Transcript's Transcript ID">
##INFO=<ID=ACCEPTER_POSITION,Number=1,Type=Integer,Description="Position of the break point of the 3' accepter transcript">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
ENSG00000213626.13      26739   FUSION-ENST00000395323.9:26738-ENST00000314797.10:25257 T       <FUSION>        .       .       TRANSCRIPT_ID=ENST00000395323.9;GENE_SYMBOL=LBH;GENOMIC_POSITION=chr2:30258271:30258271;ACCEPTER_GENE_ID=ENSG00000181789.14;ACCEPTER_TRANSCRIPT_ID=ENST00000314797.10;ACCEPTER_SYMBOL=COPG1;ACCEPTER_POSITION=25258;ACCEPTER_GENOMIC_POSITION=chr3:129274863:129274863
ENSG00000213626.13      26739   FUSION-ENST00000395323.9:26738-ENST00000515725.5:25257  T       <FUSION>        .       .       TRANSCRIPT_ID=ENST00000395323.9;GENE_SYMBOL=LBH;GENOMIC_POSITION=chr2:30258271:30258271;ACCEPTER_GENE_ID=ENSG00000181789.14;ACCEPTER_TRANSCRIPT_ID=ENST00000515725.5;ACCEPTER_SYMBOL=COPG1;ACCEPTER_POSITION=25258;ACCEPTER_GENOMIC_POSITION=chr3:129274863:129274863
ENSG00000213626.13      26739   FUSION-ENST00000395323.9:26738-ENST00000509889.5:25257  T       <FUSION>        .       .       TRANSCRIPT_ID=ENST00000395323.9;GENE_SYMBOL=LBH;GENOMIC_POSITION=chr2:30258271:30258271;ACCEPTER_GENE_ID=ENSG00000181789.14;ACCEPTER_TRANSCRIPT_ID=ENST00000509889.5;ACCEPTER_SYMBOL=COPG1;ACCEPTER_POSITION=25258;ACCEPTER_GENOMIC_POSITION=chr3:129274863:129274863
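To see at a glance what the parser wrote, records like the ones above can be read programmatically. This is a minimal Python sketch that assumes only the layout shown above: VCF-style columns, with INFO as `;`-separated `KEY=VALUE` pairs. `parse_gvf_records` is a hypothetical helper and is not part of moPepGen's API.

```python
from typing import Dict, List


def parse_gvf_records(text: str) -> List[Dict[str, str]]:
    """Parse the data lines of a GVF file into dicts (hypothetical helper, not moPepGen's parser)."""
    records = []
    for line in text.splitlines():
        # Skip blank lines, meta-information lines (##) and the column header (#CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        # GVF follows VCF: columns are tab-delimited and contain no whitespace,
        # so splitting on any whitespace also handles text copied from a web page.
        chrom, pos, var_id, ref, alt, _qual, _filter, info = line.split()[:8]
        record = {"CHROM": chrom, "POS": pos, "ID": var_id, "REF": ref, "ALT": alt}
        # INFO is a ';'-separated list of KEY=VALUE pairs.
        for item in info.split(";"):
            key, _, value = item.partition("=")
            record[key] = value
        records.append(record)
    return records
```

Applied to the three records above, every entry shares the same donor transcript (`ENST00000395323.9`) and the same `ACCEPTER_POSITION`. They differ only in `ACCEPTER_TRANSCRIPT_ID`, which is consistent with everything coming from a single fusion event.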
zhuchcn commented 2 years ago

I can take a quick look at this. It's likely that those peptides already exist in the canonical peptide pool.

This makes me think that maybe we should write some stats/summary to stdout for debugging purposes, like the number of variant peptides called and the number of those peptides that already exist in the canonical peptide pool.
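A summary like the one proposed could be computed with simple set operations. This is a minimal sketch assuming both the called variant peptides and the canonical peptide pool are available as iterables of amino-acid sequences. `summarize_variant_peptides` and its output keys are hypothetical names, not existing moPepGen options.

```python
from typing import Dict, Iterable


def summarize_variant_peptides(called: Iterable[str], canonical_pool: Iterable[str]) -> Dict[str, int]:
    """Count called variant peptides and how many collide with canonical ones (hypothetical helper)."""
    # Deduplicate by sequence so each distinct peptide is counted once.
    called_set = set(called)
    canonical_set = set(canonical_pool)
    return {
        "called": len(called_set),
        # Peptides identical to a canonical peptide would be discarded.
        "in_canonical_pool": len(called_set & canonical_set),
        "kept": len(called_set - canonical_set),
    }
```

Counting distinct sequences rather than raw records keeps the numbers comparable with the deduplicated output. The resulting dict could simply be printed to stdout at the end of a run.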

lydiayliu commented 2 years ago

Yeah, I think that's a great idea! It would be reassuring to get some statistics, though it might be a bit more work to count the number of peptides with each "label" (like SNV, SNV-Noncoding, etc.).
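Per-label counts could be tallied with `collections.Counter` once each peptide has been assigned its labels. This sketch assumes a hypothetical mapping from peptide sequence to a list of label strings such as `"SNV"` or `"SNV-Noncoding"`. How those labels are derived from moPepGen's output headers is not shown here and would need to follow the actual header format.

```python
from collections import Counter
from typing import Dict, List


def count_peptides_by_label(peptide_labels: Dict[str, List[str]]) -> Counter:
    """Count peptides per label (hypothetical helper).

    A peptide carrying several distinct labels is counted once under each of them,
    and a label repeated for the same peptide is only counted once.
    """
    counts: Counter = Counter()
    for labels in peptide_labels.values():
        counts.update(set(labels))
    return counts
```

Because a peptide can carry more than one label, the per-label totals can add up to more than the number of peptides.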

zhuchcn commented 2 years ago

In this particular case, the breakpoints of all the fusion events fall after the canonical stop codon, which is why no peptides were saved.
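The reasoning can be expressed as a simple position check. If the donor keeps its entire coding sequence, including the stop codon, translation from the canonical start codon terminates before the junction. The fusion protein is then identical to the canonical donor protein, so all of its peptides are already in the canonical pool.

This sketch only considers translation from the annotated start codon. The function name and coordinate convention are assumptions: both arguments are 0-based, exclusive end positions in the same coordinate system along the donor.

```python
def fusion_can_yield_novel_peptides(donor_cds_end: int, donor_breakpoint: int) -> bool:
    """Return False when the donor retains its whole CDS, stop codon included (hypothetical helper).

    donor_cds_end:    exclusive end of the donor CDS, i.e. just past the stop codon.
    donor_breakpoint: exclusive end of the donor sequence retained in the fusion.
    """
    # A breakpoint inside the CDS (or inside the stop codon) lets the ribosome
    # read across the junction into accepter sequence. At or past the CDS end,
    # translation stops first and only canonical peptides are produced.
    return donor_breakpoint < donor_cds_end
```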

lydiayliu commented 2 years ago

OK, I guess there are no action items for this particular fusion! Thanks for checking!

zhuchcn commented 2 years ago

Let's close this, and I'll open another issue for the summary stuff.