uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

two suggestions for `summarizeFasta` #424

Closed lydiayliu closed 2 years ago

lydiayliu commented 2 years ago
  1. right now the pure "Noncoding" count is 0, maybe add the option to input a noncoding.fasta just like splitFasta? I think it's nice to have all the numbers in the same place

  2. The "impossible" data type combinations can be eliminated? Like there would NEVER be a circ2-starfusion peptide and those take up quite a few lines and could be misleading

Result on real data

I have no name!@1e085007e2d9:/hot/project/algorithm/moPepGen/CPCGENE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/pipeline-meta-call-NonCanonicalPeptide-0.0.1$ cat CPCG0196/call-NonCanonicalPeptide-1.0.0/CPCG0196/moPepGen-0.2.0-fdb97d0/output/CPCG0196-variantPeptides_summary.txt
sources n_peptides                                                                                                                                                          
circ2   370431                                                                                                                                                              
starfusion      0                                                                                                                                                           
reditools       11418                                                                                                                                                       
rMATs   13225                                                                                                                                                               
gindel  3830                                                                                                                                                                
gsnp    26723                                                                                                                                                               
pindel  11                                                                                                                                                                  
somaticsniper   18                                                                                                                                                          
Noncoding       0                                                                                                                                                           
circ2-starfusion        0                                                                                                                                                   
circ2-reditools 881                                                                                                                                                         
circ2-rMATs     0                                                                                                                                                           
circ2-gindel    60                                                                                                                                                          
circ2-gsnp      5321                                                                                                                                                        
circ2-pindel    24                                                                                                                                                          
circ2-somaticsniper     0                                                                                                                                                   
circ2-Noncoding 24292                                                                                                                                                       
starfusion-reditools    0                                                                                                                                                   
starfusion-rMATs        0                                                                                                                                                   
starfusion-gindel       0                                                                                                                                                   
starfusion-gsnp 0                                                                                                                                                           
starfusion-pindel       0                                                                                                                                                   
starfusion-somaticsniper        0      
...
circ2-rMATs-gsnp-somaticsniper-Noncoding        0                                                                                                                           
circ2-rMATs-pindel-somaticsniper-Noncoding      0                                                                                                                           
circ2-gindel-gsnp-pindel-somaticsniper  0                                                                                                                                   
circ2-gindel-gsnp-pindel-Noncoding      0
circ2-gindel-gsnp-somaticsniper-Noncoding       0
circ2-gindel-pindel-somaticsniper-Noncoding     0
circ2-gsnp-pindel-somaticsniper-Noncoding       0
starfusion-reditools-rMATs-gindel-gsnp  0
starfusion-reditools-rMATs-gindel-pindel        0
starfusion-reditools-rMATs-gindel-somaticsniper 0
starfusion-reditools-rMATs-gindel-Noncoding     0
starfusion-reditools-rMATs-gsnp-pindel  0
starfusion-reditools-rMATs-gsnp-somaticsniper   0
starfusion-reditools-rMATs-gsnp-Noncoding       0
starfusion-reditools-rMATs-pindel-somaticsniper 0
starfusion-reditools-rMATs-pindel-Noncoding     0
starfusion-reditools-rMATs-somaticsniper-Noncoding      0
starfusion-reditools-gindel-gsnp-pindel 0
starfusion-reditools-gindel-gsnp-somaticsniper  0
starfusion-reditools-gindel-gsnp-Noncoding      0

btw also seeing starfusion = 0 on all 4 of the cpcg test samples... I'm looking into it

zhuchcn commented 2 years ago

> right now the pure "Noncoding" count is 0, maybe add the option to input a noncoding.fasta just like splitFasta? I think it's nice to have all the numbers in the same place

Thought about this, but isn't that something a simple `grep -c '>'` can do?
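For reference, a minimal Python sketch of the same count, assuming one header line per record that starts with `>`. The function name and the example records are made up for illustration and are not part of moPepGen.

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting the lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


# Tiny in-memory FASTA; headers and sequences are invented for illustration.
example = io.StringIO(">pep1\nMKTAYIAK\n>pep2\nQRQISFVK\n")
n_records = count_fasta_records(example)
print(n_records)
```

Matching only at the start of the line avoids counting a `>` that appears inside a header description.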

> The "impossible" data type combinations can be eliminated? Like there would NEVER be a circ2-starfusion peptide and those take up quite a few lines and could be misleading

Sure, but couldn't this also be done with a simple awk command that removes every row with a count of 0?
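A rough Python equivalent of that kind of filter, as a sketch only: it assumes the summary is whitespace-separated with a header row and the count in the last column, as in the output pasted above. `drop_zero_rows` is a hypothetical helper, not a moPepGen function.

```python
def drop_zero_rows(lines):
    """Keep the header plus every row whose last column is a non-zero count."""
    rows = [line.rstrip() for line in lines if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [header] + [row for row in body if int(row.split()[-1]) != 0]


# A few rows copied from the summary pasted earlier in this thread.
summary = [
    "sources n_peptides",
    "circ2   370431",
    "starfusion      0",
    "circ2-reditools 881",
    "circ2-rMATs     0",
]
filtered = drop_zero_rows(summary)
print("\n".join(filtered))
```

Note that this removes every zero row, whether or not the combination could ever occur.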

lydiayliu commented 2 years ago
  1. possible overlap with variant peptides?
  2. but I do care about possible combinations that are 0 (and want all the possible combinations listed), I just don't care about the impossible combinations at all
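To make point 2 concrete, here is a hedged sketch of listing every possible combination (zeros included) while skipping impossible ones. The source names come from the summary above; the only incompatible pair encoded is circ2 + starfusion, because that is the only one named in this thread. `INCOMPATIBLE`, `is_possible` and `possible_combinations` are hypothetical names, and the real set of impossible pairs would need to be defined by the maintainers.

```python
from itertools import combinations

SOURCES = ["circ2", "starfusion", "reditools", "rMATs", "gindel",
           "gsnp", "pindel", "somaticsniper", "Noncoding"]

# Assumption: only the pair explicitly called impossible in this thread.
INCOMPATIBLE = [frozenset({"circ2", "starfusion"})]


def is_possible(combo):
    """A combination is possible if it contains no incompatible pair."""
    members = set(combo)
    return not any(pair <= members for pair in INCOMPATIBLE)


def possible_combinations(sources, max_size):
    """List the labels of all possible combinations of up to max_size sources."""
    labels = []
    for size in range(1, max_size + 1):
        for combo in combinations(sources, size):
            if is_possible(combo):
                labels.append("-".join(combo))
    return labels


labels = possible_combinations(SOURCES, 2)

# Counts observed in the data; anything possible but unobserved is kept as 0.
observed = {"circ2-reditools": 881, "circ2-gsnp": 5321}
table = [(label, observed.get(label, 0)) for label in labels]
```

With this approach `circ2-rMATs` would still be listed with 0, while `circ2-starfusion` and any longer combination containing both would never appear.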
zhuchcn commented 2 years ago

Those are reasonable arguments so I'll add them in

lydiayliu commented 2 years ago

I know I'm extremely annoying, how hard is it to output these summary numbers by # of miscleavages? XD

zhuchcn commented 2 years ago

What do you need it for? We would have to count the cleavage sites in each peptide sequence, because we currently don't have that information in the FASTA header.
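A minimal sketch of what that per-peptide count could look like. It assumes trypsin-like cleavage (after K or R, except when the next residue is P); the thread does not say which enzyme or exception rules apply, so treat the rule, the function name and the example sequences as illustrative assumptions rather than what moPepGen does.

```python
from collections import Counter


def count_miscleavages(peptide):
    """Count internal cleavage sites: K or R followed by a residue other than P.

    The C-terminal residue is not counted, since cleaving there produces
    the peptide's own end rather than a missed cleavage.
    """
    count = 0
    for i in range(len(peptide) - 1):
        if peptide[i] in "KR" and peptide[i + 1] != "P":
            count += 1
    return count


# Made-up peptide sequences, for illustration only.
peptides = ["MKTAYIAK", "AKPGR", "AKRG"]
per_peptide = {p: count_miscleavages(p) for p in peptides}

# Number of peptides at each miscleavage count, e.g. as input for a barplot.
by_miscleavage = Counter(per_peptide.values())
```

The resulting tally could then be broken down per source combination in the same way as the existing summary.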

lydiayliu commented 2 years ago

For plotting XD as always

I feel like this would be good info to show in a database overview barplot

zhuchcn commented 2 years ago

Yeah OK, that makes sense!