uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

`splitFasta` cannot group AltTranslation peptides #796

Closed lydiayliu closed 1 year ago

lydiayliu commented 1 year ago
moPepGen splitFasta \
    --variant-peptides ${b} \
    --gvf ${c} \
    --output-prefix split/${a}/${a}_split4 \
    --noncoding-peptides /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/ref/GRCh38-EBI-GENCODE34/noncoding/min.fa \
    --alt-translation-peptides /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/ref/GRCh38-EBI-GENCODE34/alt_translation/sect_w2f.fasta \
    --index-dir /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/ref/GRCh38-EBI-GENCODE34/index/ \
    --group-source ALT:SECT,CodonReassign NotCirc:altSplice,Fusion,gIndel,gSNP,RNAEdit,sIndel,sSNV \
    --order-source ALT,NotCirc,circRNA,Noncoding \
    --max-source-groups 4 \
    --order-source ALT,NotCirc,circRNA,Noncoding

Even though I passed ALT:SECT,CodonReassign to `splitFasta` via --group-source, SECT and CodonReassign peptides are still written to separate FASTA files:

I have no name!@bdc00182f166:/hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/CPCGENE/processed/noncanonical-database/call-nonCanonicalPeptide/2023-07-13_raw/pipeline-meta-call-NonCanonicalPeptide-0.0.1/split/CPCG0269$ ls *fasta
CPCG0269_split4_CodonReassign.fasta      CPCG0269_split4_NotCirc-circRNA-Noncoding.fasta  CPCG0269_split4_SECT-CodonReassign.fasta  CPCG0269_split4_circRNA.fasta
CPCG0269_split4_Noncoding.fasta          CPCG0269_split4_NotCirc-circRNA.fasta            CPCG0269_split4_SECT.fasta
CPCG0269_split4_NotCirc-Noncoding.fasta  CPCG0269_split4_NotCirc.fasta                    CPCG0269_split4_circRNA-Noncoding.fasta

The summaries of the split FASTA files also show the two sources kept cleanly separate rather than merged into the ALT group:

CPCG0269_split4_CodonReassign.fasta.txt
sources n_total n_0_misc        n_1_misc        n_2_misc
CodonReassign   424621  106414  168157  150050
CPCG0269_split4_SECT-CodonReassign.fasta.txt
sources n_total n_0_misc        n_1_misc        n_2_misc
SECT-CodonReassign      32      5       13      14
CPCG0269_split4_SECT.fasta.txt
sources n_total n_0_misc        n_1_misc        n_2_misc
SECT    71      17      31      23

zhuchcn commented 1 year ago

SECT and W2F are probably processed separately. Will look into this.
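
For reference, the grouping the report expects can be sketched in a few lines of Python. This is only an illustration of the intended outcome, not moPepGen's implementation: `parse_group_source` and `group_label` are hypothetical helpers, and the assumption that output labels are source names joined by `-` is inferred from the file names listed above. The counts are the `n_total` values from the three summary files.

```python
def parse_group_source(specs):
    """Map each source name to its group, e.g. 'ALT:SECT,CodonReassign'."""
    source_to_group = {}
    for spec in specs:
        group, members = spec.split(":", 1)
        for source in members.split(","):
            source_to_group[source] = group
    return source_to_group


def group_label(file_label, source_to_group, order):
    """Collapse a hyphen-joined source label into an ordered group label.

    Assumes labels look like 'SECT-CodonReassign' (inferred from the
    output file names); sources without a group keep their own name.
    """
    groups = {source_to_group.get(s, s) for s in file_label.split("-")}
    rank = {name: i for i, name in enumerate(order)}
    return "-".join(sorted(groups, key=lambda g: (rank.get(g, len(rank)), g)))


# Values copied from the command in the report.
specs = [
    "ALT:SECT,CodonReassign",
    "NotCirc:altSplice,Fusion,gIndel,gSNP,RNAEdit,sIndel,sSNV",
]
order = ["ALT", "NotCirc", "circRNA", "Noncoding"]
mapping = parse_group_source(specs)

# n_total per split file, copied from the summaries in the report.
counts = {"CodonReassign": 424621, "SECT-CodonReassign": 32, "SECT": 71}

merged = {}
for label, n in counts.items():
    key = group_label(label, mapping, order)
    merged[key] = merged.get(key, 0) + n

print(merged)  # {'ALT': 424724}
```

Under these assumptions, the three files `CodonReassign`, `SECT-CodonReassign`, and `SECT` would collapse into a single `ALT` file, while labels such as `NotCirc-circRNA` are left unchanged.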