uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

ensembl fusion + variants CPCG0100 continues! #363

Closed lydiayliu closed 2 years ago

lydiayliu commented 2 years ago

This is just a gift that keeps on giving XD

a=/hot/users/yiyangliu/MoPepGen/Parser/VEP/ensembl/gsnp/CPCG0100.tsv.s.gvf
b=$(basename -- "$a"); echo ${b};
c="${b%%.*}"; echo ${c};
moPepGen callVariant \
    --input-variant /hot/users/yiyangliu/MoPepGen/Parser/Fusion/fusioncatcher-1.33/${c}.ensembl.s.gvf \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/ensembl/gsnp/${b} \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/ensembl/gindel/${b} \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/ensembl/somaticsniper/${b} \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/ensembl/pindel/${b} \
    --index-dir /hot/users/yiyangliu/MoPepGen/Index/GRCh38-EBI-ENSEMBL104/ \
    --noncanonical-transcripts \
    --verbose-level 1 \
    --threads 12 \
    --output-fasta /hot/users/yiyangliu/MoPepGen/Variant/Fusion/fusioncatcher-1.33/ssm/${c}.ensembl.fasta  > /hot/users/yiyangliu/MoPepGen/Variant/Fusion/fusioncatcher-1.33/ssm/${c}.ensembl.log
[ 2022-01-25 01:40:59 ] moPepGen callVariant started
[ 2022-01-25 01:42:09 ] Reference indices loaded.
[ 2022-01-25 01:43:27 ] Variants sorted
[ 2022-01-25 01:52:52 ] Exception raised from fusion FUSION-ENSG00000002822:20032-ENSG00000151229:57879
An error has occured during the function execution
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/ppft/__main__.py", line 111, in run
    __result = __f(*__args)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 203, in wrapper
    return call_variant_peptides_wrapper(*dispatch)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 172, in call_variant_peptides_wrapper
    _peptides = call_peptide_fusion(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 384, in call_peptide_fusion
    return pgraph.call_variant_peptides(miscleavage=miscleavage)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 655, in call_variant_peptides
    self.call_and_stage_known_orf(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 678, in call_and_stage_known_orf
    self.call_and_stage_known_orf_in_cds(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 707, in call_and_stage_known_orf_in_cds
    traversal.pool.add_miscleaved_sequences(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/VariantPeptideDict.py", line 181, in add_miscleaved_sequences
    raise ValueError('Invalid amino acid symbol found in the sequence.')
ValueError: Invalid amino acid symbol found in the sequence.
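
Judging by its message, the exception at the bottom of the traceback fires when a candidate peptide contains a character outside the expected amino acid alphabet. The snippet below is a minimal sketch of what such a check could look like. It is not taken from moPepGen's source: the alphabet `VALID_AA` (assumed here to be the 20 standard residues) and the helpers `find_invalid_symbols` and `check_peptide` are hypothetical names used only for illustration.

```python
# Hypothetical sketch: NOT moPepGen's actual implementation.
from typing import List, Tuple

# Assumption: only the 20 standard amino acids are considered valid.
VALID_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_symbols(seq: str) -> List[Tuple[int, str]]:
    """Return (position, symbol) for every character outside VALID_AA."""
    return [(i, aa) for i, aa in enumerate(seq) if aa not in VALID_AA]


def check_peptide(seq: str) -> None:
    """Raise ValueError if the sequence contains any unexpected symbol."""
    bad = find_invalid_symbols(seq)
    if bad:
        raise ValueError(
            f"Invalid amino acid symbol found in the sequence: {bad}"
        )
```

Reporting the position and symbol alongside the error, as this sketch does, would make it easier to tell whether the offending character is a stop symbol, an ambiguous residue, or something else.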

lydiayliu commented 2 years ago

I don't think this is fixed for me. I'm using #369 (commit be6358b82c5cdf5e2768f2b827a52cda6d54f462).

zhuchcn commented 2 years ago

Are you seeing the same error?

lydiayliu commented 2 years ago

yes

[ 2022-01-26 16:55:39 ] moPepGen callVariant started
[ 2022-01-26 16:57:05 ] Reference indices loaded.
[ 2022-01-26 16:58:30 ] Variants sorted
[ 2022-01-26 17:08:02 ] Exception raised from fusion FUSION-ENSG00000002822:20032-ENSG00000151229:57879
An error has occured during the function execution
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/ppft/__main__.py", line 111, in run
    __result = __f(*__args)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 203, in wrapper
    return call_variant_peptides_wrapper(*dispatch)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 172, in call_variant_peptides_wrapper
    _peptides = call_peptide_fusion(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 384, in call_peptide_fusion
    return pgraph.call_variant_peptides(miscleavage=miscleavage)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 679, in call_variant_peptides
    self.call_and_stage_known_orf(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 702, in call_and_stage_known_orf
    self.call_and_stage_known_orf_in_cds(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 731, in call_and_stage_known_orf_in_cds
    traversal.pool.add_miscleaved_sequences(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/VariantPeptideDict.py", line 181, in add_miscleaved_sequences
    raise ValueError('Invalid amino acid symbol found in the sequence.')
ValueError: Invalid amino acid symbol found in the sequence.

zhuchcn commented 2 years ago

I'm also seeing it, but only when running the full GVF files, not the extracted GVFs. Still trying to figure out why.

zhuchcn commented 2 years ago

There was a bug in downsampleReference: when downsampling multiple genes/transcripts, the order of the genes/transcripts in the downsampled genome was not kept consistent with the annotation, which made the transcript sequences incorrect. I cannot reproduce the issue with the downsampled GVFs.
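
To illustrate the kind of ordering mismatch described above, here is a small sketch under stated assumptions. It is not the downsampleReference code: the toy sequences in `GENES` and the helpers `build_genome`, `build_annotation`, and `extract` are all hypothetical. It only shows that when annotation coordinates assume one gene order and the genome is written in another, slicing by those coordinates returns the wrong sequences.

```python
# Hypothetical sketch of the ordering problem; NOT moPepGen's code.
from typing import Dict, List, Tuple

# Made-up toy sequences, for illustration only.
GENES = {"geneA": "AAAA", "geneB": "CCCCCC", "geneC": "GG"}


def build_genome(order: List[str]) -> str:
    """Concatenate the gene sequences in the given order."""
    return "".join(GENES[g] for g in order)


def build_annotation(order: List[str]) -> Dict[str, Tuple[int, int]]:
    """Assign each gene a (start, end) interval assuming the given order."""
    coords: Dict[str, Tuple[int, int]] = {}
    offset = 0
    for g in order:
        coords[g] = (offset, offset + len(GENES[g]))
        offset += len(GENES[g])
    return coords


def extract(genome: str, coords: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Slice each gene out of the genome using the annotation intervals."""
    return {g: genome[start:end] for g, (start, end) in coords.items()}


annotation = build_annotation(["geneA", "geneB", "geneC"])

# Genome and annotation share the same order: every gene is recovered.
consistent = extract(build_genome(["geneA", "geneB", "geneC"]), annotation)

# Genome written in a different order: the slices no longer match the genes.
inconsistent = extract(build_genome(["geneB", "geneA", "geneC"]), annotation)
```

In this toy case `consistent` equals `GENES`, while `inconsistent` returns `CCCC` for geneA and `CCAAAA` for geneB, so any transcript built from those slices would be wrong even though no step raises an error.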

lydiayliu commented 2 years ago

This sample is clear!