uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

Sequence location got lost after junction site on a hybrid node #621

Closed: lydiayliu closed this issue 1 year ago

lydiayliu commented 1 year ago
Command executed:

  set -euo pipefail

  NXF_WORK=$(pwd)/work \
  nextflow run /hot/users/yiyangliu/project-MissingPeptides-Method/pipelines/pipeline-meta-call-NonCanonicalPeptide/modules/call_NonCanonicalPeptide/../../external/pipeline-call-NonCanonicalPeptide/main.nf \
      --sample_name CPCG0184 \
      --input_csv input_3.csv \
      --filterFasta null \
      --output_dir $(pwd)/CPCG0184 \
      --entrypoint gvf \
      --call_variant_ncpus 22 \
      --call_variant_memory_GB 45 GB \
      --variant_fasta NO_VARIANT_FASTA \
      -params-file config.json \
      -c /hot/users/yiyangliu/project-MissingPeptides-Method/pipelines/pipeline-meta-call-NonCanonicalPeptide/modules/call_NonCanonicalPeptide/template.config

Command exit status:
  1

Command output:
      [ 2022-11-28 17:52:28 ] 23000 transcripts processed.
      [ 2022-11-28 17:52:53 ] 24000 transcripts processed.
      [ 2022-11-28 17:53:16 ] 25000 transcripts processed.
      [ 2022-11-28 17:53:40 ] 26000 transcripts processed.
      [ 2022-11-28 17:53:57 ] 27000 transcripts processed.
      (wrapper_remote pid=3216) [ 2022-11-28 17:54:10 ] Exception raised from CIRC-ENST00000398052.9-E2-E3-E4-E5-E6-E7-E8
      Traceback (most recent call last):
        File "/usr/local/bin/moPepGen", line 8, in <module>
          sys.exit(main())
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 89, in main
          args.func(args)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 351, in call_variant_peptide
          results = ray.get([wrapper_remote.remote(d) for d in dispatches])
        File "/usr/local/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
          return func(*args, **kwargs)
        File "/usr/local/lib/python3.8/site-packages/ray/_private/worker.py", line 2289, in get
          raise value.as_instanceof_cause()
      ray.exceptions.RayTaskError(ValueError): ray::wrapper_remote() (pid=3216, ip=172.17.0.4)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 248, in wrapper_remote
          return call_variant_peptides_wrapper(*dispatch)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 232, in call_variant_peptides_wrapper
          _peptides = call_peptide_circ_rna(
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 485, in call_peptide_circ_rna
          return pgraph.call_variant_peptides(blacklist=ref.canonical_peptides, circ_rna=record)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 788, in call_variant_peptides
          self.call_and_stage_unknown_orf(
        File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 1053, in call_and_stage_unknown_orf
          if orf_i_2.node_is_at_least_one_loop_downstream(
        File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PVGOrf.py", line 144, in node_is_at_least_one_loop_downstream
          raise ValueError("Failed to find a non circRNA variant.")
      ValueError: Failed to find a non circRNA variant.

    Work dir:
      work/5a/dbbac3b33bae51823cd6a09efffc59

    Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

GVFs:

/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/CIRCexplorer2/CPCG0184_circularRNA_known.txt.1.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/Fusion/star-fusion-1.9.1/CPCG0184.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/REDItools/CPCG0184_candidates.rmsk.GRCh38_annotated.txt.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/RMATS/CPCG0184_ijc5_sjc5.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0184.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gsnp/CPCG0184.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/pindel/CPCG0184.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/somaticsniper/CPCG0184.gencode.tsv.s.gvf

zhuchcn commented 1 year ago

The cause of this issue is that when a DNA node is translated into a peptide node and the node spans a junction site, the location of the sequence after the junction site may get lost.
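To make the failure mode concrete, here is a minimal sketch. It is not moPepGen's implementation. It assumes a node's location is stored as a list of segments, each mapping a slice of the node's own sequence to a contiguous stretch of the reference, so a node spanning a junction carries one segment per side. `Segment`, `translate_locations`, and `translate_locations_buggy` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical location model, not moPepGen's actual classes.
@dataclass(frozen=True)
class Segment:
    """A slice of a node's sequence mapped to one contiguous reference stretch.

    query_start/query_end are half-open offsets within the node itself.
    ref_start is the reference (nucleotide) coordinate where query_start lands.
    """
    query_start: int
    query_end: int
    ref_start: int


def translate_locations(segments: List[Segment]) -> List[Segment]:
    """Convert nucleotide-level segments to amino-acid-level segments.

    Every segment is kept, including those downstream of a junction.
    A codon that straddles the junction maps cleanly to neither side,
    so each segment is trimmed to the whole codons it fully contains.
    """
    result = []
    for seg in segments:
        aa_start = -(-seg.query_start // 3)  # ceil: first whole codon in segment
        aa_end = seg.query_end // 3          # floor: end of last whole codon
        if aa_end <= aa_start:
            continue  # segment too short to contain a whole codon
        shift = aa_start * 3 - seg.query_start  # nucleotides trimmed on the left
        result.append(Segment(aa_start, aa_end, seg.ref_start + shift))
    return result


def translate_locations_buggy(segments: List[Segment]) -> List[Segment]:
    """Illustrates the reported failure: only the first segment survives,
    so everything after the junction site loses its location."""
    return translate_locations(segments[:1])


# An 18-nt node with a junction at offset 10.
node = [Segment(0, 10, 1000), Segment(10, 18, 5000)]
print(translate_locations(node))
print(translate_locations_buggy(node))
```

Here the codon covering nucleotides 9 to 11 straddles the junction and is dropped, while the amino acids on both sides keep their locations. The buggy variant returns only the upstream segment.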

lydiayliu commented 1 year ago

Should we run more fuzz tests on circRNAs? I feel like that's where the remaining bugs are coming from.

zhuchcn commented 1 year ago

I'll submit a big batch of fuzz tests. The current fuzz test setup uses only one gene as the template. Maybe we should also fuzz the gene template.
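Here is a minimal sketch of what fuzzing the gene template could look like. It assumes a template is just a sorted list of non-overlapping exons and that a circRNA is a run of consecutive exons, like the E2 to E8 record in the traceback. The function names and length ranges are hypothetical, and moPepGen's actual fuzz-test harness is not shown in this thread.

```python
import random
from typing import List, Tuple

Exon = Tuple[int, int]  # half-open (start, end) on a hypothetical gene


def random_gene_template(rng: random.Random,
                         n_exons_range=(2, 8),
                         exon_len_range=(20, 120),
                         intron_len_range=(30, 300)) -> List[Exon]:
    """Generate a random exon layout instead of reusing one fixed gene."""
    n_exons = rng.randint(*n_exons_range)
    exons: List[Exon] = []
    pos = 0
    for i in range(n_exons):
        if i > 0:
            pos += rng.randint(*intron_len_range)  # intron before this exon
        length = rng.randint(*exon_len_range)
        exons.append((pos, pos + length))
        pos += length
    return exons


def random_circ_rna(rng: random.Random, exons: List[Exon]) -> List[int]:
    """Pick a run of consecutive exon indices to back-splice into a circRNA."""
    first = rng.randrange(len(exons))
    last = rng.randrange(first, len(exons))
    return list(range(first, last + 1))


# Seeded RNG so any failing case can be replayed exactly.
rng = random.Random(42)
for case in range(3):
    exons = random_gene_template(rng)
    circ = random_circ_rna(rng, exons)
    print(case, exons, circ)
```

Seeding the generator matters more than the exact ranges. When a fuzz case fails, the seed alone is enough to regenerate the template and circRNA that triggered it.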