uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0

arriba + variants key error and strange log #337

Closed lydiayliu closed 2 years ago

lydiayliu commented 2 years ago
a=/hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gsnp/CPCG0324.gencode.tsv.s.gvf
b=$(basename -- "$a"); echo ${b};
c="${b%%.*}"; echo ${c};
moPepGen callVariant \
    --input-variant /hot/users/yiyangliu/MoPepGen/Parser/Fusion/arriba-2.1.0/${c}.s.gvf \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gsnp/${b} \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gindel/${b} \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/somaticsniper/${b} \
        /hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/pindel/${b} \
    --index-dir /hot/users/yiyangliu/MoPepGen/Index/GRCh38-EBI-GENCODE34/ \
    --verbose-level 1 \
    --threads 16 \
    --output-fasta /hot/users/yiyangliu/MoPepGen/Variant/Fusion/arriba-2.1.0/ssm/${c}.fasta > /hot/users/yiyangliu/MoPepGen/Variant/Fusion/arriba-2.1.0/ssm/${c}.log
...
[ 2022-01-13 00:00:27 ] Exception raised from fusion FUSION-ENSG00000204177.10:11495-ENSG00000204179.10:53590
An error has occured during the function execution
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/ppft/__main__.py", line 111, in run
    __result = __f(*__args)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 191, in wrapper
    return call_variant_peptides_wrapper(*dispatch)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 160, in call_variant_peptides_wrapper
    _peptides = call_peptide_fusion(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 357, in call_peptide_fusion
    dgraph.create_variant_graph(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 916, in create_variant_graph
    cursors = self.apply_fusion(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 555, in apply_fusion
    insertion_variants = variant_pool.filter_variants(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/seqvar/VariantRecordPool.py", line 177, in filter_variants
    gene_id = self.anno.transcripts[tx_id].transcript.gene_id
KeyError: 'ENST00000506881.5'

Please also check out the very end of the log message. I believe the error is reported on two threads, but there are a lot of strange characters ^@^@^@^@^@^@^@^@^@^@^ in the log...
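For context, `^@` is the caret notation that many terminals and editors use to display NUL bytes (0x00). Below is a minimal sketch for counting and removing them from a captured log so the tracebacks are easier to read. The helper name and the sample bytes are made up for illustration, and this only cleans up the log; it says nothing about why the bytes were written.

```python
from typing import Tuple


def strip_nul_bytes(raw: bytes) -> Tuple[bytes, int]:
    """Return the content without NUL bytes and the number of bytes removed."""
    removed = raw.count(b"\x00")
    return raw.replace(b"\x00", b""), removed


# Made-up sample imitating two error reports separated by NUL bytes.
sample = (
    b"KeyError: 'ENST00000506881.5'\n"
    b"\x00\x00\x00\x00"
    b"Traceback (most recent call last):\n"
)

cleaned, removed = strip_nul_bytes(sample)
print(f"removed {removed} NUL bytes")
print(cleaned.decode("utf-8"))
```

For a real log file, the same function could be applied to the result of reading the file in binary mode (`open(path, "rb").read()`).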

Also with the new multiprocessing, error reporting always happens twice. The above error was written to the LOG file (so stdout), but on the terminal you also get this below (which is stderr). Is this split design intentional?

Traceback (most recent call last):
  File "/usr/local/bin/moPepGen", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 79, in main
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 279, in call_variant_peptide
    for peptides in peptide_series:
TypeError: 'NoneType' object is not iterable

zhuchcn commented 2 years ago

My guess is that the first error message is printed by the worker process/thread and the second is printed by the main thread. The binary characters might show up because the data somehow gets messed up when it is pickled/unpickled and transferred between threads.
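A small sketch of how one failure could surface as two different reports, under the assumption that the worker runner catches the exception, prints the traceback to stdout, and hands `None` back to the caller. `run_job` and `lookup_gene` are hypothetical stand-ins written for this illustration, not ppft, pathos, or moPepGen functions, and the annotated transcript ID is made up.

```python
import io
import sys
import traceback
from contextlib import redirect_stdout


def run_job(func, *args):
    """Hypothetical worker runner: report failures on stdout and return None."""
    try:
        return func(*args)
    except Exception:
        # Message text copied from the log above, including its spelling.
        print("An error has occured during the function execution")
        traceback.print_exc(file=sys.stdout)
        return None


def lookup_gene(transcripts, tx_id):
    """Hypothetical job that fails with KeyError for an unknown transcript."""
    return [transcripts[tx_id]]


transcripts = {"ENST00000000001.1": "ENSG00000000001.1"}  # made-up annotation

# Capture what the "worker" writes to stdout (the part that ended up in the log file).
buffer = io.StringIO()
with redirect_stdout(buffer):
    peptide_series = run_job(lookup_gene, transcripts, "ENST00000506881.5")
worker_report = buffer.getvalue()

# The "main" side then iterates over the result without knowing the job failed.
main_error = None
try:
    for peptides in peptide_series:
        pass
except TypeError as err:
    main_error = str(err)

print(worker_report)
print("main side:", main_error)
```

In this sketch the first report goes to stdout and the second would reach stderr as an uncaught exception, which matches the split seen here because `>` redirects stdout only. Putting `2>&1` after the `>` redirection would send both streams to the same file.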

zhuchcn commented 2 years ago

I opened an issue at uqfoundation/pathos#228

lydiayliu commented 2 years ago

Adding another case here for a=/hot/users/yiyangliu/MoPepGen/Parser/VEP/gencode/gsnp/CPCG0249.gencode.tsv.s.gvf

[ 2022-01-13 17:36:55 ] 16000 transcripts processed.
[ 2022-01-13 17:37:13 ] Exception raised from fusion FUSION-ENSG00000118260.15:47919-ENSG00000227308.2:22649
An error has occured during the function execution
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/ppft/__main__.py", line 111, in run
    __result = __f(*__args)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 191, in wrapper
    return call_variant_peptides_wrapper(*dispatch)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 160, in call_variant_peptides_wrapper
    _peptides = call_peptide_fusion(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 357, in call_peptide_fusion
    dgraph.create_variant_graph(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 916, in create_variant_graph
    cursors = self.apply_fusion(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 555, in apply_fusion
    insertion_variants = variant_pool.filter_variants(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/seqvar/VariantRecordPool.py", line 177, in filter_variants
    gene_id = self.anno.transcripts[tx_id].transcript.gene_id
KeyError: 'ENST00000607654.1'

same transcript is hit in 4 threads producing 4 errors, some with the strange symbols in between
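Both tracebacks end in a plain dictionary lookup on a transcript ID that is not a key in the annotation. As a purely diagnostic sketch (not the fix, and not part of moPepGen), a pre-check like the one below could list which transcript IDs referenced by the variants are absent from the annotation, reporting each one once even if several workers would hit it. The function name and the annotated ID are hypothetical.

```python
from typing import Iterable, List


def missing_transcripts(
    variant_tx_ids: Iterable[str], annotated_tx_ids: Iterable[str]
) -> List[str]:
    """Return transcript IDs used by variants but absent from the annotation.

    Order of first appearance is kept and duplicates are reported once.
    """
    annotated = set(annotated_tx_ids)
    seen = set()
    missing = []
    for tx_id in variant_tx_ids:
        if tx_id not in annotated and tx_id not in seen:
            seen.add(tx_id)
            missing.append(tx_id)
    return missing


# Made-up annotation; the missing ID is the one from the traceback above.
annotated_ids = ["ENST00000000001.1"]
variant_ids = ["ENST00000607654.1", "ENST00000000001.1", "ENST00000607654.1"]

print(missing_transcripts(variant_ids, annotated_ids))
```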

zhuchcn commented 2 years ago

Case one (CPCG0324) also seems to be fixed by #339. Fun fact: for this fusion, the donor part has 1300 bases, the acceptor's exonic sequence has 1710 bases, but the intronic region carried over from the acceptor gene has 91093 bases 😂
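To put those numbers in proportion, here is a quick calculation, assuming the three parts listed above make up the whole fusion sequence (the thread does not say whether anything else is included):

```python
# Lengths in bases, taken from the comment above.
donor_bases = 1300
acceptor_exonic_bases = 1710
acceptor_intronic_bases = 91093

# Assumption: these three parts add up to the full fusion sequence.
total_bases = donor_bases + acceptor_exonic_bases + acceptor_intronic_bases
intronic_fraction = acceptor_intronic_bases / total_bases

print(f"total: {total_bases} bases")
print(f"intronic share: {intronic_fraction:.1%}")
```

Under that assumption, nearly all of the sequence the graph has to handle for this fusion is intronic.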

zhuchcn commented 2 years ago

Case 2 also fixed!

lydiayliu commented 2 years ago

> the intronic region carried over from the acceptor gene has 91093 bases

lmao! it's these introns that are making fusion run time super slow lolll

gimme a sec to double check both of these!

lydiayliu commented 2 years ago

both cases confirmed resolved. wow #339 is the bomb