Closed lydiayliu closed 2 years ago
0244525 I'm trying with this commit
it's strange though cuz no message since loading all the files and it's been a few hours
I have no name!@ed2e81cb1a85:/$ moPepGen callVariant \
> --input-variant /data/Parser/Fusion/arriba-2.1.0/${c}.winu.gvf \
> /data/Parser/VEP/gencode/gsnp/${b} \
> /data/Parser/VEP/gencode/gindel/${b} \
> /data/Parser/VEP/gencode/somaticsniper/${b} \
> /data/Parser/VEP/gencode/pindel/${b} \
> --index-dir /data/Index/GRCh38-EBI-GENCODE34/ \
> --output-fasta /data/Variant/Fusion/arriba-2.1.0/ssm/${c}.3f.fasta
[ 2021-12-18 23:10:13 ] moPepGen callVariant started
[ 2021-12-18 23:11:43 ] Variant file /data/Parser/Fusion/arriba-2.1.0/CPCG0100.winu.gvf loaded.
[ 2021-12-18 23:18:22 ] Variant file /data/Parser/VEP/gencode/gsnp/CPCG0100.gencode.tsv.gvf loaded.
[ 2021-12-18 23:19:20 ] Variant file /data/Parser/VEP/gencode/gindel/CPCG0100.gencode.tsv.gvf loaded.
[ 2021-12-18 23:19:20 ] Variant file /data/Parser/VEP/gencode/somaticsniper/CPCG0100.gencode.tsv.gvf loaded.
[ 2021-12-18 23:19:20 ] Variant file /data/Parser/VEP/gencode/pindel/CPCG0100.gencode.tsv.gvf loaded.
had to kill it
^C[ 2021-12-19 01:01:45 ] Exception raised from ENST00000577672.1
Traceback (most recent call last):
File "/usr/local/bin/moPepGen", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 77, in main
args.func(args)
File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 127, in call_variant_peptide
peptides = call_peptide_main(
File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 214, in call_peptide_main
pgraph.create_cleavage_graph(rule=rule, exception=exception)
File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 485, in create_cleavage_graph
self.fit_into_cleavage_multiple_upstream(cur)
File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 529, in fit_into_cleavage_multiple_upstream
sites = cur.seq.find_all_cleave_and_stop_sites(rule=self.rule,
File "/usr/local/lib/python3.8/site-packages/moPepGen/aa/AminoAcidSeqRecord.py", line 213, in find_all_cleave_and_stop_sites
cleavage_sites = list(self.iter_enzymatic_cleave_sites(rule=rule,
KeyboardInterrupt
Can you try with --verbose-level 2
? By default it's 1 which doesn't print out tx IDs. It's probably running
I pushed a new commit to #289 last night. There seemed to be an issue. Could you try with the latest commit?
I am still seeing this, with a different transcript this time, commit ddaa904
[ 2021-12-20 03:38:06 ] 88000 transcripts processed
...
[ 2021-12-20 03:39:44 ] ENST00000449085.3
[ 2021-12-20 03:39:45 ] ENST00000607833.5
[ 2021-12-20 03:39:45 ] ENST00000464592.5
[ 2021-12-20 03:39:45 ] ENST00000463639.1
[ 2021-12-20 03:39:45 ] ENST00000469132.1
[ 2021-12-20 03:39:45 ] ENST00000229829.7
[ 2021-12-20 03:39:45 ] ENST00000490305.5
[ 2021-12-20 03:39:45 ] ENST00000419277.5
[ 2021-12-20 03:39:45 ] ENST00000479107.1
[ 2021-12-20 03:39:46 ] Exception raised from ENST00000479107.1
Traceback (most recent call last):
File "/usr/local/bin/moPepGen", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 77, in main
args.func(args)
File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 127, in call_variant_peptide
peptides = call_peptide_main(
File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 215, in call_peptide_main
return pgraph.call_variant_peptides(miscleavage=miscleavage)
File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 630, in call_variant_peptides
self.call_and_stage_unknown_orf(
File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 850, in call_and_stage_unknown_orf
traversal.pool.add_miscleaved_sequences(
File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/VariantPeptideDict.py", line 181, in add_miscleaved_sequences
raise ValueError('Invalid amino acid symbol found in the sequence.')
ValueError: Invalid amino acid symbol found in the sequence.
Hmm, I extracted the variants of this transcript from the GVF files and it seems to run successfully. 🤔
[ 2021-12-20 09:54:31 ] moPepGen callPeptides started
[ 2021-12-20 09:54:31 ] Annotation GTF loaded.
[ 2021-12-20 09:54:31 ] Proteome FASTA loaded.
[ 2021-12-20 09:54:31 ] Genome assembly FASTA loaded.
[ 2021-12-20 09:54:31 ] canonical peptide pool generated.
[ 2021-12-20 09:54:31 ] Variant file test/files/comb/CPCG0100_ENST00000623909.1/gsnp.gvf loaded.
[ 2021-12-20 09:54:31 ] Variant file test/files/comb/CPCG0100_ENST00000623909.1/gindel.gvf loaded.
[ 2021-12-20 09:54:31 ] Variant file test/files/comb/CPCG0100_ENST00000623909.1/arriba.gvf loaded.
[ 2021-12-20 09:54:31 ] Variant file test/files/comb/CPCG0100_ENST00000623909.1/pindel.gvf loaded.
[ 2021-12-20 09:54:31 ] Variant file test/files/comb/CPCG0100_ENST00000623909.1/somaticsniper.gvf loaded.
[ 2021-12-20 09:54:32 ] Variant peptide FASTA file written to disk.
ENST00000479107.1 ?
Oh wait, what was I doing..
I'm able to reproduce it. I think it's again a node being missed when creating the cleavage graph 😵
I think I fixed it in ddaa904 . Seems like under certain circumstances some nodes are visited multiple times. Potentially because of too many inframe mutations in a local region.
T_T got another transcript, past 88000 though!!!!
ENST00000460633.1
Yeah, 33819ca messed it up. It was fine on ddaa904
Think I fixed it. I'll add some test cases. Are you have to run the whole thing again? 😢
Sorry it's not completely fixed yet.
OK finally fixed in 06a8020
ok gonna run it overnight! fingers crossed! fe2f16e
no longer see this
this is the furthest we've gotten!!!