vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other

key not present #3786

Open Kongqq5 opened 1 year ago

Kongqq5 commented 1 year ago

Hello, I want to construct a pantranscriptome from the two haplotypes of a diploid genome.

1. What I have done

# Construct variation graph
wfmash -t 72 -p 90 -s 100000 ./BM#1#chr4_RagTag.fa.gz ./BM#2#chr4_RagTag.fa.gz  > ./chr4_RagTag.aln.paf
cat BM#1#chr4_RagTag.fa.gz ./BM#2#chr4_RagTag.fa.gz  > chr4_RagTag.pggb.input.fa.gz 
seqwish -t 72 -s chr4_RagTag.pggb.input.fa.gz -p chr4_RagTag.aln.paf -g chr4_RagTag.aln.gfa
smoothxg -t 72 -g chr4_RagTag.aln.gfa -V -o chr4_RagTag.smoothxg.aln.gfa
# Convert variation graph to PackedGraph format
vg convert -g chr4_RagTag.smoothxg.aln.gfa > chr4_RagTag.pg
grep -P 'chr4_RagTag' hap1.final_annotation1.gff > chr4_RagTag.hap1.gff
grep -P 'chr4_RagTag' hap2.final_annotation1.gff > chr4_RagTag.hap2.gff

All of these steps work fine.

2. What data and command can the vg dev team use to make the problem happen?

vg rna -k 256 -y CDS -s Parent -p -t 72 -n chr4_RagTag.hap1.gff -n chr4_RagTag.hap2.gff -r -u -b chr4_RagTag.pantranscriptome.gbwt -i chr4_RagTag.pantranscriptome.txt chr4_RagTag.pg > split.chr4_RagTag.pg

3. What actually happened?

[vg rna] Parsing graph file ...
[vg rna] Graph parsed in 8.25311 seconds, 1.80941 GB
[vg rna] Adding transcript splice-junctions and exon boundaries to graph ...
[vg rna] 14015 transcripts parsed and graph augmented in 67.9664 seconds, 14.4324 GB
[vg rna] Chopping long nodes ...
terminate called after throwing an instance of 'std::out_of_range'
  what(): at: key not present
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_Ko8HSO/stacktrace.txt
Please include the stack trace file in your bug report!

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Crash report for vg v1.42.0 "Obolo"
Stack trace (most recent call last):
#17   Object "/data13/kongqq/software/env/vg/bin/vg", at 0x602fa4, in _start
#16   Object "/data13/kongqq/software/env/vg/bin/vg", at 0x1fbe506, in __libc_start_main
#15   Object "/data13/kongqq/software/env/vg/bin/vg", at 0x1fbcca9, in __libc_start_call_main
#14   Object "/data13/kongqq/software/env/vg/bin/vg", at 0xdc215b, in vg::subcommand::Subcommand::operator()(int, char**) const
#13   Object "/data13/kongqq/software/env/vg/bin/vg", at 0xd947c0, in main_rna(int, char**)
#12   Object "/data13/kongqq/software/env/vg/bin/vg", at 0x12e864b, in vg::Transcriptome::chop_nodes(unsigned int)
#11   Object "/data13/kongqq/software/env/vg/bin/vg", at 0x175d585, in bdsg::HashGraph::for_each_handle_impl(std::function<bool (handlegraph::handle_t const&)> const&, bool) const
#10   Object "/data13/kongqq/software/env/vg/bin/vg", at 0x12f2f7f, in _ZNSt17_Function_handlerIFbRKN11handlegraph8handle_tEEZNS0_20BoolReturningWrapperIZN2vg13Transcriptome10chop_nodesEjEUlS3_E_Lb0EE4wrapERKS8_EUlDpOT_E_E9_M_invokeERKSt9_Any_dataS3_
#9    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x1762474, in bdsg::HashGraph::get_length(handlegraph::handle_t const&) const
#8    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x176bd4e, in spp::sparse_hash_map<long long, bdsg::HashGraph::node_t, bdsg::wang_hash<long long, void>, std::equal_to<long long>, spp::libc_allocator<std::pair<long long const, bdsg::HashGraph::node_t> > >::at(long long const&) const
#7    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x42b188, in void spp::throw_exception<std::out_of_range>(std::out_of_range const&)
#6    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x1ef6958, in __cxa_throw
#5    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x1ef67f6, in std::terminate()
#4    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x1ef678b, in __cxxabiv1::__terminate(void (*)())
#3    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x5d0ad9, in __gnu_cxx::__verbose_terminate_handler() [clone .cold]
#2    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x5d3221, in abort
#1    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x1fd56e5, in raise
#0    Object "/data13/kongqq/software/env/vg/bin/vg", at 0x20020ec, in __pthread_kill

I ran the same commands on the other chromosomes and they all worked, so I don't understand why chr3 and chr4 fail.
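The crash happens in the "Chopping long nodes" step, which "-k 256" enables. One hypothetical way to compare the chromosomes that work with chr3 and chr4 is to count how many segments in each input GFA are longer than 256 bp. This is a minimal sketch, not part of vg: the helper name count_long_segments is made up, it assumes segment sequences are stored inline in column 3 of the S lines (not "*"), and it looks at the GFA before vg import. Whether long segments are actually related to the crash is not established here.

```shell
#!/bin/sh
# Hypothetical helper (not part of vg): count GFA segments (S lines) whose
# sequence is longer than a threshold, e.g. the 256 passed to "vg rna -k".
# Assumes the sequence is stored inline in column 3 (not "*").
count_long_segments() {
    # $1 = GFA file, $2 = length threshold in bp
    awk -F'\t' -v max="$2" '
        $1 == "S" && length($3) > max + 0 { n++ }
        END { print n + 0 }   # print 0 when no segment exceeds the threshold
    ' "$1"
}

# Example (chr1_RagTag.smoothxg.aln.gfa is a hypothetical file name that
# follows the same naming scheme as the chr4 graph):
# for g in chr1_RagTag.smoothxg.aln.gfa chr4_RagTag.smoothxg.aln.gfa; do
#     printf '%s\t%s\n' "$g" "$(count_long_segments "$g" 256)"
# done
```

The counts alone would not explain the crash, but a large difference between working and failing chromosomes could be useful context for the vg developers.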

5. What does running vg version say?

vg version v1.42.0 "Obolo"
Compiled with g++ (Ubuntu 11.2.0-19ubuntu1) 11.2.0 on Linux
Linked against libstd++ 20220324
Built by ubuntu@ip-172-31-12-246

Excerpts from my GFF files:

BM#1#chr4_RagTag#0      GAF     gene    240602  241541  .       -       .       ID=gene_17876;transcripts=1;complete=1;maxEvidence=1;combinedEvidence=1
BM#1#chr4_RagTag#0      GeMoMa  mRNA    240602  241541  .       -       .       ID=Mo17_Zm00014a034051_T001_R3;ref-gene=Mo17_Zm00014a034051;aa=247;score=524;ce=3;rce=3;pAA=0.5;iAA=0.4796;nps=0;start=M;stop=*;e>
BM#1#chr4_RagTag#0      GeMoMa  CDS     241437  241541  .       -       0       Parent=Mo17_Zm00014a034051_T001_R3
BM#1#chr4_RagTag#0      GeMoMa  CDS     241201  241328  .       -       0       Parent=Mo17_Zm00014a034051_T001_R3
BM#1#chr4_RagTag#0      GeMoMa  CDS     240602  241109  .       -       1       Parent=Mo17_Zm00014a034051_T001_R3
BM#1#chr4_RagTag#0      GAF     gene    254736  255681  .       +       .       ID=gene_0;transcripts=1;complete=1;maxEvidence=1;combinedEvidence=1
BM#1#chr4_RagTag#0      GeMoMa  mRNA    254736  255681  .       +       .       ID=B73_rna-XM_020540145.1_R3;ref-gene=B73_gene-LOC109940577;aa=289;score=1508;ce=2;rce=2;pAA=0.9862;iAA=0.9827;nps=0;start=M;stop>
##sequence-region#0     BM#2#chr4_RagTag        1       249377254                                       
BM#2#chr4_RagTag#0      GAF     gene    139654  142755  .       -       .       ID=gene_32557;transcripts=1;complete=1;maxEvidence=1;combinedEvidence=1
BM#2#chr4_RagTag#0      GeMoMa  mRNA    139654  142755  .       -       .       ID=Mo17_Zm00014a000194_T001_R1;ref-gene=Mo17_Zm00014a000194;aa=240;score=1191;ce=6;rce=6;pAA=0.9875;iAA=0.9833;nps=0;start=M;stop>
BM#2#chr4_RagTag#0      GeMoMa  CDS     142575  142755  .       -       0       Parent=Mo17_Zm00014a000194_T001_R1
BM#2#chr4_RagTag#0      GeMoMa  CDS     141807  141913  .       -       2       Parent=Mo17_Zm00014a000194_T001_R1
BM#2#chr4_RagTag#0      GeMoMa  CDS     140634  140734  .       -       0       Parent=Mo17_Zm00014a000194_T001_R1
BM#2#chr4_RagTag#0      GeMoMa  CDS     140073  140151  .       -       1       Parent=Mo17_Zm00014a000194_T001_R1
BM#2#chr4_RagTag#0      GeMoMa  CDS     139901  139948  .       -       0       Parent=Mo17_Zm00014a000194_T001_R1
BM#2#chr4_RagTag#0      GeMoMa  CDS     139654  139857  .       -       0       Parent=Mo17_Zm00014a000194_T001_R1
BM#2#chr4_RagTag#0      GAF     gene    237522  237587  .       +       .       ID=gene_35872;transcripts=1;complete=1;maxEvidence=1;combinedEvidence=1
jonassibbesen commented 1 year ago

I am not sure why this error is happening. Would you be able to share the data? You can use this email: jonas.sibbesen@sund.ku.dk