vgteam / vg

tools for working with genome variation graphs

vg map errors #4280

Closed santhanakrishnanb closed 1 month ago

santhanakrishnanb commented 1 month ago

**1. What were you trying to do? 2. What did you want to happen?**

I was trying to map a new fastq file to a graph generated using a few genome sequences.

Step 1: `vg construct -r Ref_genes.fasta > output.vg` was successful.

Step 2: `vg index -x output.xg -g output.gcsa output.vg` was successful.

Step 3: `vg map -x output.xg -g output.gcsa -f CP019206.1.fastq > mapped_reads.gam` is where it got stuck.

**3. What actually happened?**

**4. If you got a line like `Stack trace path: /somewhere/on/your/computer/stacktrace.txt`, please copy-paste the contents of that file here:**

sb3700@cvm-Lambda-Vector: vg map -x output.xg -g output.gcsa -f trimmed_CP045063.1.fastq > mapped_reads.gam

terminate called after throwing an instance of 'std::runtime_error'
  what(): Found unexpected delimiter in fastq/fasta input
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.56.0 "Collalto"
Stack trace (most recent call last) in thread 2077364:

14 Object "", at 0xffffffffffffffff, in

13 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f719a52684f, in __clone3

  Source "../sysdeps/unix/sysv/linux/x86_64/clone3.S", line 81, in __clone3 [0x7f719a52684f]

12 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f719a494ac2, in start_thread

  Source "./nptl/pthread_create.c", line 442, in start_thread [0x7f719a494ac2]

11 Object "/usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0", at 0x7f719abdbc0d, in

10 Object "/home/sb3700/Adam_Data/pggb/vg/bin/vg", at 0x56289717fb29, in unsigned long vg::io::unpaired_for_each_parallel(std::function<bool (vg::Alignment&)>, std::function<void (vg::Alignment&)>, unsigned long) [clone ._omp_fn.0]

| Source "/home/sb3700/Adam_Data/pggb/vg/include/vg/io/alignment_io.hpp", line 146, in operator()
|   144:             for (int i = 0; i < batch_size; i++) {
|   145:                 
| > 146:                 more_data = get_read_if_available(aln);
|   147:                 
|   148:                 if (more_data) {
  Source "/usr/include/c++/11/bits/std_function.h", line 590, in _ZN2vg2io26unpaired_for_each_parallelINS_9AlignmentEEEmSt8functionIFbRT_EES3_IFvS5_EEm._omp_fn.0 [0x56289717fb29]
    587:       {
    588:    if (_M_empty())
    589:      __throw_bad_function_call();
  > 590:    return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
    591:       }
    592: 
    593: #if __cpp_rtti

9 Object "/home/sb3700/Adam_Data/pggb/vg/bin/vg", at 0x5628967de82c, in vg::get_next_alignment_from_fastq(gzFile_s, char, unsigned long, vg::Alignment&) [clone .cold]

| Source "src/alignment.cpp", line 220, in ~basic_string
| Source "/usr/include/c++/11/bits/basic_string.h", line 672, in ~_Alloc_hider
|   670:        */
|   671:       ~basic_string()
| > 672:       { _M_dispose(); }
|   673: 
|   674:       /**
| Source "/usr/include/c++/11/bits/basic_string.h", line 158, in ~allocator
|   157:       // Use empty-base optimization: http://www.cantrip.org/emptyopt.html
| > 158:       struct _Alloc_hider : allocator_type // TODO check __is_final
|   159:       {
|   160: #if __cplusplus < 201103L
| Source "/usr/include/c++/11/bits/allocator.h", line 174, in ~new_allocator
|   172:       constexpr
|   173: #endif
| > 174:       ~allocator() _GLIBCXX_NOTHROW { }
|   175: 
|   176: #if __cplusplus > 201703L
  Source "/usr/include/c++/11/ext/new_allocator.h", line 89, in get_next_alignment_from_fastq [0x5628967de82c]
     86:    new_allocator(const new_allocator<_Tp1>&) _GLIBCXX_USE_NOEXCEPT { }
     87: 
     88: #if __cplusplus <= 201703L
  >  89:       ~new_allocator() _GLIBCXX_USE_NOEXCEPT { }
     90: 
     91:       pointer
     92:       address(reference __x) const _GLIBCXX_NOEXCEPT

8 Object "/usr/lib/x86_64-linux-gnu/libgcc_s.so.1", at 0x7f719abb52dc, in _Unwind_Resume

7 Object "/usr/lib/x86_64-linux-gnu/libgcc_s.so.1", at 0x7f719abb4883, in

6 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f719a8ad958, in __gxx_personality_v0

5 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f719a8ad1e8, in

4 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f719a8ae20b, in

3 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f719a8a2b9d, in

2 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f719a4287f2, in abort

  Source "./stdlib/abort.c", line 79, in abort [0x7f719a4287f2]

1 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f719a442475, in raise

  Source "../sysdeps/posix/raise.c", line 26, in raise [0x7f719a442475]

0 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f719a4969fc, in pthread_kill@@GLIBC_2.34

| Source "./nptl/pthread_kill.c", line 89, in __pthread_kill_internal
| Source "./nptl/pthread_kill.c", line 78, in __pthread_kill_implementation
  Source "./nptl/pthread_kill.c", line 44, in __pthread_kill [0x7f719a4969fc]

ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug. Please include this entire error log in your bug report!



**5. What data and command can the vg dev team use to make the problem happen?**
Ref genome used the following: CP019405.1, CP019409.1, CP019410.1, CP019412.1, CP019413.1, CP019414.1, CP019417.1

**6. What does running `vg version` say?**

vg version v1.56.0 "Collalto"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by sb3700@cvm-Lambda-Vector

jeizenga commented 1 month ago

That looks like a FASTQ formatting error. Can you share the output of `head -n 20 trimmed_CP045063.1.fastq`?

santhanakrishnanb commented 1 month ago

LOCUS CP045063 4930420 bp DNA circular BCT 12-NOV-2019
DEFINITION Salmonella enterica subsp. enterica serovar Muenchen strain LG26 chromosome, complete genome.
ACCESSION CP045063
VERSION CP045063.1
DBLINK BioProject: PRJNA576706
       BioSample: SAMN13002973
KEYWORDS .
SOURCE Salmonella enterica subsp. enterica serovar Muenchen
  ORGANISM Salmonella enterica subsp. enterica serovar Muenchen
           Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Salmonella.
REFERENCE 1 (bases 1 to 4930420)
  AUTHORS Tran,T.D., McGarvey,J.A., Huynh,S. and Parker,C.T.
  TITLE Genome sequence of Salmonella Muenchen str. LG26
  JOURNAL Unpublished
REFERENCE 2 (bases 1 to 4930420)
  AUTHORS Tran,T.D., McGarvey,J.A., Huynh,S. and Parker,C.T.
  TITLE Direct Submission
  JOURNAL Submitted (09-OCT-2019) Foodborne Toxin Detection and Prevention

jeizenga commented 1 month ago

Okay, yeah, this is not at all a FASTQ file. It looks like maybe you saved a request for a FASTQ file instead of the file itself?
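A quick way to catch this kind of mix-up before running `vg map` is to look at the first line of the file. The sketch below is only an illustration: `sniff_format` is a hypothetical helper, not part of vg, and it assumes the usual conventions that FASTQ records start with `@`, FASTA records with `>`, and GenBank flat files with `LOCUS`.

```python
def sniff_format(first_line: str) -> str:
    """Guess a sequence file's format from its first non-empty line.

    Hypothetical helper for illustration; it only inspects the leading
    characters and does not validate the rest of the file.
    """
    line = first_line.strip()
    if line.startswith("@"):
        return "fastq"
    if line.startswith(">"):
        return "fasta"
    if line.startswith("LOCUS"):
        return "genbank"
    return "unknown"
```

Applied to the first line pasted above, this returns `"genbank"`, which would explain why the FASTQ reader rejects the input.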

santhanakrishnanb commented 1 month ago

> yeah, this is not at all a FASTQ file. It looks like maybe you saved a request for a FASTQ file instead of the file itself?

The above output is the header to it. Attached is the complete file. I have tried with other fastq files downloaded from NCBI, but with similar results.

CP045063.1.zip

jeizenga commented 1 month ago

That isn't a FASTQ either. These seem to be raw NCBI data pages. Check out the Wikipedia article to see some examples of what FASTQ files look like: https://en.wikipedia.org/wiki/FASTQ_format
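For reference, a rough structural check of a FASTQ file can be sketched as follows. It assumes the simple four-lines-per-record layout (header starting with `@`, sequence, separator starting with `+`, quality string of the same length as the sequence) and does not handle wrapped sequences. `first_fastq_problem` is a hypothetical name, and this is not how vg itself parses reads.

```python
from typing import Iterable, Optional


def first_fastq_problem(lines: Iterable[str]) -> Optional[str]:
    """Return a description of the first malformed record, or None.

    Assumes unwrapped, four-line FASTQ records; illustrative only.
    """
    record = []
    for n, raw in enumerate(lines, start=1):
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        start = n - 3  # line number of this record's header
        if not header.startswith("@"):
            return f"line {start}: header does not start with '@'"
        if not plus.startswith("+"):
            return f"line {start + 2}: separator does not start with '+'"
        if len(seq) != len(qual):
            return f"line {start + 3}: quality length differs from sequence length"
        record = []
    if record:
        return "file ends in the middle of a record"
    return None
```

It can be run on an open file handle, for example `first_fastq_problem(open("trimmed_CP045063.1.fastq"))`. A GenBank record like the one posted earlier in this thread would be reported at line 1.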