jespindel01 closed this issue 1 month ago.
The GBZ graph you built manually is not the same graph as the one you indexed using `vg autoindex`. For various practical reasons, long nodes must be chopped into shorter fragments before they can be used in vg. `vg autoindex` chops the nodes to 32 bp, while the `vg gbwt` default is 1024 bp.
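To see why the chopping limit matters, here is a toy Python sketch. It is not vg's code; `chop_node`, `chop_graph` and the toy graph are hypothetical. It splits node sequences into fragments of at most `max_len` bp, and the same input yields very different node sets at 32 bp and at 1024 bp, which is why a GBZ built with one limit does not match indexes built with the other:

```python
def chop_node(sequence: str, max_len: int) -> list[str]:
    """Split one node's sequence into consecutive fragments of at most max_len bp."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]


def chop_graph(nodes: dict[int, str], max_len: int) -> dict[int, list[str]]:
    """Chop every node; maps each original node ID to its list of fragments."""
    return {node_id: chop_node(seq, max_len) for node_id, seq in nodes.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: one 2,000 bp node and one 40 bp node.
    toy = {1: "A" * 2000, 2: "C" * 40}
    for limit in (32, 1024):  # vg autoindex limit vs. vg gbwt default
        chopped = chop_graph(toy, limit)
        total = sum(len(frags) for frags in chopped.values())
        print(f"max node length {limit}: {total} nodes")
```

Run as a script, it reports 65 nodes for the 32 bp limit and 3 nodes for the 1024 bp limit, even though the underlying sequence is identical.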
You should have a GBZ graph (probably `index.giraffe.gbz`) from `vg autoindex` which you can use. However, because you built the graph with PGGB, its structure could be too complex and Giraffe might be slow.
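Because every index Giraffe loads has to describe the same chopped graph, the simplest path is to take the GBZ, minimizer and distance indexes from the same `vg autoindex` run. Below is a minimal Python sketch that assembles a paired-end `vg giraffe` command line. The `index` prefix and the `.min`/`.dist` file names are assumptions (check what `vg autoindex` actually wrote, since names can differ between vg versions), and `giraffe_paired_cmd` is a hypothetical helper:

```python
import shlex


def giraffe_paired_cmd(prefix: str, fq1: str, fq2: str, threads: int = 1) -> list[str]:
    """Build a vg giraffe argv for paired-end reads against vg autoindex outputs.

    ASSUMPTION: index files are named <prefix>.giraffe.gbz, <prefix>.min and
    <prefix>.dist; adjust to the names your vg autoindex run produced.
    """
    return [
        "vg", "giraffe",
        "-Z", f"{prefix}.giraffe.gbz",  # GBZ graph from vg autoindex
        "-m", f"{prefix}.min",          # minimizer index for the same graph
        "-d", f"{prefix}.dist",         # distance index for the same graph
        "-f", fq1,                      # first mate file
        "-f", fq2,                      # second mate file; two -f means paired-end
        "-t", str(threads),             # worker threads
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = giraffe_paired_cmd("index", "sample_R1.fq.gz", "sample_R2.fq.gz")
    print(shlex.join(cmd))
```

Giving `-f` twice is how `vg giraffe` takes the two mate files. The command writes GAM to standard output, so redirect it to a file when running it.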
Additionally, your two `vg gbwt` commands run the same (potentially expensive) algorithm with different outputs. If you need a separate GBWT file, you can extract it much faster from the GBZ with `vg gbwt -o graph.gbwt -Z graph.gbz`.
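The "build the expensive thing once, derive the rest" idea can be sketched in Python. This wrapper runs the GBZ-to-GBWT extraction command from this reply only when the GBWT file is missing or older than the GBZ. The helper names and the timestamp check are illustrative assumptions, not part of vg:

```python
import os
import subprocess


def gbwt_from_gbz_cmd(gbz: str, gbwt: str) -> list[str]:
    """argv for extracting the GBWT stored inside a GBZ file."""
    return ["vg", "gbwt", "-o", gbwt, "-Z", gbz]


def is_stale(src: str, dst: str) -> bool:
    """True if dst is missing or older than src (a simple make-style check)."""
    return not os.path.exists(dst) or os.path.getmtime(dst) < os.path.getmtime(src)


def extract_gbwt(gbz: str, gbwt: str) -> bool:
    """Run the extraction only when needed; returns True if vg was invoked."""
    if not is_stale(gbz, gbwt):
        return False
    subprocess.run(gbwt_from_gbz_cmd(gbz, gbwt), check=True)
    return True
```

For example, `extract_gbwt("graph.gbz", "graph.gbwt")` invokes vg the first time and returns `False` on later calls until the GBZ changes.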
Thank you
1. What were you trying to do? I'm trying to align paired-end FASTQ short reads to a single-chromosome pangenome using vg giraffe. To troubleshoot, I reduced the number of threads to 1 to make sure I was not running out of memory, and I still got the error. By watching memory usage, I also confirmed that the crash happened before even 10% of the memory was in use. Here is an example call that caused the crash:
2. What did you want to happen? I wanted to get the read alignments to the pangenome. A small number of runs in which I passed only a single FASTQ file ran to completion, but for the vast majority of samples, vg crashes with the stack trace below. Every time I try to pass two FASTQ files for paired-end data, I get the same vg crash message.
3. What actually happened? vg crashed
4. If you got a line like `Stack trace path: /somewhere/on/your/computer/stacktrace.txt`, please copy-paste the contents of that file here:
5. What data and command can the vg dev team use to make the problem happen? To get the inputs to vg, I started with a .gfa file output from PGGB. I ran the following to get the inputs for vg giraffe:
The FASTQ inputs are ordinary paired-end short-read files that have been filtered, so they do not contain every read off the sequencer. They follow the standard FASTQ format.
6. What does running `vg version` say?
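The inputs are described as filtered FASTQ files that do not contain every read. If the two mate files were filtered separately, it may be worth confirming that they still list the same reads in the same order before paired-end mapping. Below is a minimal Python sketch, assuming standard 4-line FASTQ records and mate names that differ only by an optional `/1` or `/2` suffix. The function names are hypothetical and not part of vg:

```python
import gzip
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple

Mismatch = Tuple[int, Optional[str], Optional[str]]


def names_from_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield read names from 4-line FASTQ records, normalised for mate comparison."""
    for i, line in enumerate(lines):
        if i % 4 != 0:  # only the first line of each record is a header
            continue
        if not line.startswith("@"):
            raise ValueError(f"expected '@' header at line {i + 1}")
        fields = line[1:].split()
        name = fields[0] if fields else ""  # drop any comment after whitespace
        if name.endswith(("/1", "/2")):  # strip old-style mate suffixes
            name = name[:-2]
        yield name


def first_mismatch(names1: Iterable[str], names2: Iterable[str]) -> Optional[Mismatch]:
    """Return (record_index, name1, name2) for the first pair that differs, else None.

    A missing record on one side shows up as None, so unequal read counts are caught too.
    """
    for idx, (a, b) in enumerate(zip_longest(names1, names2)):
        if a != b:
            return idx, a, b
    return None


def check_fastq_pair(fq1: str, fq2: str) -> Optional[Mismatch]:
    """Compare two FASTQ files (plain or .gz) record by record."""
    def open_text(path: str):
        return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

    with open_text(fq1) as h1, open_text(fq2) as h2:
        return first_mismatch(names_from_lines(h1), names_from_lines(h2))
```

With hypothetical file names, `check_fastq_pair("sample_R1.fq.gz", "sample_R2.fq.gz")` returning `None` means the read names line up; any other result points at the first record where the two files diverge.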