Open SimonaSecomandi opened 3 years ago
@glennhickey What's the current recommended pruning workflow for squashing a generic Cactus graph into something simple enough to make a GCSA index out of?
I will punt this to @jmonlong, who's just gotten this working on our latest graphs. I think -k 64
is one of the key things.
I've had some success by using a higher kmer size during the pruning. I didn't change the default kmer size used for the gcsa indexing. I ran something like:
vg prune -k 45 graph.vg > graph.pruned.vg
mkdir -p tmp_gcsa
vg index --temp-dir tmp_gcsa -p -g graph.gcsa graph.pruned.vg
In practice, I had first decomposed the graph in components with vg chunk
, prune each graph separately and then fed all these pruned graph to vg index
to minimize memory usage.
Pruning with kmer size 45 and 64 reduced the memory usage enough that I could gcsa index a human-size cactus graph with 100-200Gb of memory.
I hope this helps.
Thank you for your support! I generated the index with those commands:
vg prune -t 32 -k 45 pangenome.vg >pangenome_pruned.vg
vg mod -t 32 -X 256 pangenome_pruned.vg > pangenome_pruned_mod.vg
vg index -t 32 -b /tmp -p -x pangenome_pruned_mod_mod.xg -g pangenome_pruned_mod.gcsa pangenome_pruned_mod.vg
You only want to use prune
for the gcsa
index. Your commands should look like
vg mod -X 256 pangenome.vg > pangenome_choppped.vg
vg index pangenome_chopped.vg -x pangenome_chopped.xg
vg prune -t 32 -k 45 pangenome_chopped.vg > pangenome_pruned.vg
vg index -t 32 -b /tmp -p -g pangenome_pruned_mod.gcsa pangenome_pruned.vg
Hello, I managed to index, prune and align my illumina WGS data against the variation graph.
vg mod -t 32 -X 256 .pangenome.vg > pangenome_chopped.vg
vg index -t 32 -x pangenome_chopped.xg pangenome_chopped.vg
vg prune -t 32 -k 45 pangenome_chopped.vg > pangenome_chopped_pruned.vg
vg index -t 32 -b /tmp -p -g pangenome_chopped_pruned.gcsa pangenome_chopped_pruned.vg
vg map -t 32 -f WGS_forward.fastq.gz -f WGS_reverse.fastq.gz -x pangenome_chopped.xg -g pangenome_chopped_pruned.gcsa > WGS_aln.gam
The next step would be the augmentation step right? Which of the generate vg files I have to use? The original, the chopped or the pruned?
e.g. vg augment pangenome.vg WGS_aln.gam -A WGS_aug.gam > WGS_aug.vg
I'm struggling figuring our the next steps for variant calling in general..
1. What were you trying to do?
Call variants with vg. I'm trying to align of paired-end illumina reads from different bird individuals to a vg graph generated with the Cactus Pangenome Pipeline (https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md). The vg file comprires 10 high-quality assemblies (primary and alternate) of 5 individuals of the same species.
2. What did you want to happen?
Create index with 16 kmers for vg map.
3. What actually happened?
error: [write_gcsa_kmers()] size limit exceeded
4. If you got a line like
Stack trace path: /somewhere/on/your/computer/stacktrace.txt
, please copy-paste the contents of that file here:No stacktrace.
5. What data and command can the vg dev team use to make the problem happen?
vg index -t 32 -x pangenome_k16.xg -g pangenome_k16.gcsa -k 16 -b pangenome.vg
6. What does running
vg version
say?Many thanks!