vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg autoindex vg giraffe pggb gfa #4387

Open hgz2021 opened 1 week ago

hgz2021 commented 1 week ago

I am using vg version v1.59.0. I built a pangenome graph (GFA) with PGGB and want to use it to call SNPs in my sample. Here is my current script:

vg autoindex -t 5 --workflow giraffe -r fasta -g pggb.gfa -p ./chr1_cute
vg giraffe -p -t 5 -d dist -Z gbz -m min -f R1.fastq.gz -f R2.fastq.gz -N SRR -o BAM > SRR.bam

The run fails with:

warning:[vg::get_sequence_dictionary] No reference-sense paths available in the graph; falling back to generic paths.
error:[vg::get_sequence_dictionary] No reference or non-alt-allele generic paths available in the graph!

My question: does vg have any commands or parameters for adding a reference genome to a GFA file? At present, I want to find the SNPs in second-generation sequencing samples using the pangenome generated by PGGB. If there is another way to do this, please let me know. (I have already used MC.)

jltsiren commented 1 week ago

If your graph already has the paths you want to use as reference sequences and they all correspond to the same sample name, you can mark it as the reference sample using:

vg gbwt --set-reference SAMPLE --gbz-format -g output.gbz -Z graph.gbz

See https://github.com/vgteam/vg/wiki/Path-Metadata-Model and https://github.com/vgteam/vg/wiki/Changing-References for further documentation.

PGGB path naming conventions should be compatible with the vg path metadata model. If not, you may have to change the names of the paths in the GFA file. And if you don't have the reference sequences already in the graph, you need to include them in the inputs to PGGB.
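Before choosing a value for SAMPLE, it can help to see which sample names the GFA actually contains. The following is a minimal sketch, not part of vg: `list_gfa_samples` is a hypothetical helper, and it assumes that P-line path names follow the PanSN convention (`sample#haplotype#contig`) and that W-lines carry the sample name in their second field.

```shell
# Hypothetical helper (not a vg command): print the distinct sample names
# found in the path records of a GFA file.
# Assumptions: P-line names are PanSN-style (sample#haplotype#contig);
# W-lines store the sample name in column 2.
list_gfa_samples() {
    awk -F'\t' '
        $1 == "P" { split($2, parts, "#"); print parts[1] }  # text before the first "#"
        $1 == "W" { print $2 }                               # sample field of a walk line
    ' "$1" | sort -u
}

# Usage: list_gfa_samples pggb.gfa
```

If the reference assembly appears in this list under a single sample name, that name is the one to pass as SAMPLE in the `vg gbwt --set-reference` command above. If a path name contains no `#`, the helper prints the whole name, which suggests the path may need renaming to fit the metadata model.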