pangenome / pggb

the pangenome graph builder
https://doi.org/10.1101/2023.04.05.535718
MIT License
PGGB singularity vg deconstruct not recognizing sample prefix #399

Open SimonaSecomandi opened 1 month ago

Hi all,

I am trying to call variants with the pggb singularity pipeline, starting from multiple haplotype-resolved genomes. I used PanSN-spec naming, and everything worked well until I had to choose the reference for graph decomposition.

My path names are similar to these:

ind1#1#chr1
ind1#2#chr1
ind2#1#chr1
ind2#2#chr1
...
ind10#1#chr1
ind10#2#chr1
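
These names follow the PanSN layout sample#haplotype#contig. As a sanity check, the path names that a given prefix matches can be listed straight from the GFA. This is only a minimal sketch: gfa_path_names and names_with_prefix are hypothetical helper names, and it assumes the paths are stored as P lines (path name in column 2), not W lines.

```shell
# Hypothetical helpers, assuming paths are stored as GFA P lines.

# Print the path name (column 2) of every P line in the GFA file "$1".
gfa_path_names() {
  awk -F'\t' '$1 == "P" { print $2 }' "$1"
}

# Keep only the names read from stdin that start with the literal prefix "$1".
names_with_prefix() {
  awk -v p="$1" 'index($0, p) == 1'
}

# Example (not run here):
#   gfa_path_names chr1.smooth.final.gfa | names_with_prefix 'ind1#1#'
```

A bare sample prefix such as ind1 would also match ind10#1#chr1 and ind10#2#chr1, which is why I include the haplotype field in the prefix.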

I would like to deconstruct the graph using ind1#1 as a reference, since it's the most complete genome and I used it for linear short-read mapping.

However, this fails when I choose a reference in pggb singularity (pggb 87510bc).

The command that pggb runs is vg deconstruct -P ind1#2# -H # -e -a -t 32 chr1.smooth.final.gfa, and this is an example error:

[vg::deconstruct] making VCF with reference=ind1#1# and delim=#
xxxxxxxxxxxxx ind1#1# ------------ 0
Error [vg deconstruct]: No specified reference path or prefix found in graph

I can successfully run vg deconstruct with ind1#1 as the reference if I run it after pggb graph construction and drop -H #: vg deconstruct -P ind1#1 -e -a -t 32 chr1.smooth.final.gfa

Do you have any recommendations on how to specify, directly in pggb, the haplotype/sample I want to call variants against?

Many thanks!

Simona