Open sloth-eat-pudding opened 1 year ago
The path names in that graph are not in any format VG supports by default:
sample#haplotype#contig#fragment
sample#haplotype#contig#interval
sample#haplotype#contig (PanSN format)
name#fragment or name#interval
name
The regex for the third pattern matches the name, but the haplotype field cannot be parsed, because it's not an integer.
If all path names follow the same pattern, you can specify it with the options --path-regex and --path-fields. Unfortunately, we do not expose the ability to specify multiple patterns at the moment.
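Whether a single pattern can cover every path name is something you can check before choosing a regex. The following is a minimal sketch, not part of vg: it counts how many `#`-separated fields each P-line path name has. The function name `path_name_shapes`, the miniature GFA, and the second contig name are hypothetical and used only for illustration.

```python
from collections import Counter
import io


def path_name_shapes(gfa_lines):
    """Count how many '#'-separated fields each P-line path name has."""
    shapes = Counter()
    for line in gfa_lines:
        # P-lines are tab-separated; the path name is the second column.
        if not line.startswith("P\t"):
            continue
        name = line.rstrip("\n").split("\t")[1]
        shapes[name.count("#") + 1] += 1
    return shapes


# Hypothetical miniature GFA; only the first path name comes from the issue.
example = io.StringIO(
    "H\tVN:Z:1.0\n"
    "S\t1\tACGT\n"
    "P\tHG00438#2#JAHBCA010000258.1#MT\t1+\t*\n"
    "P\tHG00438#1#JAHBCB010000001.1\t1+\t*\n"
)
shapes = path_name_shapes(example)
print(dict(shapes))  # {4: 1, 3: 1}
```

For a real graph, pass an open file handle instead of the `io.StringIO` object. If more than one field count shows up, the path names probably do not share a single pattern, and one regex is unlikely to parse all of them.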
1. What were you trying to do?
I was attempting to convert a pggb pangenome graph to GBZ format for use with Giraffe.
2. What did you want to happen?
I wanted a direct conversion without issues.
3. What actually happened?
I encountered an error stating:
what(): MetadataBuilder: Invalid haplotype field JAHBCA010000258.1
4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
5. What data and command can the vg dev team use to make the problem happen?
Data: hprc-v1.0-pggb.gfa
Command:
vg gbwt -G hprc-v1.0-pggb.gfa --gbz-format -g hprc-v1.0-pggb-all-gbwt.gbz
6. What does running vg version say?
I suspect the issue originates from an incorrect regex being applied to the P-line path names. In the hprc-v1.0-pggb file, the P-line name contains an additional MT field. The regex pattern is (.*)#(.*)#(.*), and each (.*) group is greedy, so given
P HG00438#2#JAHBCA010000258.1#MT
it splits the name into [HG00438#2] [JAHBCA010000258.1] [MT]. The second group should be the haplotype, so the parser attempts to convert JAHBCA010000258.1 into a number, which causes the error. I found this regex pattern defined in /vg/deps/gbwtgraph/src/gfa.cpp as
const std::string GFAParsingParameters::PAN_SN_REGEX = "(.*)#(.*)#(.*)";
I hope this information is helpful to you.
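The split described above can be reproduced outside vg. This is a small sketch that uses Python's `re` module as a stand-in for the regex engine in gbwtgraph, on the assumption that both treat `(.*)` as a greedy group for this pattern. The sample, haplotype, contig order of the groups follows the description in this issue.

```python
import re

# Pattern quoted in the issue from gbwtgraph's gfa.cpp.
PAN_SN_REGEX = r"(.*)#(.*)#(.*)"

# A three-field PanSN name splits as intended.
ok = re.fullmatch(PAN_SN_REGEX, "HG00438#2#JAHBCA010000258.1").groups()
print(ok)  # ('HG00438', '2', 'JAHBCA010000258.1')

# The four-field name from the failing P-line: the greedy first group
# swallows "HG00438#2", shifting the contig into the haplotype slot.
sample, haplotype, contig = re.fullmatch(
    PAN_SN_REGEX, "HG00438#2#JAHBCA010000258.1#MT"
).groups()
print(sample, haplotype, contig)  # HG00438#2 JAHBCA010000258.1 MT

# The haplotype field must be an integer, so the conversion fails here.
try:
    int(haplotype)
    haplotype_is_integer = True
except ValueError:
    haplotype_is_integer = False
print(haplotype_is_integer)  # False
```

The regex itself matches in both cases. Only the later integer conversion of the second group fails, which is consistent with the "Invalid haplotype field" message.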