vg autoindex error - Githubissues

wlhCNU commented 1 year ago

1. What were you trying to do? I am trying to build an index

2. What did you want to happen? vg autoindex --workflow giraffe -g graph_genome.gfa -p PN -t 8

3. What actually happened?

[vg autoindex] Executing command: vg autoindex --workflow giraffe -g graph_genome.gfa -p PN -t 8
[IndexRegistry]: Checking for haplotype lines in GFA.
[IndexRegistry]: Constructing VG graph from GFA input.
error:[IndexRegistry] Input GFA is not usable in VG.
GFA format error: On pass 1: On line 86233517: At column 51: Expected nonempty value while parsing path visits

The content of line 86233517 is shown below: P _alt_23ea026290ef010bc817f51bb3d759808b67d03f_1

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?

6. What does running vg version say?

vg version v1.48.0 "Gallipoli"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by ubuntu@ip-172-31-9-38

jeizenga commented 1 year ago

A P line in GFA is 4 tab-separated columns according to the spec, and this line only has 2 columns, so this GFA is indeed malformed. It looks to me like you generated the GFA from a graph made by vg construct -a, right? If so, you can get rid of these "alt" allele paths with vg paths -d -a before converting to GFA and you should be fine. The allele paths are really only meant for internal use in the VG indexing pipelines, so we didn't design them to be exported to valid GFA.
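To find every affected line before rerunning vg autoindex, something like the following could scan the GFA for P records that have fewer than four tab-separated columns or an empty visit list. This is only a sketch: `find_bad_p_lines` is a hypothetical helper, not part of vg, and it assumes the GFA 1 column layout described above (P, path name, segment visits, overlaps).

```shell
# Hypothetical helper (not part of vg): report P lines that lack the four
# tab-separated columns of GFA 1 (P, path name, segment visits, overlaps)
# or whose segment-visit column is empty.
find_bad_p_lines() {
    awk -F'\t' '$1 == "P" && (NF < 4 || $3 == "") {
        printf "line %d: %d column(s), path %s\n", NR, NF, $2
    }' "$1"
}
```

Calling `find_bad_p_lines graph_genome.gfa` prints one line per malformed P record and nothing if all P records have the expected shape.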

wlhCNU commented 1 year ago

Thank you for your prompt response. I generated the GFA from the input file graph_genome.vg with vg view --threads 50 graph_genome.vg > graph_genome.gfa. If graph_genome.vg is the input file, how can I convert it to a correctly formatted GFA file for vg autoindex? Thanks.

jeizenga commented 1 year ago

Use vg paths -d -a to remove the alt paths first, then convert it to GFA. It's probably better just to use the VCF and FASTA as inputs to vg autoindex though.
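The two steps could be wrapped roughly as follows. This is a sketch under assumptions: the function name and the intermediate file name are made up for illustration, and `-x` is assumed to be the input-graph option of vg paths in this version (check `vg paths --help` for yours).

```shell
# Sketch only: hypothetical wrapper, defined as a function so that nothing
# runs until it is called.
export_gfa_without_alt_paths() {
    in_vg=$1                      # e.g. graph_genome.vg
    out_gfa=$2                    # e.g. graph_genome.noalt.gfa
    stripped_vg="${out_gfa%.gfa}.vg"
    # -d drops the selected paths; -a selects the "_alt_" allele paths
    # added by `vg construct -a`. The modified graph is written to stdout.
    vg paths -d -a -x "$in_vg" > "$stripped_vg" || return 1
    # vg view writes GFA by default.
    vg view "$stripped_vg" > "$out_gfa"
}
```

For example, `export_gfa_without_alt_paths graph_genome.vg graph_genome.noalt.gfa` would produce a GFA that can be passed to `vg autoindex --workflow giraffe -g`. For the VCF and FASTA route, the invocation would look something like `vg autoindex --workflow giraffe -r ref.fa -v variants.vcf.gz -p PN -t 8`, where `ref.fa` and `variants.vcf.gz` are placeholder file names.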

vgteam / vg

vg autoindex error #3970