Closed by esrice 1 year ago
Sorry, I think the issue is that the GFA version of the graph output by minigraph-cactus is not the same as the GBZ version. Not an issue with vg.
IDs are different between GFA and GBZ (which is an ongoing source of confusion). Please see here for more information:
https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md#node-chopping
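To make the ID mismatch concrete, here is a toy Python sketch of what node chopping does to IDs. It is only an illustration, not the code vg or minigraph-cactus actually runs: the function names, the example segments, the 1024 bp length cap, and the consecutive renumbering scheme are all assumptions made for the example.

```python
def chop_segments(segments, max_len=1024):
    """Toy model of node chopping (hypothetical helper, not a vg API).

    segments: list of (segment_name, length_in_bp) in graph order.
    Returns a dict mapping each original segment name to the list of
    new integer node IDs it was split into. max_len=1024 is an assumed cap.
    """
    translation = {}
    next_id = 1
    for name, length in segments:
        # Ceiling division: number of pieces of at most max_len bp.
        n_pieces = max(1, -(-length // max_len))
        translation[name] = list(range(next_id, next_id + n_pieces))
        next_id += n_pieces
    return translation


def invert(translation):
    """Map each chopped node ID back to the original segment name."""
    return {node_id: name
            for name, ids in translation.items()
            for node_id in ids}


# Made-up segments: names as they might appear in a GFA, with lengths in bp.
segments = [("1", 500), ("2", 3000), ("3", 100)]
translation = chop_segments(segments)
node_to_segment = invert(translation)

print(translation)
print(node_to_segment[5])
```

In this toy model, every segment after the first long one ends up with a different ID than its original name, so an ID read from one representation cannot be looked up directly in the other without a translation table like `node_to_segment`.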
Thanks, got it.
1. What were you trying to do? Extract a subgraph from a larger graph, using this command:
2. What did you want to happen? I would expect that the nodes extracted for this region would be the same nodes referred to in the vcf for this region, e.g., the vg header contains this line:
and subsetting the vcf to the same region returns lines like this:
3. What actually happened? vg find returned a subgraph that does not contain nodes 47495243 and 47495248; instead, the node IDs are in the range 29961990-29981305. On examination of the subgraph structure in Bandage, it does not appear that the issue is node IDs being shifted, but rather that this is not the part of the graph covered by bGalGal1b#0#chrZ:11159196-11400464, as requested.
4. If you got a line like "Stack trace path: /somewhere/on/your/computer/stacktrace.txt", please copy-paste the contents of that file here: NA
5. What data and command can the vg dev team use to make the problem happen?
The gbz and vcf are direct output from the minigraph-cactus pipeline. I can share these files if necessary.
6. What does running vg version say?