Kitonick79 commented 8 years ago

As visualization is not straightforward, we strongly feel that it needs prototyping. Thus, we have decided to implement it as a plugin for IGV.

We need to make some architectural decisions, so we would appreciate your advice.

We need some way to bind the reference to the variation graph, because the GFA file does not tell us the position of the graph relative to the reference. We propose adding this information to the GFA file. Would you please consider adding it? We would prefer that every sequence have its position marked relative to the reference.
We are implementing an index for GFA files in order to navigate them quickly. We plan to build this index with a separate tool (added to IGV tools), store it alongside the GFA file, and check whether it is present in the same folder. Do you agree with that approach, or would you prefer adding the index to the GFA file and renaming it somehow?

We have also encountered a problem with VCF interpretation: the vg tool does not correctly process multiple-base polymorphisms.

Consider the following VCF line:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
ref 7 . TTA GGG,CCC 0 .

We expect to see three alleles in the resulting graph for this variant, but the result looks like the attached PNG file.

[image: multiple alleles]

We can see edges between nodes 2:T and 6:C, but according to the VCF file such a path is impossible.

Commands to reproduce the result:

vg construct -r toy.fa -v toy_test.vcf.gz > toy_test.vg
vg view -d toy_test.vg > toy_test.dot

All test files are available in the attached archive.

vg test.tar.gz

Linearization

We have read the paper and the vg code and run some experiments. I would like to refine the scope. You already have a sorting algorithm in place (a bidirected adaptation of Kahn's topological sort), which is good enough (linear time, etc.) for processing polymorphisms, deletions, and insertions. The proposed max-flow/min-cut algorithm would be required only if VCF parsing is extended to handle translocations and sentinel telomeres in order to produce a haplotype graph.
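For reference, here is a minimal Python sketch of Kahn's algorithm on an ordinary directed graph. It is not vg's bidirected adaptation, which also has to track which side of each node an edge attaches to; it only shows where the linear running time comes from, since every node and edge is handled a constant number of times.

```python
from collections import deque


def kahn_toposort(nodes, edges):
    """Return nodes in topological order; raise ValueError on a cycle.

    nodes: iterable of node ids; edges: iterable of (from_id, to_id) pairs.
    Plain directed version, not vg's bidirected adaptation.
    """
    nodes = list(nodes)
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    # Start from every node with no incoming edges (the graph's heads).
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        # Removing n may turn some successors into new heads.
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; no topological order exists")
    return order
```

On a simple SNP bubble (1 to 2 or 3, both rejoining at 4), this yields the order 1, 2, 3, 4.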

• Should we implement the advanced VCF parsing, adding sentinel telomeres and topological sorting, or just replace the old algorithm with the new one?
• Should we replace the algorithm in the vg code or implement it as a standalone command-line utility?

We discovered that there are eight different paths in the Cactus example with identifiers whose meaning is unclear to us.

• Could you please clarify their meaning and give a hint as to what each path is? Moreover, some of them start from node number 5000.
• Since vg produces a sorted graph, what is the purpose of the mod utilities?
• Did you consider using libraries for large-graph processing (e.g. https://github.com/GraphChi/graphchi-cpp)?

ekg commented 8 years ago

A heads up for other vg devs: This is a replay of an email thread. I've got a response coming.

On Fri, Jul 1, 2016, 12:24 Kitonick79 notifications@github.com wrote:

[image: multiple alleles] https://cloud.githubusercontent.com/assets/7396794/16520216/63efb416-3f97-11e6-8deb-2f0cf3eb6995.png

vg test.tar.gz https://github.com/vgteam/vg/files/343343/vg.test.tar.gz

Kitonick79 commented 8 years ago

Dmitrii (aka Kitonick79): As visualization is not straightforward, we strongly feel that it needs prototyping. Thus, we have decided to implement it as a plugin for IGV.

Erik: I'd love to know how you plan to do this. It is not clear to me what the advantage of IGV would be here, as we will not use any of the same data types, and the visualization would need to be completely different or only support bubble-DAGs. That said, I do not know how the plugins work, and if we could shim something in, it might be very useful in the case that the graph is only made from a VCF file. This is less attractive to me because one objective of this project is to move away from the assumptions that the VCF file requires.

Dmitrii: The idea behind using IGV as a platform for prototyping the visualization is that it is widespread in the bioinformatics community and has a haplotype reference track out of the box, along with a VCF track. It seems to us that a topologically sorted graph won't be very useful for biologists. For now, I don't understand how an alignment should look, or how the end user can see actual paths from a bunch of joins. So my thought was that we could visualize a linear side graph, which is visually more informative than a linearized one. We can implement visualization of the linearized DAG as a standalone application; that is easier for us. Moreover, the team feels it is possible to deliver a first implementation by 7/11. I see the IGV implementation as a prototype for gathering feedback from the community; after that, we would like to implement it in our web-based browser. I understand that the final goal is to eliminate the VCF and haplotype reference and work with the graph as the reference, but that is not feasible now, and the IGV plugin is just a prototype.

Kitonick79 commented 8 years ago

Dmitrii: We are implementing an index for GFA files in order to navigate them quickly. We plan to build this index with a separate tool (added to IGV tools), store it alongside the GFA file, and check whether it is present in the same folder. Do you agree with that approach, or would you prefer adding the index to the GFA file and renaming it somehow?

Erik: The xg index allows constant-time determination of the reference path-relative position of any position on any node in the graph. The indexed graph takes up only 2-3x the space of the compressed graph, both on disk and in memory, so this is a reasonable solution for graphs of the entire 1000GP dataset. Furthermore, it can handle many thousands of haplotypes stored using the gPBWT. Although these cannot be used for efficient positional queries, they can be used to efficiently determine haplotype frequencies. This is a hard problem, and it has already taken us more than a year to complete its implementation, so we would be more than happy to work with you to extend it rather than introduce a new BAM-style independent index.

Here is an example of how you generate and use the xg index from the vg CLI API. To follow, execute these commands in the vg/test directory:

# build a "single-base" .vg from a tiny example VCF and reference
# this is done for exposition, as it makes the behavior of vg find clearer
vg construct -r tiny/tiny.fa -v tiny/tiny.vcf.gz -m 1 >tiny.vg
# now we generate the xg index
vg index -x tiny.xg tiny.vg

# finally, we get the region of the graph one step from the reference path
# from position 10 to 20 and visualize it with dot
vg find -p x:10-20 -x tiny.xg -c 1 | vg view -dp - | dot -Tpdf -o x.pdf
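vg find returns whole nodes for a path range, as explained further down in this thread. The toy Python class below mimics that behaviour for a single reference path: it records where each node starts along the path and reports every node overlapping a 0-based, half-open interval. The class is hypothetical, not part of vg or xg, and it uses binary search, so lookups are O(log n) rather than the constant time the xg index provides. The node ids and lengths in the test are made up.

```python
from bisect import bisect_right
from itertools import accumulate


class PathIndex:
    """Toy index of node start offsets along one reference path (hypothetical)."""

    def __init__(self, path):
        # path: list of (node_id, node_length) pairs in path order
        self.node_ids = [node_id for node_id, _ in path]
        lengths = [length for _, length in path]
        # starts[i] is the 0-based path offset of the first base of node i
        self.starts = [0] + list(accumulate(lengths))[:-1]
        self.total = sum(lengths)

    def node_at(self, offset):
        """Id of the node covering a 0-based path offset."""
        if not 0 <= offset < self.total:
            raise IndexError(offset)
        return self.node_ids[bisect_right(self.starts, offset) - 1]

    def nodes_overlapping(self, start, end):
        """Whole nodes overlapping [start, end); assumes 0 <= start < end <= total."""
        first = bisect_right(self.starts, start) - 1
        last = bisect_right(self.starts, end - 1) - 1
        return self.node_ids[first:last + 1]
```

Because whole nodes are returned, a query that begins in the middle of a node still reports that entire node, which matches the behaviour described later in the thread.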

An aside about GFA:

GFA is fine for interchange, but I am uncomfortable using flat files as a basis for many of the things we are doing, because it would seem they will have trouble handling cyclic and inverting graphs with long-range connections. Furthermore, GFA lacks many basic features, such as the representation of genotypes. There is no consensus among GFA developers about how to represent paths, which are one of the core components of the graphs we are working on, as well as of alignments, haplotypes, genotypes, and genome annotations.

I suggest targeting the protobuf or JSON versions of the VG schema. We can efficiently convert from GFA into these and they provide a number of advantages. The full schema is defined in about 200 LOC including comments: https://github.com/vgteam/vg/blob/master/src/vg.proto. Reading and writing is defined in the stream header class: https://github.com/vgteam/vg/blob/master/src/stream.hpp. It should be trivial to implement readers for any other programming language with protobuf support, but some code will probably need to be written to handle the chunking that we do. Please let me know if I can help.

Dmitrii: Erik, thank you for the guidance on the xg index and GFA format. We will use vg protobuf instead.

ekg commented 8 years ago

Thanks @Kitonick79! You beat me to the paste-in.

Here's the last bit of the response:

We have also encountered a problem with VCF interpretation: the vg tool does not correctly process multiple-base polymorphisms.

Just to confirm, I get the same result as you do with the vg construct defaults:

vg construct -r toy.fa -v toy_test.vcf.gz | vg view -dp -

But when I set the --flat-alts option, I get the expected result.

vg construct -r toy.fa -v toy_test.vcf.gz -f | vg view -dp -
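To see why splitting alleles into separate per-base nodes can spell sequences that are not in the VCF, here is a deliberately simplified Python model. It is not vg's construction code. `columnwise_alleles` assumes, hypothetically, the worst case in which each column of the site becomes its own set of single-base nodes and every node in one column is joined to every node in the next; the 2:T to 6:C edge reported above is one such cross-allele edge. `flat_alt_alleles` models one node per whole allele, as --flat-alts is described as producing. Both functions assume all alleles have the same length (a multi-base substitution such as TTA to GGG), because zip stops at the shortest allele.

```python
from itertools import product


def flat_alt_alleles(ref, alts):
    # One node per allele between shared flanking nodes: only the alleles
    # actually listed in the VCF record can be spelled.
    return {ref, *alts}


def columnwise_alleles(ref, alts):
    # Hypothetical worst case: each column is a set of single-base nodes,
    # fully connected to the next column, so bases from different alleles mix.
    columns = [sorted(set(bases)) for bases in zip(ref, *alts)]
    return {"".join(path) for path in product(*columns)}
```

For the record REF=TTA, ALT=GGG,CCC, the flat model spells exactly three alleles, while the column-wise model spells 27 sequences, including ones such as TCA that no sample carries.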

So maybe we should flip the default around, as this has been confusing to several people. What do others think?

ekg commented 8 years ago

@Kitonick79 another thing, note that there are warnings about using lower-case reference sequences in the graphs. Due to the local alignment implementation these are not well-supported by vg. I don't know if it makes sense to try to support them.

Kitonick79 commented 8 years ago

Erik, thank you for reproducing this; it was a bit confusing for us. I can't make a recommendation on supporting lower-case reference sequences. I have also begun replying to you on GitHub.

Best, Dmitrii

From: Erik Garrison notifications@github.com Sent: July 1, 2016 17:30 To: vgteam/vg Cc: Dmitrii Miagkov; Mention Subject: Re: [vgteam/vg] Reference genome visualization (#411)


ekg commented 8 years ago

Visualizing alignments is not trivial. Right now the default is designed for debugging mappings, while the best visualization pattern I'm aware of may have other problems. It's best to demonstrate by example. As usual, from the vg/test/ directory:

vg construct -r tiny/tiny.fa >flat.vg                                  
vg index -x flat.xg -g flat.gcsa -k 8 flat.vg                          
(echo '>flat1'; vg sim -n 1 -l 50 -e 0.05 -s 69 -x flat.xg ) >flat1.fa 
vg sim -n 1 -l 50 -e 0.05 -s 69 -x flat.xg -a >flat1.fa.gam            
vg construct -r flat1.fa >flat1.vg                                     
vg index -x flat1.xg flat1.vg                                          
vg sim -n 30 -l 50 -e 0.005 -s 7372 -x flat1.xg -a >flat1.sim          
(echo '>flat2'; vg sim -n 1 -l 50 -e 0.05 -s 77 -x flat.xg ) >flat2.fa 
vg sim -n 1 -l 50 -e 0.05 -s 77 -x flat.xg -a >flat2.fa.gam            
vg construct -r flat2.fa >flat2.vg                                     
vg index -x flat2.xg flat2.vg                                          
vg sim -n 30 -l 50 -e 0.005 -s 8675309 -x flat2.xg -a >flat2.sim       
vg map -x flat.xg -g flat.gcsa -G <(cat flat1.sim flat2.sim) >flat.gam

(This is rather complicated because it's lifted from a vg genotype test that I've developed, but it suits our objectives.)

So now we can look at the alignments in flat.gam using an extension of vg view -d (dot output).

vg view -dA flat.gam flat.vg

This screenshot is from the bottom of the rendering:

Because the input graph is only one node, we are just showing the JSON versions of the Mapping objects in the Paths of the Alignments. The visualization is buggy for several reasons. For one, it is trying to show the orientation of the reads with extra nodes with + and - on them, but this is not formatted right. Second, it's not human-readable, and is only really helpful if you're trying to understand the way the alignments work for the purposes of working on vg.

An alternative that is destructive to the node space of the graph (which frankly, we shouldn't be too attached to... it's the reference sequences that provide stable coordinates) is to embed the alignments as paths in the graph using vg mod -i (note that the alignments can only be embedded and saved if they have their names set, which happens here because we use vg sim to make the alignment records from the flat1 and flat2 graphs).

This produces a slightly more readable visualization, although it's still not ideal:

vg mod -i flat.gam flat.vg | vg view -dp -

(Again, this is not the entire rendering, just a screenshot of the part of it close to the reference graph.)

I've found these two approaches to be useful, and hopefully they give you some inspiration about ways to tackle the problem. I doubt that we will find a completely general solution to suit every use-case. However, we can probably get closer to something by following the model in IGV and focusing on DAGs or hierarchical layouts. Note that we can render cyclic graphs with dot thanks to @buske's work at the recent biohackathon, so there is no reason to be limited to DAGs; but maybe we don't need to worry about rendering cyclic graphs really nicely as they will probably be unusual in many applications.

There must be a way to render the alignments against a given node in the same way as they are in IGV or samtools tview, but then decouple the layout across edges. Maybe I can sketch up the idea, or perhaps you already have something better in mind.

ekg commented 8 years ago

thank you for the guidance on the xg index and GFA format. We will use vg protobuf instead

For the moment I think it makes sense. We can also drop into GFA easily where it is better through the conversions provided by vg view. However, we are stuck with respect to alignments. There is a mechanism to represent them in GFA but I am not sure it provides the same semantics that we need.

If you are not working in C++ then you'll need to work from the JSON version of the protobuf or write a small parser for the vg-protobuf formats.

Ideally we would use pure protobuf, but we can't, because we would overflow many of the limits in protobuf that guard against extremely large messages. These limits are painfully low (around 64 MB). To work around this, it is "standard" to implement a stream of protobuf objects, each prefixed by its serialized length. (For instance, https://github.com/mafintosh/pbs implements something similar.)

Hopefully it is clear from the stream::write function how the stream is structured, but to be extra clear: you have a gzip-compressed stream, inside of which there is a series of blocks, each composed of:

[ count of objects in block (varint64) ]
[ [ length of serialized protobuf object (varint32) ] [ serialized object (string) ] ] ...

A valid file can be composed of one or more chunks.
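Below is a minimal Python sketch of a reader and writer for the layout just described, written only from that description and not ported from stream.hpp. Serialized objects are treated as opaque byte strings; with the Python protobuf package you would produce them with `SerializeToString()` and decode them with `ParseFromString()`. The varint32 and varint64 fields share the same base-128 encoding, so one encoder serves both. All function names here are hypothetical.

```python
import gzip
import io


def encode_varint(value):
    """Protobuf-style base-128 varint: 7 bits per byte, low bits first."""
    if value < 0:
        raise ValueError("only unsigned varints are used here")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(stream):
    """Read one varint from a binary stream; return None at a clean EOF."""
    shift = result = 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_chunks(chunks):
    """Gzip a list of blocks, each a list of already-serialized messages."""
    raw = io.BytesIO()
    for chunk in chunks:
        raw.write(encode_varint(len(chunk)))        # count of objects in block
        for message in chunk:
            raw.write(encode_varint(len(message)))  # length prefix
            raw.write(message)                      # serialized object
    return gzip.compress(raw.getvalue())


def read_messages(data):
    """Yield every serialized message from every block, in order."""
    stream = io.BytesIO(gzip.decompress(data))
    while True:
        count = read_varint(stream)
        if count is None:
            return
        for _ in range(count):
            length = read_varint(stream)
            if length is None:
                raise EOFError("block ended before its declared count")
            message = stream.read(length)
            if len(message) != length:
                raise EOFError("truncated message")
            yield message
```

Keeping each block small is what keeps every individual parse well under protobuf's message-size limit, while a file can hold any number of blocks.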

I'm open to adjusting this in any way that you see fit or might make it easier to implement a reader/writer interface in another language (I assume you'd use Java if you're targeting IGV?).

glennhickey commented 8 years ago

I like -f as default too.

On Fri, Jul 1, 2016 at 10:29 AM, Erik Garrison notifications@github.com wrote:

[image: default vg construct output] https://cloud.githubusercontent.com/assets/145425/16524501/d20d33cc-3fa8-11e6-853e-111b9d046551.png

[image: vg construct -f output] https://cloud.githubusercontent.com/assets/145425/16524478/c2468060-3fa8-11e6-87a5-c7d5be04f3f4.png

ekg commented 8 years ago

The only problem with a patch is that we will need to fix up lots of tests. Also the behavior with multiallelic indels is less than ideal. But maybe an improved normalization process can fix this.


Kitonick79 commented 8 years ago

Erik, alignment visualization looks complicated to us right now; please provide a sketch of the idea you described. We have already implemented a reader for vg data, and it was rather straightforward.

mzueva commented 8 years ago

Hello, everyone! I'm working with Dmitrii (@Kitonick79) on graph visualization and the linearization algorithm. Right now I'm implementing index support in IGV for retrieving a reference path interval from the whole graph. I followed your example of working with the index:

# build a "single-base" .vg from a tiny example VCF and reference
# this is done for exposition, as it makes the behavior of vg find clearer
vg construct -r tiny/tiny.fa -v tiny/tiny.vcf.gz -m 1 >tiny.vg
# now we generate the xg index
vg index -x tiny.xg tiny.vg
# finally, we get the region of the graph one step from the reference path
# from position 10 to 20 and visualize it with dot
vg find -p x:10-20 -x tiny.xg -c 1 | vg view -dp - | dot -Tpdf -o x.pdf

The reference sequence in tiny.fa looks like this (the expected interval is set off by spaces): >x = CAAATAAGG CTTGGAAATT TTCTGGAGTTCTATTATATTCCAACTCTCTG. We expect a query like x:10-20 to return that interval, since reference coordinates are usually 1-based.
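The two usual coordinate conventions give different answers for x:10-20 on this sequence. The sketch below spells the 50 bp sequence quoted above without the spaces and extracts the region both ways. It illustrates the conventions only and makes no claim about which one vg find uses; note that the 1-based, inclusive reading covers 11 bases.

```python
# Sequence of x as quoted in the post above, with the spaces removed.
TINY_X = "CAAATAAGGCTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG"


def one_based_inclusive(seq, start, end):
    """Region start-end in 1-based, fully closed (samtools-style) coordinates."""
    return seq[start - 1:end]


def zero_based_half_open(seq, start, end):
    """Region [start, end) in 0-based, half-open (BED-style) coordinates."""
    return seq[start:end]
```

Under the 1-based reading, position 9 is the G that ends the first group, which is the base the query result starts with.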

We tried these commands and encountered several issues (we tried both the single-base graph and the regular graph):

The resulting graph interval starts with nucleotide G (its offset in the reference is 9).
The resulting graph doesn't end at the nucleotide with index 20 (and the graphs end at different nucleotides depending on the -m option to vg construct).
It seems that we cannot query a position inside a node. For instance, if we query tiny.vg for position 5 (which is in the first node), we receive the whole first node in the resulting graph (the node is not cut at position 5).

Could you please tell us whether these results are expected? How are nucleotides indexed and queried by the vg find command? Is there a way to retrieve a subgraph containing only a given reference interval? Maybe some command-line options are missing?

ekg commented 8 years ago

@mzueva these are the expected results from these operations.

We return whole-node subgraphs from the find operations. This is done in order to maintain reference coordinates (which are defined as Positions, or tuples of node id, strand, and offset from the start of the node on the strand). The average node size is a kind of optimization to reduce the overheads introduced by the Node objects. We require a lot more space to store the graph if it is in single base form.

Other solutions are possible as an extension of this. For example, we could obtain the graph starting at exactly a given reference path position by first getting the graph as we do, and then cutting the start and end nodes where they extend past the range. Functions inside of vg's VG class allow this to be done in a few lines of code, so we could add an option to support this. We would use VG::cut_node and VG::destroy_node.
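The trimming idea can be sketched on plain sequences. The hypothetical `trim_to_interval` below is not VG::cut_node or VG::destroy_node; it keeps only the part of each path node that overlaps a 0-based, half-open interval and drops nodes that fall entirely outside it. Edges are ignored here, whereas in the real graph the cut nodes would also need their edges rewired. The node ids and segmentation in the test are made up.

```python
def trim_to_interval(path_nodes, start, end):
    """Cut the whole-node result of a path query down to [start, end).

    path_nodes: list of (node_id, sequence) in path order, with the first
    node starting at path offset 0. Hypothetical helper, sequences only.
    """
    trimmed = []
    node_start = 0
    for node_id, seq in path_nodes:
        node_end = node_start + len(seq)
        # Overlap of this node with [start, end), in node-local offsets.
        lo = max(start, node_start) - node_start
        hi = min(end, node_end) - node_start
        if lo < hi:
            trimmed.append((node_id, seq[lo:hi]))
        node_start = node_end
    return trimmed
```

Only the first and last overlapping nodes are actually shortened; interior nodes pass through unchanged.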

We also expand the context of the graph in the find operation in order to get the whole subgraph near the reference path at the given positions. Without context expansion we would get:

vg find -p x:10-20 -x tiny.xg | vg view -dp - | dot -Tpdf -o x.pdf

(BTW, I've just pushed a fix that allows this rendering to work correctly, so you'll want to pull that in or you'll get a funny-looking visualization.)

The point here is that we've only picked up the part of the graph that is referred to as the region x:10-20. All the other non-path nodes are missing, but mentioned by the edges that link to the nodes we do have. To fill the graph out further, we add the -c 1 context expansion option.
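Context expansion amounts to a breadth-first search outward from the nodes on the queried path. This Python sketch treats edges as undirected and returns every node within `steps` hops of the seeds; it illustrates the idea behind `-c 1`, and vg's exact neighbourhood rules may differ. In the small bubble used in the test, the off-path node 3 appears only once one step of context is allowed, mirroring the missing non-path nodes described above.

```python
from collections import deque


def expand_context(edges, seed_nodes, steps):
    """Return the set of nodes within `steps` edge hops of any seed node."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = set(seed_nodes)
    frontier = deque((n, 0) for n in seed_nodes)
    while frontier:
        node, depth = frontier.popleft()
        if depth == steps:
            continue  # do not expand past the requested context size
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen
```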

The index scheme for the graph is described in the paper draft in the repo. It's in section 4.1 of the attached pdf: main.pdf

The index works in roughly the way described on the following slides, which may help clarify things. They are a simplification in some respects. The model is implemented in xg using the sdsl-lite library of succinct data structures.
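A core tool in succinct indexes of this kind is a rank query over a bit vector. Assuming the node sequences are concatenated and the first base of each node is marked with a 1, the number of 1s up to and including an offset gives the index of the node containing it. The sketch uses a plain prefix-count array for constant-time rank; sdsl-lite answers the same query with far less memory, and xg's actual layout may differ from this simplification. The helper names are hypothetical.

```python
class RankBitVector:
    """Bit vector with O(1) rank via a precomputed prefix-count array.

    A plain, non-succinct stand-in for the rank support sdsl-lite provides.
    """

    def __init__(self, bits):
        self.bits = list(bits)
        self.prefix = [0]
        for bit in self.bits:
            self.prefix.append(self.prefix[-1] + bit)

    def rank1(self, i):
        """Number of 1 bits in positions [0, i)."""
        return self.prefix[i]


def node_starts_bitvector(node_lengths):
    """Mark the first base of each node (lengths must be >= 1)."""
    bits = []
    for length in node_lengths:
        bits.extend([1] + [0] * (length - 1))
    return RankBitVector(bits)


def node_index_at(bv, offset):
    """0-based index of the node containing a 0-based sequence offset."""
    return bv.rank1(offset + 1) - 1
```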

genome variation graphs and genotyping by tera-scale learning 2

Kitonick79 commented 8 years ago

@ekg We ran into an issue. In the BRCA1 example you sent us, there are nodes without outgoing joins, which are obviously not the end of an allele. Is this a bug, or are we misunderstanding something?

Kitonick79 commented 8 years ago

For example, node 4 has no outgoing joins.

ekg commented 8 years ago

@Kitonick79 could you visualize the region of the graph you're referring to? Is this the BRCA1 graph made using vg msga?

ekg commented 8 years ago

@Kitonick79 this doesn't sound right but I'm trying to understand what exactly you're talking about. If this was built with vg construct there will not be nodes like that. If it was built with vg msga it's quite normal that there are some dangling heads and tails of the graph.

Kitonick79 commented 8 years ago

@ekg We used the BRCA1 graph that Adam sent us. I don't know whether it was made by vg msga or vg construct. I will send you the GFA for the node.

Kitonick79 commented 8 years ago

Dangling heads and tails are acceptable, but a node with just two incoming joins and no outgoing ones looks odd.

ekg commented 8 years ago

If you can post the whole thing, that'd be helpful too.

Kitonick79 commented 8 years ago

[image: no_out_joins]

It is a screenshot from our IGV plugin

Kitonick79 commented 8 years ago

It is not the fourth node, but the problem is clear.

ekg commented 8 years ago

Are you parsing the JSON or GFA to develop this?

@adamnovak how did this graph get constructed?

Kitonick79 commented 8 years ago

No, we are parsing *.vg

ekg commented 8 years ago

@Kitonick79 ok, so this could be something odd with the input graph you have.

This won't occur if the graph is built from a VCF and reference file with vg construct, and it seems unlikely to be coming from the output of vg msga. So I wonder if the graph @adamnovak sent is constructed with cactus or another kind of assembly method?

In your visualization, is the second edge coming into the "T" node reversing?

It's seeming to me like a problem with the input graph. I can get you another one. Hopefully @adamnovak will chime in with some notes.

adamnovak commented 8 years ago

The example graphs I sent were the 1KG BRCA1 graph (made with vg construct), and the Cactus MHC (made, of course, with Cactus).

Here's the BRCA1 graph: graph.vg.zip

Here's what I can find out about node 4:

$ vg view -j graph.vg | jq '.node[] | select(.id == 4)'
{
  "sequence": "AAAA",
  "name": "17_0_53",
  "id": 4
}
$ vg view -j graph.vg | jq '.edge[] | select(.from == 4 or .to == 4)'
{
  "from": 2,
  "to": 4
}
{
  "from": 4,
  "from_start": true,
  "to": 3,
  "to_end": true
}
{
  "from": 4,
  "to": 5
}
{
  "from": 4,
  "to": 6
}

I see outgoing edges from 4 to 5 and from 4 to 6. Do you not see those edges in your copy of the graph? Or are we not talking about the same node 4?
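When rendering a bidirected graph, each edge has to be attached to the side of the node it actually touches. In vg's model, an edge leaves the end of `from` unless `from_start` is set, and enters the start of `to` unless `to_end` is set; missing flags are false, as in the JSON above. The hypothetical helper below classifies edges by node side. Applied to the edges above, it finds the edges to nodes 5 and 6 on the end of node 4, and both the edge from node 2 and the doubly-reversing edge involving node 3 on its start.

```python
def edges_on_side(edges, node_id, side):
    """Edges (vg JSON dicts) attached to one side, "start" or "end", of a node."""
    attached = []
    for edge in edges:
        from_side = "start" if edge.get("from_start", False) else "end"
        to_side = "end" if edge.get("to_end", False) else "start"
        if (edge["from"] == node_id and from_side == side) or (
            edge["to"] == node_id and to_side == side
        ):
            attached.append(edge)
    return attached
```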

adamnovak commented 8 years ago

Note also that the edge:

{
  "from": 4,
  "from_start": true,
  "to": 3,
  "to_end": true
}

is really an edge that connects the end of node 3 to the start of node 4, and is equivalent to:

{
  "from": 3,
  "to": 4
}
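The equivalence can be applied mechanically: when both `from_start` and `to_end` are set, swapping `from` and `to` and clearing both flags describes the same connection. A minimal sketch follows; it illustrates the rule stated above and is not the implementation of `vg mod --unreverse-edges`. Fields other than the orientation fields are copied unchanged.

```python
ORIENTATION_KEYS = ("from", "to", "from_start", "to_end")


def unreverse_edge(edge):
    """Rewrite a doubly-reversing vg JSON edge as the equivalent forward edge."""
    if edge.get("from_start", False) and edge.get("to_end", False):
        rewritten = {k: v for k, v in edge.items() if k not in ORIENTATION_KEYS}
        # start(from) <- end(to) read the other way is end(to) -> start(from).
        rewritten["from"] = edge["to"]
        rewritten["to"] = edge["from"]
        return rewritten
    return dict(edge)  # singly-reversing and forward edges are left alone
```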

ekg commented 8 years ago

@adamnovak It's weird that this is ending up with a doubly-reversing edge.

@Kitonick79 this is easy to adjust... you'd just run vg mod --unreverse-edges graph.vg >fix.vg.

Does it make sense that the edges that go -/- are equivalent to those that go +/+?

adamnovak commented 8 years ago

I think the edge is probably like that because we sent the graph through the reference server, and somewhere along the chain of conversions the edge got made doubly-reversing.


Kitonick79 commented 8 years ago

@ekg and @adamnovak, thank you for the explanation! This has helped a lot. The problem is solved.


Reference genome visualization #411
