vgteam / libhandlegraph

Library for the Handle Graph abstraction
MIT License
21 stars 5 forks source link

What should the path handle interface looks like? #3

Closed jeizenga closed 5 years ago

jeizenga commented 5 years ago

@glennhickey @ekg @adamnovak

So the general consensus seems to be that the interface for paths could be better. I'm opening this issue so we can have a discussion about what we want that to look like.

Personally, one thing that I don't like is that you have to manually check whether the path is empty before you iterate along it. Also, we don't have a "past-the-last" type iterator, which makes the loops a little clunky in C++. Basically, I'd like a more compact "for each" loop.

Also, our mutable path interface is currently append only. Should we expand that to support prepending and insertion in the middle? What about deletion?

We also still have https://github.com/vgteam/libhandlegraph/issues/2 about the lack of support for circular paths outstanding. We should deal with as well, while we're discussing updating the interface. In particular, I think we need to think about the semantics of iterating to the "next" node on a path when the path loops around, and also to provide a means to iterate over every node one time and every edge one time.

glennhickey commented 5 years ago

I was going to complain about iterating too but realized I hadn't noticed for_each_occurrence_on_handle() (was scanning the interface in vg.hpp and missed the non-pure-virtual methods). That does help!!

Apart from that, I pettily don't like typing "occurrence" and I find it confusing that get_first_occurrence() returns an occurrence_handle_t but get_occurrence() returns a handle_t.

jeizenga commented 5 years ago

I also was unaware of for_each_occurrence_on_handle. Plus one vote for a naming that requires fewer keystrokes too.

jeizenga commented 5 years ago

While we're discussing this, maybe this would also be the time to do away with swap_handles, since we have multiple graph implementations that don't respect it now.

ekg commented 5 years ago

Instead of "occurrence" (ouch) let's call them "step". We need the circular path testing. I really don't like the multiple test for has_next_occurrence and get_next_occurrence. It seems wasteful as it's dipping into the data structures twice to do something that in most backing implementations should be done in a single operation.

On Sat, Apr 6, 2019, 01:08 Jordan Eizenga notifications@github.com wrote:

While we're discussing this, maybe this would also be the time to do away with swap_handles, since we have multiple graph implementations that don't respect it now.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/libhandlegraph/issues/3#issuecomment-480448651, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI4EZRAN-pAuD9BK6ImlIjBrROBDEGcks5vd9dTgaJpZM4cfnyj .

adamnovak commented 5 years ago

I kind of want to go a C++ iterator route. We could provide an iterator type that lets you iterate over a whole path. But to back it we'd either need to have all the PathHandleGraph implementations provide their own derived iterator classes just for themselves, or we'd have to just build it over another API that looks more like what we have already.

The has_next_occurrence/get_next_occurrence split exists right now to prevent there ever being a "null" occurrence_handle_t, for which get_occurrence wouldn't be able to produce a real handle_t. If we allow for that we can adopt a more C++-y design with past-the-end occurrence_handle_ts that you can check for equality with.

So it seems like maybe we should:

From there we could go the C++ iterator route:

But we also already have this to enable simple looping with the same iteratee design we use elsewhere:

https://github.com/vgteam/libhandlegraph/blob/04688588e4ec840f18af74395d446e868fc13c12/src/include/handlegraph/path_handle_graph.hpp#L116-L120

We could maybe give it an _impl method for implementations to override if just walking along the next-step relationship would be slow.

It's not clear to me whether we want the C++ iterator design (since the paths are linear, and in C++ you should use iterators over linear things) or the iteratee design (to match the rest of the library).

jeizenga commented 5 years ago

It is true that we're dipping in twice, but that's actually pretty standard for STL-style iteration where each iteration typically calls both operator++() and end(). We might be able to use a past-the-last iterator in order to touch the possibly-compressed portions of the path data structure less often though.