Closed jeizenga closed 5 years ago
I was going to complain about iterating too but realized I hadn't noticed for_each_occurrence_on_handle()
(was scanning the interface in vg.hpp and missed the non-pure-virtual methods). That does help!!
Apart from that, I pettily don't like typing "occurrence" and I find it confusing that get_first_occurrence()
returns an occurrence_handle_t
but get_occurrence()
returns a handle_t
.
I also was unaware of for_each_occurrence_on_handle
. Plus one vote for a naming that requires fewer keystrokes too.
While we're discussing this, maybe this would also be the time to do away with swap_handles
, since we have multiple graph implementations that don't respect it now.
Instead of "occurrence" (ouch) let's call them "step". We need the circular path testing. I really don't like the multiple test for has_next_occurrence and get_next_occurrence. It seems wasteful as it's dipping into the data structures twice to do something that in most backing implementations should be done in a single operation.
On Sat, Apr 6, 2019, 01:08 Jordan Eizenga notifications@github.com wrote:
While we're discussing this, maybe this would also be the time to do away with swap_handles, since we have multiple graph implementations that don't respect it now.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/libhandlegraph/issues/3#issuecomment-480448651, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI4EZRAN-pAuD9BK6ImlIjBrROBDEGcks5vd9dTgaJpZM4cfnyj .
I kind of want to go a C++ iterator route. We could provide an iterator type that lets you iterate over a whole path. But to back it we'd either need to have all the PathHandleGraph implementations provide their own derived iterator classes just for themselves, or we'd have to just build it over another API that looks more like what we have already.
The has_next_occurrence
/get_next_occurrence
split exists right now to prevent there ever being a "null" occurrence_handle_t
, for which get_occurrence
wouldn't be able to produce a real handle_t
. If we allow for that we can adopt a more C++-y design with past-the-end occurrence_handle_t
s that you can check for equality with.
So it seems like maybe we should:
occurrence
to step
throughout.get_step
to something like get_handle_at
. Specify that it can fail for some before-the-beginning or past-the-end steps.get_is_circular
for Paths.get_next_step
and get_previous_step
will loop forever around circular paths, and make them do it.get_end_step
and maybe get_rend_step
(?) for the past-the-end and before-the-beginning steps that non-circular paths will have.has_next_step
and has_previous_step
.From there we could go the C++ iterator route:
path_begin
and path_end
(and maybe path_rbegin
and path_rend
) which return traditional C++ iterators that traverse the path once, not looping around. We can have one iterator class for all the PathHandleGraph
implementations, implemented in terms of get_next_step
/get_previous_step
and encapsulating the don't-go-around-circular-paths-forever logic.for (handle_t here : graph.scan_path(path_handle)) {}
But we also already have this to enable simple looping with the same iteratee design we use elsewhere:
We could maybe give it an _impl
method for implementations to override if just walking along the next-step relationship would be slow.
It's not clear to me whether we want the C++ iterator design (since the paths are linear, and in C++ you should use iterators over linear things) or the iteratee design (to match the rest of the library).
It is true that we're dipping in twice, but that's actually pretty standard for STL-style iteration where each iteration typically calls both operator++()
and end()
. We might be able to use a past-the-last iterator in order to touch the possibly-compressed portions of the path data structure less often though.
@glennhickey @ekg @adamnovak
So the general consensus seems to be that the interface for paths could be better. I'm opening this issue so we can have a discussion about what we want that to look like.
Personally, one thing that I don't like is that you have to manually check whether the path is empty before you iterate along it. Also, we don't have a "past-the-last" type iterator, which makes the loops a little clunky in C++. Basically, I'd like a more compact "for each" loop.
Also, our mutable path interface is currently append only. Should we expand that to support prepending and insertion in the middle? What about deletion?
We also still have https://github.com/vgteam/libhandlegraph/issues/2 about the lack of support for circular paths outstanding. We should deal with as well, while we're discussing updating the interface. In particular, I think we need to think about the semantics of iterating to the "next" node on a path when the path loops around, and also to provide a means to iterate over every node one time and every edge one time.