vgteam / libhandlegraph

Library for the Handle Graph abstraction
MIT License
21 stars 5 forks source link

Path improvification #5

Closed jeizenga closed 5 years ago

jeizenga commented 5 years ago

This PR resolves all of our oustanding issues! (https://github.com/vgteam/libhandlegraph/issues/2, https://github.com/vgteam/libhandlegraph/issues/3, https://github.com/vgteam/libhandlegraph/issues/4)

Specifically, the changes included are:

One thing that we discussed doing that I did not do is add reverse iterators. I can also only guarantee that this compiles. It might not work once we link against it, but we need to put something up here to program against before I can really test it.

Any comments before this API becomes VG law? @ekg @glennhickey @adamnovak

glennhickey commented 5 years ago

Looks good to me. I like the scan_path interface. Would also be happy to one day scan nodes, edges, paths in the graph etc. in this way.

On Wed, Apr 10, 2019 at 10:20 PM Jordan Eizenga notifications@github.com wrote:

This PR resolves all of our oustanding issues! (#2 https://github.com/vgteam/libhandlegraph/issues/2, #3 https://github.com/vgteam/libhandlegraph/issues/3, #4 https://github.com/vgteam/libhandlegraph/issues/4)

Specifically, the changes included are:

  • Removed swap_handles and apply_ordering.
  • Make MutablePathDeletableHandleGraph actually inherit from MutablePathMutableHandleGraph.
  • Changed documentation to indicate that MutablePathDeletableHandleGraphs are expected to delete paths when calling clear().
  • Changed "occurrence" terminology to "step".
  • New methods added to PathHandleGraph to check path circularity and in MutablePathHandleGraph to set path circularity.
  • Added a past-the-last iterator concept for step_handle_t, to be returned by path_end() and by get_next_step() when the argument is the final step.
  • Updated documentation indicating that circular paths should loop around with get_next_step and get_previous_step.
  • A class PathForEachSocket that can socket from a path_handle_t to a C++ foreach loop over handle_t.

One thing that we discussed doing that I did not do is add reverse iterators. I can also only guarantee that this compiles. It might not work once we link against it, but we need to put something up here to program against before I can really test it.

Any comments before this API becomes VG law? @ekg https://github.com/ekg @glennhickey https://github.com/glennhickey @adamnovak https://github.com/adamnovak

You can view, comment on, or merge this pull request online at:

https://github.com/vgteam/libhandlegraph/pull/5 Commit Summary

  • switch occurrence to step and add circular paths
  • add a socket into a for each loop for a path
  • switch scan path to handle_t instead of step_handle_t

File Changes

Patch Links:

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/vgteam/libhandlegraph/pull/5, or mute the thread https://github.com/notifications/unsubscribe-auth/AA2_7gRsLOKWwQlnWSQ4DRU9ZSwYbDp7ks5vfpvngaJpZM4comiy .

jeizenga commented 5 years ago

Okay, charging ahead with this. Updates likely coming as I integrate it into VG and figure out how it's fundamentally broken.

ekg commented 5 years ago

As sorting is an important aspect of compression in these succinct graph representations, I think we should leave it and let it be a no-op in the case of models that don't use it. I'm going to open a PR to move it back in.

I understand removing the swap_handles function, because this was some kind of low-level interface that was used in the application of ordering, but never partially exposed outside of a full reordering.

If we don't do this, then we'll need to add a new "Sortable" feature to every level of the class hierarchy. That seems excessive to me. If we don't have sorting visible via the handlegraph interface, then it may not be possible to use the API to drive tests that we're interested in doing. That's fine, but weird.

jeizenga commented 5 years ago

@ekg and I also just discussed over email the possibility of having a generic method that "finishes" a dynamic graph to improve its performance, without being very particular about what that means semantically. I have a method right now in PackedGraph that does something similar by optimizing the in-memory layout.

adamnovak commented 5 years ago

I think that makes more sense than just a sort. Some implementations might want an opportunity to e.g. coalesce multiple immutable layers or something.

On 5/3/19, Jordan Eizenga notifications@github.com wrote:

@ekg and I also just discussed over email the possibility of having a generic method that "finishes" a dynamic graph to improve its performance, without being very particular about what that means semantically. I have a method right now in PackedGraph that does something similar by optimizing the in-memory layout.

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/vgteam/libhandlegraph/pull/5#issuecomment-489148378

jeizenga commented 5 years ago

Following the discussion in the meeting on Monday, I'm gonna propose this interface:

\*
 * Adjust the representation of the graph in memory to improve performance. Optionally,
 * allow the node IDs to be reassigned to further improve performance. Ideally, this method
 * is called one time once there is expected to be few graph modifications in the future.
 *\
void optimize(bool allow_id_reassignment = true);

Does that look okay to people? I'm going to start working on it, so now would be the time to register any concerns. One thing I'm not including yet is a method to translate between the modified ID spaces, but my thought is that we're mostly assuming that people will only allow ID reassignment if the node IDs are not particularly meaningful in and of themselves anyway.