pnevyk / gryf

Graph data structure library aspiring to be convenient, versatile, correct and performant.
MIT License
69 stars 1 forks source link

Index types #17

Closed pnevyk closed 1 year ago

pnevyk commented 2 years ago

Currently, there is a single index type for vertices (VertexIndex) and a single index type for edges (EdgeIndex). Not being able to parametrize over the index type prevents some interesting applications (specific use cases where a special form of the index is convenient to have, implicit graphs #3, resource-constrained environments). On the other hand, I don't want to clutter the generics, making the majority of uses where default index type is fine more cumbersome to work with.

As a side note, VerticesBaseWeak and EdgesBaseWeak somehow support custom indices (their main original motivation was to add support for implicit graphs), but it feels quite suboptimal.

Here is a list that of traditional solutions to this problem that I can think of:

  1. Just use a single, predetermined type. This is what gryf has now.
  2. Add another generic type (usually with a default type). In our case, for instance the generics for Graph type would be Graph<V, E, Ty: EdgeType, G, VIdx: IndexType = VertexIndex, EIdx: IndexType = VertexIndex> and for Edges would be Edges<E, Ty: EdgeType, VIdx, EIdx>, which is quite scary.
  3. A special type is created (Vertex<V, Idx = VertexIndex>, similarly for edges) and it is used where we now just use V and E respectively. I personally feel that it makes the interfaces uglier and less ergonomic.

There is one alternative idea that would use auto traits and negative impls unstable features. Trait Indexed will have two associated types VertexIndex: IndexType and EdgeIndex: IndexType. Trait DefaultIndex will be auto trait, thus it would be implemented for all types (unless negative impl is used) and there will be a catch-all

impl<T: DefaultIndex> Indexed for T {
    type VertexIndex = VertexIndex;
    type EdgeIndex = EdgeIndex;
}

Thus, the index type would be associated to the vertex and edge data types, but in an opaque way (contrary to option 2 mentioned above). This should cover the majority of cases where default index is fine without changing the code on the user side. Using a negative impl, one could opt-out from default indexing and implement Indexed trait for their type with a custom index. Two helper types will be provided for convenience: PhantomIndexed opting out from DefaultIndex auto trait (basically for the same purpose as PhantomPinned) and Custom<T, VIdx, EIdx> implementing Indexed trait while using VIdx and EIdx for VertexIndex and EdgeIndex, respectively.

I have zero experience with auto traits and negative impls and I would not be surprised if this approach causes "conflicting implementations" or any other issue. I am also not sure about the final ergonomics of this solution, especially on the user side.

A note about requiring both VertexIndex and EdgeIndex on a single type: This is necessary to be able to use a type for vertices and edges as one couldn't implement Indexed trait for both vertex index type and edge index type if there was just a single associated type. Another reason is that edge-related traits need both edge index type and vertex index type in order to support functions such as endpoints and edge_index.

Careful thinking will need to go into what should be the minimal requirements for an index type (IndexType trait). Currently it's Clone, Copy, PartialEq + Eq, PartialOrd + Ord, Hash, from and into usize, "null" value. I believe it is reasonable to require Eq + Hash (this is important for using HashSet for visit sets, among other things). Ord is required in CompactIndexMap and brings support for BTreeSet and other ordering-related data structures. Clone is likely a must-have, Copy is questionable, probably too restrictive. If Copy is not required, many function parameters types should change from owned type to ref type (e.g., contains_vertex(&self, index: &V::VertexIndex)).

Being able to get a usize representation is important for some algorithms, but having this as an additional, optional functionality that can be used in constraints, sounds as a good compromise. Similar for "null" index. Generic storage implementations are also dependent on being able to treat the indices as usize. There could be therefore two index traits, IndexType and NumIndexType: IndexType, where NumIndexType would provide these additional usize-oriented capabilities.

pnevyk commented 2 years ago

An open question is how to pass the index types into "base" traits (VerticesBase, EdgesBase and their weak counterparts). It is important to remember that the main motivation for splitting Vertices into Vertices and VerticesBase (and equivalently for edges) was to have a trait that does not depend on the data type V, which was causing necessity to use phantom data for types that do not care about V (otherwise, the compiler complains with V not constrained by impl trait or self type).

pnevyk commented 2 years ago

For convenience, a mechanism for where clause ensuring that index types of V and E are the same (e.g., where IndexCompatible<V, E>) in the same spirit as nalgebra shape constraints should be implemented. I can imagine this would be very useful for code that requires traits using both V and E.