goostengine / goost

A general-purpose, extensible and customizable C++ extension for Godot Engine.
https://goostengine.github.io/
MIT License
481 stars 18 forks source link

Add `Graph` class #172

Closed Xrayez closed 2 years ago

Xrayez commented 2 years ago

Closes godotengine/godot-proposals#3848.

Intended to implement a mixed graph data structure. Allows to add both associative (undirected) and directed edges, with multiple edges between a pair of vertices, and self-loops.

func _ready():
    var graph = Graph.new()

    var a: GraphVertex = graph.add_vertex("a")
    var b: GraphVertex = graph.add_vertex("b")
    var c: GraphVertex = graph.add_vertex("c")

    var ab: GraphEdge = graph.add_edge(a, b)
    var ac: GraphEdge = graph.add_edge(a, c)
    var bc: GraphEdge = graph.add_edge(b, c)

    print(graph.has_edge(a, b)) # Prints True
    print(graph.has_edge(b, a)) # Prints True

(annotations are optional)

To-do

sairam4123 commented 2 years ago

Is it possible to inherit the classes and get the class instead of the GraphVertex and GraphEdge? RoadIntersection inherits GrahVertex RoadSegment inherits GraphEdge maybe when the graph is being created with _init() maybe I could pass the two classes and use it in graph, is it possible?

Xrayez commented 2 years ago

Is it possible to inherit the classes and get the class instead of the GraphVertex and GraphEdge?

In theory this is possible, and yeah I think it makes sense to add.

When you call add_edge(), you could pass your own GraphVertex rather than letting the method to construct a new vertex pair from values. However, add_edge would still return base GraphEdge since it needs to create something.

The current workaround would be to assign your own RoadIntersection and RoadSegment (that extend whatever) as a value of GraphVertex.value and GraphEdge.value.

The idea here is that a Graph should take ownership over those objects, even if they are created externally. However, you can also call GraphVertex.free() and GraphEdge.free(), and the Graph will still be able to update the structure, so there's no way to break something and Graph will cleanup everything upon destruction.

maybe when the graph is being created with _init() maybe I could pass the two classes and use it in graph, is it possible?

There's no way to override _init() in C++, unfortunately. See some issues in godotengine/godot-proposals#1522.

Since you'd like to have your own versions of GraphVertex and GraphEdge, how about extending a Graph as well? I'm thinking that we could add _create_vertex() and _create_edge() as virtual methods that you could override for your own purposes:

extends Graph

func _create_vertex() -> GraphVertex:
    return RoadIntersection.new()

func _create_edge() -> GraphEdge:
    return RoadSegment.new()

If we look at AStar, there will likely be other virtual methods to be implemented.

Zylann commented 2 years ago

Some performance notes on the implementation. I'm not particularly going to use this, but I figured I'd post here since it's a bit interesting.

I see you use List in your functions, but why? Linked lists perform a lot of memory allocations and take more space than vectors, and there are so many ways to just use a vector or deque instead that I have yet to find a common use case for them, even though I made a graph class myself (for a DAG, in a node-based tool). If the only reason is because of being able to remove in the middle of iteration cheaply, you can also do that in vectors by swapping with the last element (order is rarely a concern, but if it is I guess you have to shift items, but even with lists you have to iterate items anyways to find those you want, see remove_if). If lists are reaaaally needed here, I would at least suggest using a pool allocator in the future maybe.

The same goes for vertices and edges, they are heap-allocated full-fledged Objects, when I would probably have made them optional (as metadata?), so by default they are lightweight and maybe not heap-allocated (note, Maps already heap-allocate their elements internally even if they are just pointers), but if you want to go for OOP and let users access vertices as objects or even inherit vertex classes with scripts, it might be a better option, though will be less efficient with large graphs.

In other areas like remove_vertex you also use Set but I'm not sure why. Is this because you expect edges to be found multiple times? If you have a way to find all edges connected to a vertex then there would be no chance to find twice the same one, which means no need for a Set and a temporary vector can be used (here I only iterate through vectors https://github.com/Zylann/godot_voxel/blob/d1da685faccbeb17a2bb2b2209d68368b83f7b87/generators/graph/program_graph.cpp#L58). There is an ordered-removal from a list of connections, but even if List was used instead it would have had to iterate it either way).

Why use Map in graph data, instead of HashMap? HashMap lookup is faster. Also, the fact you use a pointer as key means querying a vertex will require such object and even if you delete and recreate, the same key might appear. For example that means you can't have IDs that remain unique after alterations of the data, and serializing the data might need extra work as you'll need to remap pointers.

Note about temporary data structures: if a frequently used algorithm needs a vector temporarily, a cheap trick to speed it up is:

static thread_local std::vector<Type> s_data;

The Godot equivalent would be static thread_local LocalVector s_data. The idea is that memory is re-used between runs so if the function has to be called often then allocations won't cost anything after a short while.

Xrayez commented 2 years ago

If the only reason is because of being able to remove in the middle of iteration cheaply, you can also do that in vectors by swapping with the last element

I guess it depends on whether sparse graphs will be used. I'm less experienced with vector tricks... 😢

but if you want to go for OOP and let users access vertices as objects or even inherit vertex classes with scripts, it might be a better option, though will be less efficient with large graphs.

Yeah, flexibility, ease-of-use have higher priority here. It would also make sense to have different kinds of graphs implemented for undirected, directed, multigraph etc. for even better performance given concrete assumptions, but I'd just like to have something usable at first and not to rely on and hack AStar class for everything in Godot. 🙂

This also makes it easier to use hierarchical/nested graphs.

For large, dense graphs, perhaps we could implement LargeGraph at some point.

In other areas like remove_vertex you also use Set but I'm not sure why. Is this because you expect edges to be found multiple times? If you have a way to find all edges connected to a vertex then there would be no chance to find twice the same one, which means no need for a Set and a temporary vector can be used

Note that GraphEdge is an actual object here, not just a relationship. When add_edge(a, b) gets called, it instantiates a GraphEdge object, which is inserted into adjacency list of the a and b vertices. The same applies with directed edges for get_successors() and get_predecessors() to work. So indeed, it's very likely the same edge is going to be present in several lists. I haven't come up how to solve this more efficiently yet.

Why use Map in graph data, instead of HashMap? HashMap lookup is faster.

Map was easier to use and I was already familiar with it. I'll likely replace it with HashMap eventually.

Also, the fact you use a pointer as key means querying a vertex will require such object and even if you delete and recreate, the same key might appear. For example that means you can't have IDs that remain unique after alterations of the data, and serializing the data might need extra work as you'll need to remap pointers.

Wouldn't this be the case if IDs are used as well? I mean, you must have a key anyway. What do you mean by "same key might appear" problem? Note that even if you attempt to free the vertex/edge object, those will request the graph to erase keys properly and won't remain dangling.

But yeah, I'll likely make Graph extend Resource eventually. Importing from other formats like GraphML could also be a possibility maybe...

The Godot equivalent would be static thread_local LocalVector s_data

Thanks, I'll see how and when this may speed up things.

A lot of this is a learning experience for me, however current implementation seems to work okay-ish so far.

A general-purpose Graph structure could easily become hairy, however I do realize that certain optimizations could be done without hurting usability. My aim is certainly not to replicate graphs from Boost library (at least for now). That would require heavy metaprogramming which is not compatible with how Godot works, not mentioning my skillset... 😛

Zylann commented 2 years ago

Wouldn't this be the case if IDs are used as well? I mean, you must have a key anyway. What do you mean by "same key might appear" problem? Note that even if you attempt to free the vertex/edge object, those will request the graph to erase keys properly and won't remain dangling.

You don't control the value of pointers, but you can control IDs. It can either be auto increment (which will be unique for as long as you need) or even user-defined if that makes some things easier. It is also easier to serialize (to file or network), while serializing a key will be harder if they are objects because the pointer won't carry over to the file or network. Most likely the user will have to maintain a map themselves. About the same key appearing twice, I think it's only a matter in some particular C++ cases because the pointer might come up again later for a different object (unless you never delete them, think for Undo/Redo for example), thats not going to be noticeable in GDScript actually.

Xrayez commented 2 years ago

About the same key appearing twice, I think it's only a matter in some particular C++ cases because the pointer might come up again later for a different object (unless you never delete them, think for Undo/Redo for example), thats not going to be noticeable in GDScript actually.

Yes, that's why I raised the question, because it's an implementation detail as of now. But as you say, introducing IDs would be needed for serialization purposes, so I'm likely going to refactor the implementation to enable easier serialization.

About the undo/redo part, this marginally reminded me of #162 where I keep the history of draw commands. Perhaps using IDs could help with visualizing/debugging algorithms. 🤔

Xrayez commented 2 years ago

I've done some major refactor to use HashMap. I've split the data into vertices and edges. I've also figured that neighbors can be kept track of during insertion. Some methods simplified down to one-liners. The code is 50 lines shorter now. I'll need to move get_neighbors() and family to GraphVertex class itself.

Still using lists for edges, but I really wonder what kind of performance impact this has. I'll leave this for the final touch and/or when I feel motivated enough to do this sort of optimization with vectors... 😭

Unfortunately, I'm not sure whether I've noticed major performance improvements. But I think the current state is certainly an improvement structurally. I've only referred to AStar implementation in intellectual crisis, but mostly this is me reinventing the wheel for the sake of learning, and hopefully this resulted in usable API that I wouldn't be able to come up if I were to copy-paste AStar structures blindlessly. 😄

Again, for reference, I took inspiration from https://networkx.org/ API.

Xrayez commented 2 years ago

Added a way to create custom scripted graph vertices and edges.

extends Node2D

func _ready():
    var road = Road.new()

    var a = road.add_intersection(Vector2(0, 0))
    var b = road.add_intersection(Vector2(100, 100))
    var s = road.add_segment(a, b)

    for x in road.get_intersections():
        print("Intersection: ", x.value)

    assert(a is RoadIntersection)
    assert(s is RoadSegment)

    print(a.has_lights())
    print(s.is_slippery())

As you see, you can completely hide the entire Graph interface if you really need/want to (for readability).

Example project graph_scripted.zip

sairam4123 commented 2 years ago

So now, how will you differentiate multiple edges? Like if edge_a starts at a and ends at b, and another edge let's say edge_b starts at a and ends at b, how will you differentiate both of them? I have introduced segment_id that's generated like this

func get_id(min_vec: Vector3):
    if id != -1:
        return id
    var from_bit_left_shift = 18
    var seg_type_left_shift = 6 + from_bit_left_shift
    var custom_left_shift = 12 + seg_type_left_shift
    var modder_left_shift = 10 + custom_left_shift

    id = int(((modder_id << modder_left_shift)
                | (custom_id << custom_left_shift)
                | (seg_type << seg_type_left_shift)
                | (start_position.intersection.get_id(min_vec) << from_bit_left_shift)
                | (end_position.intersection.get_id(min_vec))))
Xrayez commented 2 years ago

Since edges are actual objects here, they are always unique, even if they are linked to the same vertices. You can fetch all edges with get_edge_list():

var edges_between_ab: Array = graph.get_edge_list(a, b)

It's also up to you to define whether you allow multiple edges between vertices, even for undirected edges (because they can have different weights, for instance).

The ids are generated internally for the purpose of computing the hash for storing and probably graph data serialization at some point. I think we could expose the internal ID via get_id() (both for vertices and edges), but I'm not sure whether it's actually needed with current structure.

Looking at your snippet, I figured that there's actually no function for searching vertices by values in my implementation. I'll add find_vertex(value).

Xrayez commented 2 years ago

Updates so far:

var V = graph.iterator  # GraphIteratorDFS
V.initialize(root)  # Some arbitrary vertex in an existing Graph

while V.has_next():
    var n = V.next()

You could use find_connected_component() for that purpose as well and iterate the returned array instead of course, but if you have to search for specific value/solution in a large graph, that would be overkill (especially when searched value/solution can be found early). So in that case, you'd use a GraphIterator.

sairam4123 commented 2 years ago

You can also implement the AStar algorithm if possible. Also where can I download this graph version? Found it haha.

sairam4123 commented 2 years ago

btw, could implement a way so we can pass extra info to the constructor of our own classes. Like: I want to pass the network info of the road intersection when creating it, so when I create a vertex using the graph system.

graph.add_vertex(value, bind_args)

and then the value of the bind args could be passed like this.

func _create_vertex(bind_args):
    return RoadIntersection.new(bind_args["position"], bind_args["road_info"])

it could also be a array, just pass in whatever you get in the bind args haha.

Add a way to fetch the internal ids too, would be really helpful! Just wanted to ask, is the ID of the vertex based on the position or based on the index of the vertex list? it'd be nice if it were based on the position instead.

Also how do you differentiate the multi-edged graph with ids (not with objects)?

Xrayez commented 2 years ago

You can also implement the AStar algorithm if possible.

I think this could be added by implementing shortest_path() method. But since AStar already solves this, this is low priority, especially when you can use existing shortest_path_tree() method to solve a variety of path finding problems (see documentation and the snippet there on how to extract any shortest path. What characterizes A is the presence of heuristic function and explicit target vertex (from, to), while Dijkstra works like breadth-first search (bucket-fill), but allows to find all shortest* paths from source to any other vertex.

pass extra info to the constructor of our own classes.

I think we have some misunderstanding here with virtual callback vs signal callback. The point of those virtual callbacks is to override vertex/edge instantiation. Since you're extending Graph already and have your own class for GraphVertex. you can add wanted functions (just as an example):

class_name RoadIntersection extends GraphVertex

func _init(p_position, p_road_net):
    value = {}
    value.road_net = p_road_net
    value.position = p_position

But when you call graph.add_vertex(value), it will store the value in that vertex already.

Notice that you can use value property to initialize anything: Array, Dictionary, Vector2 etc. If you do this, you'll be able to automatically serialize the graph.

From what I can tell, it appears to me that you want to have the road_net (graph?) to be accessible from any other vertex. In that case, I think it makes sense to add GraphVertex::get_graph() indeed.

Add a way to fetch the internal ids too, would be really helpful!

What for? Can you show where having access to internal IDs would be helpful?

Just wanted to ask, is the ID of the vertex based on the position or based on the index of the vertex list? it'd be nice if it were based on the position instead.

IDs are generated starting from 1 and incremented on every vertex creation. How do you define position in this context? If you say that position is Vector2(), then it's a concrete concept. The Graph class is agnostic to what kind of data you'd like to use (not necessarily geometric meaning), apart from the weights used by methods such as minimum_spanning_tree() or shortest_path_tree() which interpret value of GraphEdges as a float.

Also how do you differentiate the multi-edged graph with ids (not with objects)?

Is comparing objects impossible in your case? I'm asking because current architecture is object-oriented, and mostly everything was designed that way in mind. I'd just like to understand what you struggle with.

See also my previous post: https://github.com/goostengine/goost/pull/172#issuecomment-1026174538

sairam4123 commented 2 years ago
class_name RoadIntersection extends GraphVertex

func _init(p_position, p_road_net):
    value = {}
    value.road_net = p_road_net
    value.position = p_position

how would you pass the p_road_net and p_position?

From what I can tell, it appears to me that you want to have the road_net (graph?) to be accessible from any other vertex. In that case, I think it makes sense to add GraphVertex::get_graph() indeed.

I'm sorry, it was road_info (RoadNetworkInfo (for rendering system and ai stuff)).

Is comparing objects impossible in your case? I'm asking because current architecture is object-oriented, and mostly everything was designed that way in mind. I'd just like to understand what you struggle with. It is possible, but I think intergers could be much faster, yeah, I understand that the current architecture is object-oriented, but I think it would be nice to use IDs to keep it similar to AStar resource.

My Graph actually uses ID based system, and ids are generated based on the position, there's a get_id method that returns the id for the specific intersection, as you see my old snippet, I also have a segment_id generator that does some advanced stuff so I can pass the id to the shader and the shader could get back all the info it needs. But the main goal for the implementation of my segment_id stuff was to differentiate two segments/edges.

Xrayez commented 2 years ago

how would you pass the p_road_net and p_position?

Here's another way:

# Add
var intersection = graph.add_vertex({id=10, position = Vector2(x, y), road_net = info})
# Access
var id = intersection.value.id
var position: Vector2 = intersection.value.position
var road_net: Dictionary = intersection.value.road_net

You can bypass all of this and simply use Graph as a data structure. You can define your own ID system this way if that's how your current system works. The question is whether you really need to use a general-purpose Graph here, as stated in https://github.com/godotengine/godot-proposals/issues/3848#issuecomment-1019508463. If you do feel like you'd benefit from using a general-purpose implementation, I think it just takes time to adapt your current architecture.

The fact that the internal implementation uses IDs does not mean that you should rely on it or even assume that it exists.

It is possible, but I think intergers could be much faster, yeah, I understand that the current architecture is object-oriented, but I think it would be nice to use IDs to keep it similar to AStar resource.

I think that actually calling get_id() twice in GDScript would be slower over comparing objects (which are mostly pointers), and the difference is really negligible (if at all), therefore not worth to be optimizing (in exchange for flexibility). See for instance performance research in godotengine/godot#43343.

I thought about using IDs as well initially, but see https://github.com/godotengine/godot-proposals/issues/3848#issuecomment-1020696105. Otherwise I'd be using AStar.

So again, the priority here is flexibility due to needed generality. Yet it's fast enough.

Xrayez commented 2 years ago

I think the core implementation is there. I'd like to merge this to let other people to try it out, and let https://github.com/goostengine/goost-fuzzer find the bugs that perhaps I couldn't find myself. There will likely be some compatibility changes.

See example project at https://github.com/goostengine/goost-examples/tree/gd3/core/graph.

Thanks for feedback!

sairam4123 commented 2 years ago

Could you please add ability to fetch all edges a vertex is connected? Something like GraphVertex.get_neighbouring_edges()

Xrayez commented 2 years ago

@sairam4123 makes sense, but I'll probably need to refactor how get_successors() and get_predecessors() work, because those are not that efficient compared to get_neighbors(). That said, I'd add:

But do note that it's possible to do this via script at the moment.

sairam4123 commented 2 years ago

Thanks! I was thinking about it too, it should be really helpful for me! get_inward_edges() and get_outward_edges() seems good but, I might recommend get_incoming_edges() and get_outgoing_edges(). Your choice though! 😄