sknetwork-team / scikit-network

Importing a graph from Neptune #458

Closed Tokukawa closed 3 years ago

Tokukawa commented 3 years ago

Hello, this is not an issue but a question. I would like to test scikit-network with a huge graph database hosted in Amazon Neptune. I can extract a random sample and rebuild the graph by recursive edge traversal. I am wondering whether scikit-network has a better, more elegant way to import a graph from a graph database. Thanks in advance.

QLutz commented 3 years ago

Hello,

We are not familiar with how such databases work. Are you looking for a convenient API to query from Neptune on-the-fly without having to explicitly load the graph beforehand? Otherwise, what functionality would you want? You can write mock-up code to give us a rough idea.

Best,

Tokukawa commented 3 years ago

Hello, it would be great to have a class that manages the import, something like:

db_loader = DBLoader('neptune endpoint', ...some extra needed arguments...)
graph = db_loader.import()
adjacency = graph.adjacency
position = graph.position
etc...

Best

QLutz commented 3 years ago

Hello again,

Looking at the Amazon Neptune docs, I find that they are quite lacking as far as Python is concerned. It seems like your safest bet is to use the method described on this page (which requires installing the gremlinpython dependency). Then, using the code snippet given here, you can easily extract an edge list which you can feed to sknetwork.data.load_edge_list. Put together, it could resemble something like:

from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
import sknetwork

# Open a remote traversal source on the Neptune Gremlin endpoint
graph = Graph()
remoteConn = DriverRemoteConnection('ws://ENDPOINT:PORT/gremlin', 'g')
g = graph.traversal().withRemote(remoteConn)

# Collect one (source, target) pair per edge; an edge points from outV to inV
edge_list = []
for e in g.E().toList():
    edge_list.append((e.outV.id, e.inV.id))
remoteConn.close()

bunch = sknetwork.data.load_edge_list(edge_list)

Granted, this only returns the adjacency, but using similar basic for loops on the iterators you get from Gremlin should be enough to get all attached information.
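To illustrate the kind of loop meant here without needing a live Neptune endpoint, the sketch below works on plain Python data. Everything in it is an assumption made for illustration: the `edges` and `vertex_props` literals are made-up stand-ins for what Gremlin traversals would return, and `build_index` and `align_attribute` are hypothetical helper names that belong to neither scikit-network nor gremlinpython. It maps arbitrary vertex IDs to contiguous integers and lines up one vertex attribute with that numbering.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple


def build_index(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Assign a contiguous integer to each vertex ID, in order of first appearance."""
    index: Dict[Hashable, int] = {}
    for source, target in edges:
        for vertex_id in (source, target):
            if vertex_id not in index:
                index[vertex_id] = len(index)
    return index


def align_attribute(
    index: Dict[Hashable, int],
    props: Dict[Hashable, Dict[str, Any]],
    key: str,
    default: Any = None,
) -> List[Any]:
    """Return the values of `key`, ordered by the integer index of each vertex."""
    values = [default] * len(index)
    for vertex_id, i in index.items():
        values[i] = props.get(vertex_id, {}).get(key, default)
    return values


# Made-up sample data standing in for Gremlin query results (assumption)
edges = [("v-a", "v-b"), ("v-b", "v-c"), ("v-a", "v-c")]
vertex_props = {"v-a": {"label": "airport"}, "v-b": {"label": "city"}}

index = build_index(edges)
int_edges = [(index[s], index[t]) for s, t in edges]  # edges over integer node IDs
labels = align_attribute(index, vertex_props, "label")  # missing values stay None
```

Keeping the ID-to-integer mapping around makes it possible to translate results computed on the integer graph back to the original Neptune IDs.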

As far as integration in scikit-network is concerned, we would like to avoid adding unnecessary dependencies for easier maintenance. However, if you manage to create a nice notebook explaining how to query all the information of a graph from Neptune, we'll definitely consider adding it to our tutorials.
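One way to reconcile the loader class from the mock-up with the wish to avoid extra dependencies is to let the caller pass in the function that talks to the database. The following is only a rough sketch under that assumption: `DBLoader`, `LoadedGraph`, `fetch_edges` and `fake_fetch` are hypothetical names, not part of scikit-network, and the method is called `load` because `import` is a reserved word in Python.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


@dataclass
class LoadedGraph:
    """Plain container; the field names are illustrative only."""
    names: List[Hashable]       # names[i] is the original ID of node i
    adjacency: List[List[int]]  # adjacency[i] lists the out-neighbours of node i


class DBLoader:
    """Hypothetical loader: the caller supplies the function that queries the database."""

    def __init__(self, fetch_edges: Callable[[], Iterable[Edge]]):
        self.fetch_edges = fetch_edges

    def load(self) -> LoadedGraph:
        index: Dict[Hashable, int] = {}
        adjacency: List[List[int]] = []
        for source, target in self.fetch_edges():
            # Register unseen vertices and give each an empty neighbour list
            for vertex_id in (source, target):
                if vertex_id not in index:
                    index[vertex_id] = len(index)
                    adjacency.append([])
            adjacency[index[source]].append(index[target])
        return LoadedGraph(names=list(index), adjacency=adjacency)


# Stand-in for a real query such as a Gremlin edge traversal (assumption)
def fake_fetch() -> List[Edge]:
    return [("n1", "n2"), ("n2", "n3"), ("n1", "n3")]


graph = DBLoader(fake_fetch).load()
```

With this design the database driver stays in user code, so the loader itself needs nothing beyond the standard library.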

Best regards,