Forceatlas2_layout: slow? #727

Status: Open. Minyall opened this issue 5 years ago.

Minyall commented 5 years ago


Hi! I'm really interested in using Datashader for large-scale data visualization. I've been working through the Networks section of the user guide and wanted to try it on my own data: a graph of 1,184,684 nodes and 1,210,193 edges. I've reshaped my original data into two separate DataFrames, nodes and edges, where each row of edges gives the source and target of an edge as indexes of the relevant nodes in the nodes DataFrame.
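Here is a minimal sketch of that reshaping. It assumes the raw data starts as an edge list whose endpoint labels sit in two columns. The helper name to_nodes_edges and the label column are hypothetical. Every distinct label becomes one row of nodes with an integer id, and edges refers to nodes only through those ids. This matches the id='id', source='source', target='target' arguments used in the call below.

```python
import pandas as pd


def to_nodes_edges(raw: pd.DataFrame, src_col: str, dst_col: str):
    """Hypothetical helper: split a raw edge list into a nodes/edges pair.

    Each distinct label becomes one row of `nodes` with an integer `id`;
    `edges` refers to nodes only through those integer ids.
    """
    # Every label that appears as either endpoint, in first-seen order.
    labels = pd.unique(pd.concat([raw[src_col], raw[dst_col]], ignore_index=True))
    nodes = pd.DataFrame({"id": range(len(labels)), "label": labels})
    # Lookup table from label to integer id.
    label_to_id = pd.Series(nodes["id"].values, index=nodes["label"])
    edges = pd.DataFrame({
        "source": raw[src_col].map(label_to_id).values,
        "target": raw[dst_col].map(label_to_id).values,
    })
    return nodes, edges
```

For example, `nodes, edges = to_nodes_edges(raw, "src_label", "dst_label")` would produce frames that can be passed straight to the layout call. The two column names here are placeholders.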

The circular layout worked fine and produced a result within a few seconds. However...

from datashader.layout import forceatlas2_layout
force_directed = forceatlas2_layout(nodes, edges, id='id', source='source', target='target')

...has been running for about an hour, with the process at about 280% CPU, and has yet to complete. I understand that the mechanics of the ForceAtlas layout are more complex than those of the circular layout. Still, I wondered whether this much processing time is expected, and/or whether there is a way to speed it up.

Thanks for all your efforts on this package. It's a great project.

jbednar commented 5 years ago

More complex is an understatement! The force-directed algorithm is very compute-intensive. It's probably possible to speed it up, but for now I'd try it on a subset of your problem and see how it scales with problem size.
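One way to run that experiment, as a sketch, has three steps:

1. Sample increasing numbers of edges. Sampling edges rather than nodes keeps a sparse graph's subsample from ending up almost edgeless.
2. Time the layout on each sample.
3. Fit the slope on a log-log scale.

edge_sample, time_layout and scaling_exponent are hypothetical helpers. The layout function is passed in, so the same harness should work for forceatlas2_layout or any other layout called as (nodes, edges, **kwargs).

```python
import time

import numpy as np
import pandas as pd


def edge_sample(nodes, edges, m, seed=0, id_col="id"):
    """Hypothetical helper: sample m edges and keep only the nodes they touch."""
    sub_edges = edges.sample(n=m, random_state=seed)
    touched = set(sub_edges["source"]) | set(sub_edges["target"])
    sub_nodes = nodes[nodes[id_col].isin(touched)]
    return sub_nodes, sub_edges


def time_layout(layout_fn, nodes, edges, sizes, **kwargs):
    """Time layout_fn(nodes, edges, **kwargs) on an edge sample of each size."""
    timings = {}
    for m in sizes:
        sub_nodes, sub_edges = edge_sample(nodes, edges, m)
        start = time.perf_counter()
        layout_fn(sub_nodes, sub_edges, **kwargs)
        timings[m] = time.perf_counter() - start
    return timings


def scaling_exponent(timings):
    """Fit t ~ c * m**p by least squares on log-log axes and return p."""
    sizes = np.log(np.array(list(timings.keys()), dtype=float))
    seconds = np.log(np.array(list(timings.values()), dtype=float))
    slope, _intercept = np.polyfit(sizes, seconds, 1)
    return slope
```

Usage might look like `timings = time_layout(forceatlas2_layout, nodes, edges, [10_000, 20_000, 40_000], id='id', source='source', target='target')`, followed by `scaling_exponent(timings)`. An exponent near 2 would suggest roughly quadratic growth. Going from 40,000 sampled edges to the full 1,210,193 (about 30x) would then multiply the run time by roughly 900x.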

Minyall commented 5 years ago

OK, thanks. As long as this is expected, that's fine. I've moved my script to our university's cluster to speed things up. Does the force-directed function benefit from multiple cores?

Many thanks for your quick response.

jbednar commented 5 years ago

The force-directed code can probably be updated relatively easily to use Numba's parallel for-loops (prange) to support multiple cores; see https://github.com/pyviz/datashader/blob/master/datashader/layout.py . I don't think that support was available in Numba when that code was first written. And of course Dask could be used to distribute the computation across cluster nodes, but I haven't looked into the details of the algorithm enough to know how difficult that would be. PRs welcome! :-)
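To illustrate what a prange version could look like, here is a sketch of an all-pairs repulsion step. It is not Datashader's code. It is a simplified repulsion term following the ForceAtlas2 formula, with magnitude kr * (deg_i + 1) * (deg_j + 1) / d. The function name repulsion and its arguments are assumptions. The key property is that the outer loop writes only to row i of the result, so its iterations are independent and can safely be spread across cores. The fallback branch makes the sketch run, slowly, when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fall back to plain Python so the sketch still runs without Numba
    def njit(*args, **kwargs):
        # No-op stand-in supporting both @njit and @njit(parallel=True).
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func
    prange = range


@njit(parallel=True)
def repulsion(pos, degree, kr):
    """Sum kr * (deg_i + 1) * (deg_j + 1) / d repulsion over all j != i.

    pos is an (n, 2) float array and degree an (n,) array. Each outer
    iteration writes only forces[i], so prange can run them in parallel.
    """
    n = pos.shape[0]
    forces = np.zeros_like(pos)
    for i in prange(n):
        fx = 0.0
        fy = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = pos[i, 0] - pos[j, 0]
            dy = pos[i, 1] - pos[j, 1]
            d2 = dx * dx + dy * dy
            if d2 > 0.0:
                # Magnitude kr*(deg_i+1)*(deg_j+1)/d times unit vector (dx, dy)/d.
                f = kr * (degree[i] + 1.0) * (degree[j] + 1.0) / d2
                fx += f * dx
                fy += f * dy
        forces[i, 0] = fx
        forces[i, 1] = fy
    return forces
```

Each call evaluates n*(n-1) pairs, which is roughly 1.4 × 10^12 for 1,184,684 nodes. Extra cores can therefore divide the cost only by a constant factor. The same row independence is what a Dask version could exploit by splitting the rows of pos into chunks computed on different workers.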

Minyall commented 5 years ago

If anyone is interested, I achieved quite a good speedup by using HoloViews along with an independent implementation of ForceAtlas2. It has a NetworkX-style interface, so it can be slotted straight into HoloViews' Graph.from_networkx method wherever you would normally pass a NetworkX layout function:

hv.Graph.from_networkx(G, forceatlas2.forceatlas2_networkx_layout).opts(tools=['hover'])

You can get the implementation here
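Here is a sketch of that NetworkX-style contract, assuming the external package follows NetworkX's convention. A layout function takes a graph G, iterates over its nodes, and returns a dict mapping each node to an (x, y) position. HoloViews' Graph.from_networkx accepts such a callable in place of a precomputed positions dict. The hypothetical toy_layout below only places nodes at random and stands in for a real ForceAtlas2 function. The equally hypothetical positions_to_frame copies those positions back into a nodes DataFrame as x and y columns, matching the shape Datashader's own layout functions return. That way, externally computed positions could also be reused in a Datashader pipeline.

```python
import numpy as np
import pandas as pd


def toy_layout(G, seed=0):
    """Hypothetical stand-in with the NetworkX layout contract: iterate over
    G's nodes and return {node: array([x, y])}. Swap in a real ForceAtlas2 layout."""
    rng = np.random.default_rng(seed)
    return {node: rng.random(2) for node in G}


def positions_to_frame(pos, nodes, id_col="id"):
    """Hypothetical helper: copy nodes and add x/y columns looked up from
    a {node_id: (x, y)} dict. Every id in nodes must be a key of pos."""
    out = nodes.copy()
    out["x"] = out[id_col].map(lambda n: pos[n][0])
    out["y"] = out[id_col].map(lambda n: pos[n][1])
    return out
```

Because toy_layout follows the same contract, `hv.Graph.from_networkx(G, toy_layout)` has the same shape as the call above.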