holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Forceatlas2_layout: slow? #727

Open Minyall opened 5 years ago

Minyall commented 5 years ago

Description

Hi! I'm really interested in using Datashader for large-scale data visualization. I've been working through the Networks part of the user guide and wanted to use my own data. I have a dataset of 1,184,684 nodes and 1,210,193 edges. I've reshaped my original data into two separate DataFrames, nodes and edges, where the edges DataFrame gives the source and target of each edge as indexes into the nodes DataFrame.

Circular layout worked fine and produced a result within a few seconds. However...

force_directed = forceatlas2_layout(nodes, edges, id='id', source='source', target='target')

...has been running for about an hour, with the process using about 280% CPU, and has yet to complete. I understand that the ForceAtlas2 layout is more computationally complex than the circular one, but I wondered whether this much processing time is to be expected, and whether there is a way to speed it up.

Thanks for all your efforts on this package. It's a great project.

Your environment


  Model Name:   Mac Pro
  Model Identifier: MacPro6,1
  Processor Name:   Quad-Core Intel Xeon E5
  Processor Speed:  3.7 GHz
  Number of Processors: 1
  Total Number of Cores:    4
  L2 Cache (per Core):  256 KB
  L3 Cache: 10 MB
  Memory:   12 GB

Conda Info

```
    active env location : /Users/James/anaconda3/envs/community_mapper
            shell level : 2
       user config file : /Users/James/.condarc
 populated config files : /Users/James/.condarc
          conda version : 4.6.8
    conda-build version : 3.17.6
         python version : 3.7.1.final.0
       base environment : /Users/James/anaconda3  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/osx-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/James/anaconda3/pkgs
                          /Users/James/.conda/pkgs
       envs directories : /Users/James/anaconda3/envs
                          /Users/James/.conda/envs
               platform : osx-64
             user-agent : conda/4.6.8 requests/2.21.0 CPython/3.7.1 Darwin/18.2.0 OSX/10.14.3
                UID:GID : 501:20
             netrc file : None
           offline mode : False
```

Conda list

```
# packages in environment at /Users/James/anaconda3/envs/community_mapper:
#
# Name                    Version                   Build  Channel
appnope                   0.1.0                 py36_1000    conda-forge
asn1crypto                0.24.0                py36_1003    conda-forge
attrs                     19.1.0                     py_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
blas                      1.0                         mkl    anaconda
bleach                    3.1.0                      py_0    conda-forge
blinker                   1.4                        py_1    conda-forge
bokeh                     1.0.4                 py36_1000    conda-forge
bzip2                     1.0.6             h1de35cc_1002    conda-forge
ca-certificates           2019.3.9             hecc5488_0    conda-forge
certifi                   2019.3.9                 py36_0    conda-forge
cffi                      1.12.2           py36h2d6ddff_1    conda-forge
chardet                   3.0.4                 py36_1003    conda-forge
click                     7.0                        py_0    conda-forge
cloudpickle               0.8.0                      py_0    conda-forge
colorcet                  1.0.0                      py_0    conda-forge
cryptography              2.6.1            py36hc2b1221_0    conda-forge
cycler                    0.10.0                     py_1    conda-forge
cytoolz                   0.9.0.1         py36h1de35cc_1001    conda-forge
dask                      1.1.4                      py_0    conda-forge
dask-core                 1.1.4                      py_0    conda-forge
datashader                0.6.9                      py_0    conda-forge
datashape                 0.5.4                      py_1    conda-forge
decorator                 4.4.0                      py_0    conda-forge
defusedxml                0.5.0                      py_1    conda-forge
distributed               1.26.0                   py36_1    conda-forge
entrypoints               0.3                   py36_1000    conda-forge
freetype                  2.10.0               h24853df_0    conda-forge
heapdict                  1.0.0                 py36_1000    conda-forge
idna                      2.8                   py36_1000    conda-forge
imageio                   2.5.0                    py36_0    conda-forge
intel-openmp              2019.3                      199    anaconda
ipykernel                 5.1.0           py36h24bf2e0_1002    conda-forge
ipython                   7.3.0            py36h24bf2e0_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
jedi                      0.13.3                   py36_0    conda-forge
jinja2                    2.10                       py_1    conda-forge
jpeg                      9c                h1de35cc_1001    conda-forge
jsonschema                3.0.1                    py36_0    conda-forge
jupyter_client            5.2.4                      py_3    conda-forge
jupyter_core              4.4.0                      py_0    conda-forge
jupyterlab                0.35.4                   py36_0    conda-forge
jupyterlab_server         0.2.0                      py_0    conda-forge
kiwisolver                1.0.1           py36h04f5b5a_1002    conda-forge
libcxx                    4.0.1                h579ed51_0
libcxxabi                 4.0.1                hebd6815_0    conda-forge
libffi                    3.2.1             h6de7cb9_1006    conda-forge
libgfortran               3.0.1                h93005f0_2    anaconda
libpng                    1.6.36            ha441bb4_1000    conda-forge
libsodium                 1.0.16            h1de35cc_1001    conda-forge
libtiff                   4.0.10            h79f4b77_1001    conda-forge
llvmlite                  0.26.0          py36h3fea490_1000    conda-forge
locket                    0.2.0                      py_2    conda-forge
markupsafe                1.1.1            py36h1de35cc_0    conda-forge
matplotlib-base           3.0.3            py36hf043ca5_0    conda-forge
mistune                   0.8.4           py36h1de35cc_1000    conda-forge
mkl                       2019.3                      199    anaconda
mkl_fft                   1.0.10           py36h5e564d8_0    anaconda
mkl_random                1.0.2            py36h27c97d8_0    anaconda
msgpack-python            0.6.1            py36h04f5b5a_0    conda-forge
multipledispatch          0.6.0                      py_0    conda-forge
nbconvert                 5.4.1                      py_2    conda-forge
nbformat                  4.4.0                      py_1    conda-forge
ncurses                   6.1               h0a44026_1002    conda-forge
networkx                  2.2                        py_1    conda-forge
notebook                  5.7.6                    py36_0    conda-forge
numba                     0.41.0          py36h1702cab_1000    conda-forge
numpy                     1.16.2           py36hacdab7b_0    anaconda
numpy-base                1.16.2           py36h6575580_0    anaconda
oauthlib                  3.0.1                      py_0    conda-forge
olefile                   0.46                       py_0    conda-forge
openssl                   1.1.1b               h01d97ff_2    conda-forge
packaging                 19.0                       py_0    conda-forge
pandas                    0.24.2           py36h0a44026_0    anaconda
pandoc                    2.7.1                         0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
param                     1.8.2                      py_0    conda-forge
parso                     0.3.4                      py_0    conda-forge
partd                     0.3.9                      py_0    conda-forge
pexpect                   4.6.0                 py36_1000    conda-forge
pickleshare               0.7.5                 py36_1000    conda-forge
pillow                    5.4.1           py36hbddbef0_1000    conda-forge
pip                       19.0.3                   py36_0    conda-forge
prometheus_client         0.6.0                      py_0    conda-forge
prompt_toolkit            2.0.9                      py_0    conda-forge
psutil                    5.6.1            py36h1de35cc_0    conda-forge
ptyprocess                0.6.0                 py36_1000    conda-forge
pycparser                 2.19                     py36_1    conda-forge
pyct                      0.4.6                      py_0    conda-forge
pyct-core                 0.4.6                      py_0    conda-forge
pygments                  2.3.1                      py_0    conda-forge
pyjwt                     1.7.1                      py_0    conda-forge
pymongo                   3.7.2            py36h0a44026_0    conda-forge
pyopenssl                 19.0.0                   py36_0    conda-forge
pyparsing                 2.3.1                      py_0    conda-forge
pyrsistent                0.14.11          py36h1de35cc_0    conda-forge
pysocks                   1.6.8                 py36_1002    conda-forge
python                    3.6.7             h8dc6b48_1004    conda-forge
python-dateutil           2.8.0                      py_0    conda-forge
pytz                      2018.9                   py36_0    anaconda
pywavelets                1.0.2            py36h917ab60_0    conda-forge
pyyaml                    5.1              py36h1de35cc_0    conda-forge
pyzmq                     18.0.1           py36h4cc6ddd_0    conda-forge
readline                  7.0               hcfe32e1_1001    conda-forge
requests                  2.21.0                py36_1000    conda-forge
requests-oauthlib         1.2.0                      py_0    conda-forge
scikit-image              0.14.2           py36h0a44026_1    conda-forge
scipy                     1.2.1            py36h1410ff5_0
send2trash                1.5.0                      py_0    conda-forge
setuptools                40.8.0                   py36_0    conda-forge
six                       1.12.0                py36_1000    conda-forge
sortedcontainers          2.1.0                      py_0    conda-forge
sqlite                    3.26.0            h1765d9f_1001    conda-forge
tblib                     1.3.2                      py_1    conda-forge
terminado                 0.8.1                 py36_1001    conda-forge
testpath                  0.4.2                   py_1001    conda-forge
tk                        8.6.9             ha441bb4_1000    conda-forge
toolz                     0.9.0                      py_1    conda-forge
tornado                   6.0.1            py36h1de35cc_0    conda-forge
traitlets                 4.3.2                 py36_1000    conda-forge
tweepy                    3.6.0                    py36_0    conda-forge
urllib3                   1.24.1                py36_1000    conda-forge
wcwidth                   0.1.7                      py_1    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.33.1                   py36_0    conda-forge
xarray                    0.12.0                     py_0    conda-forge
xz                        5.2.4             h1de35cc_1001    conda-forge
yaml                      0.1.7             h1de35cc_1001    conda-forge
zeromq                    4.2.5             h0a44026_1006    conda-forge
zict                      0.1.4                      py_0    conda-forge
zlib                      1.2.11            h1de35cc_1004    conda-forge
```

jbednar commented 5 years ago

More complex is an understatement! The force-directed algorithm is very compute-intensive. It's probably possible to speed it up, but for now I'd try it on a subset of your problem to see how the runtime scales with problem size.
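
One way to run the scaling check suggested above is to time a layout on random subsets of increasing size. The sketch below is only an illustration: `toy_force_layout`, `subsample`, and `time_scaling` are hypothetical helpers, and the toy layout is a naive all-pairs force-directed stand-in, not Datashader's `forceatlas2_layout`. It uses only the standard library and plain lists so it runs anywhere.

```python
import math
import random
import time


def toy_force_layout(n_nodes, edges, iterations=5, seed=0):
    """Naive force-directed stand-in (NOT Datashader's implementation)."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n_nodes)]
    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n_nodes)]
        # Repulsion between every pair of nodes: the quadratic part.
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d2 = dx * dx + dy * dy + 1e-9
                disp[i][0] += dx / d2
                disp[i][1] += dy / d2
                disp[j][0] -= dx / d2
                disp[j][1] -= dy / d2
        # Attraction along edges: linear in the number of edges.
        for s, t in edges:
            dx = pos[s][0] - pos[t][0]
            dy = pos[s][1] - pos[t][1]
            disp[s][0] -= dx
            disp[s][1] -= dy
            disp[t][0] += dx
            disp[t][1] += dy
        for i in range(n_nodes):
            pos[i][0] += 0.001 * disp[i][0]
            pos[i][1] += 0.001 * disp[i][1]
    return pos


def subsample(n_total, edges, n_keep, seed=0):
    """Keep n_keep random nodes and the edges between them, relabelled 0..n_keep-1."""
    rng = random.Random(seed)
    kept = rng.sample(range(n_total), n_keep)
    new_id = {old: new for new, old in enumerate(kept)}
    sub_edges = [(new_id[s], new_id[t]) for s, t in edges
                 if s in new_id and t in new_id]
    return n_keep, sub_edges


def time_scaling(layout, n_total, edges, sizes):
    """Return a list of (subset size, seconds) for the given layout callable."""
    timings = []
    for n in sizes:
        n_sub, sub_edges = subsample(n_total, edges, n)
        start = time.perf_counter()
        layout(n_sub, sub_edges)
        timings.append((n, time.perf_counter() - start))
    return timings


# Demo on a small random graph (made-up data).
rng = random.Random(42)
N = 400
demo_edges = [(rng.randrange(N), rng.randrange(N)) for _ in range(N)]
results = time_scaling(toy_force_layout, N, demo_edges, sizes=[100, 200, 400])
for n, seconds in results:
    print(f"{n:>6} nodes: {seconds:.3f} s")

# Rough empirical exponent p in "time ~ n**p", from the first and last sizes.
(n0, t0), (n1, t1) = results[0], results[-1]
if t0 > 0 and t1 > 0:
    print(f"estimated exponent: {math.log(t1 / t0) / math.log(n1 / n0):.2f}")
```

To apply the same idea to the real layout, the `layout` callable would be swapped for one that builds the nodes and edges DataFrames for the subset and calls `forceatlas2_layout`. Fitting the exponent on a few small sizes then gives a rough extrapolation to the full graph.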

Minyall commented 5 years ago

Ok, thanks. As long as this is expected, that is fine. I've moved my script to our university computing cluster to speed things up. Does the force-directed function benefit from multiple cores?

Many thanks for your quick response.

jbednar commented 5 years ago

The force-directed code can probably be updated relatively easily to use Numba's parallel for loops to support multiple cores; see https://github.com/pyviz/datashader/blob/master/datashader/layout.py . I don't think that support was available from Numba when that code was first written. And of course Dask can be used to distribute the code across cluster nodes, but I haven't looked into the details of the algorithm to know how difficult that would be. PRs welcome! :-)
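
For readers unfamiliar with Numba's parallel loops, here is a minimal sketch of the pattern, assuming NumPy is installed. The `repulsion` function is a hypothetical all-pairs repulsion kernel written for illustration; it is not the code in Datashader's `layout.py`. `njit(parallel=True)` and `prange` are Numba features; if Numba is not installed, the fallback below runs the same loop serially in plain Python.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch still runs without Numba
    prange = range

    def njit(*args, **kwargs):
        # Support both the bare @njit form and the @njit(...) form.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        def decorator(func):
            return func

        return decorator


@njit(parallel=True)
def repulsion(pos, k):
    """Sum of pairwise repulsive displacements for each node (all-pairs)."""
    n = pos.shape[0]
    disp = np.zeros_like(pos)
    # Each outer iteration writes only to row i of disp, so the iterations
    # are independent and can be split across cores.
    for i in prange(n):
        fx = 0.0
        fy = 0.0
        for j in range(n):
            if i != j:
                dx = pos[i, 0] - pos[j, 0]
                dy = pos[i, 1] - pos[j, 1]
                d2 = dx * dx + dy * dy + 1e-12
                fx += k * dx / d2
                fy += k * dy / d2
        disp[i, 0] = fx
        disp[i, 1] = fy
    return disp


# Two nodes one unit apart push each other in opposite directions.
pos = np.array([[0.0, 0.0], [1.0, 0.0]])
disp = repulsion(pos, 1.0)
print(disp)
```

The key design point is that the loop body has no cross-iteration dependency: accumulating into local scalars and writing one row per iteration avoids races without any locking.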

Minyall commented 5 years ago

If anyone is interested, I achieved quite a good speedup by using HoloViews along with an independent implementation of ForceAtlas2. It has a networkx-style interface, so it can be slotted straight into the HoloViews Graph.from_networkx method where you would normally put a networkx layout function.

hv.Graph.from_networkx(G, forceatlas2.forceatlas2_networkx_layout).opts(tools=['hover'])

You can get the implementation here
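
For context, a "networkx-style" layout function takes a graph and returns a dictionary mapping each node to its position. The sketch below shows that contract with a hypothetical `circular_layout_like` function using only the standard library; it is not the ForceAtlas2 implementation mentioned above, and a plain list of node labels stands in for a graph here (iterating over a networkx graph likewise yields its nodes).

```python
import math


def circular_layout_like(G, radius=1.0):
    """Hypothetical networkx-style layout: graph in, {node: (x, y)} out."""
    nodes = list(G)  # iterating a graph (or a plain list) yields the nodes
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


positions = circular_layout_like(["a", "b", "c", "d"])
for node, (x, y) in positions.items():
    print(f"{node}: ({x:+.2f}, {y:+.2f})")
```

Any function with this shape should, in principle, be usable wherever a networkx layout function is expected, which is what makes the drop-in replacement described above possible.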