graphistry / pygraphistry

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
BSD 3-Clause "New" or "Revised" License

[ENH] Much faster compressed uploads using new REST API features #188

Open lmeyerov opened 4 years ago

lmeyerov commented 4 years ago

Especially in distributed settings, a bit of compression can go a long way for faster uploads:

Easy wins

The current REST API supports compression at several layers:
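As a rough illustration of the transfer-level layer only, here is a minimal sketch that gzips a CSV edge list and attaches a `Content-Encoding: gzip` header. The endpoint URL and the `build_gzip_request` helper are hypothetical and are not part of the Graphistry REST API; the request is built but never sent.

```python
import gzip
import urllib.request

# Hypothetical endpoint, for illustration only; not a real Graphistry URL.
UPLOAD_URL = "https://example.invalid/upload"


def build_gzip_request(payload: bytes, url: str = UPLOAD_URL) -> urllib.request.Request:
    """Return a POST request whose body is gzip-compressed (it is not sent here)."""
    body = gzip.compress(payload)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "text/csv",
            "Content-Encoding": "gzip",  # tells the server the body is gzipped
        },
    )


# A repetitive edge list serialized as CSV text, which compresses well.
rows = "".join(f"n{i % 50},n{(i + 1) % 50}\n" for i in range(10_000))
csv_payload = ("src,dst\n" + rows).encode("utf-8")

request = build_gzip_request(csv_payload)
raw_size = len(csv_payload)
sent_size = len(request.data)
print(f"raw={raw_size} bytes, gzipped={sent_size} bytes")
```

Whether a given server accepts gzipped request bodies depends on its configuration, so this would need checking against the actual upload route.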

Trickier wins

Interface

It is unclear what the defaults and user overrides should be:

Default:

Override:

Example:

# Default behavior
graphistry.register(server='nginx')
g.plot() # no compression
g.edges(small_df).plot() # no compression
g.edges(big_arr).plot() # auto-compress
# Explicit overrides
graphistry.register(transfer_encoding='gzip', gzip_opts={...})
g = g.settings(transfer_type='parquet')
g.edges(small_arr).plot(parquet_opts={...})
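One way the three override levels in this example (register, plotter settings, per-plot options) could be reconciled is a simple precedence chain where the most specific level wins. This is only a sketch: `resolve_transfer_options`, `GLOBAL_DEFAULTS`, and the default values in it are assumptions for illustration, not existing PyGraphistry names or decided defaults.

```python
from collections import ChainMap

# Hypothetical built-in defaults; the issue leaves the real ones undecided.
GLOBAL_DEFAULTS = {"transfer_encoding": None, "transfer_type": "json"}


def resolve_transfer_options(register_opts=None, plotter_settings=None, plot_opts=None):
    """Merge option layers; maps listed earlier take precedence over later ones."""
    layers = [
        plot_opts or {},         # most specific: arguments to .plot()
        plotter_settings or {},  # then g.settings(...)
        register_opts or {},     # then graphistry.register(...)
        GLOBAL_DEFAULTS,         # least specific: library defaults
    ]
    return dict(ChainMap(*layers))


resolved = resolve_transfer_options(
    register_opts={"transfer_encoding": "gzip"},
    plotter_settings={"transfer_type": "parquet"},
    plot_opts={"parquet_opts": {"compression": "snappy"}},
)
print(resolved)
```

A chain like this keeps each level independent, at the cost of making it harder to see which level a final value came from.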

Another thought is:

g.plot(compression='auto' | True | False | None)
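To make the four values concrete, here is a minimal sketch of how such a flag might be interpreted. The `should_compress` helper and the 1 MB cutoff for `'auto'` are assumptions for illustration; the issue does not specify a threshold.

```python
# Hypothetical size cutoff for 'auto'; the real value would need benchmarking.
AUTO_THRESHOLD_BYTES = 1_000_000


def should_compress(compression, payload_size, threshold=AUTO_THRESHOLD_BYTES):
    """Interpret a compression flag of 'auto', True, False, or None."""
    if compression == "auto":
        # Compress only when the payload is large enough to be worth it.
        return payload_size >= threshold
    if compression is True:
        return True
    if compression is False or compression is None:
        return False
    raise ValueError(f"Unsupported compression setting: {compression!r}")


print(should_compress("auto", 500))         # small payload
print(should_compress("auto", 5_000_000))   # large payload
print(should_compress(True, 500))           # forced on
print(should_compress(None, 5_000_000))     # forced off
```

Here `None` is treated the same as `False`; another reasonable reading would be for `None` to mean "fall back to the global setting".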

Or somewhere in between.

Prioritization

References

lmeyerov commented 3 years ago

Partially addressed via https://github.com/graphistry/pygraphistry/pull/195 , which avoids reuploads with api=3 + .plot(as_files=True)
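For readers unfamiliar with the general idea of avoiding reuploads, a content-hash cache is one common approach. The sketch below is not taken from that PR and may differ from what it does; `UploadCache` and `fake_upload` are hypothetical names, and the upload function is a stand-in that never touches the network.

```python
import hashlib


class UploadCache:
    """Remember file IDs by content hash so identical payloads upload once."""

    def __init__(self, upload_fn):
        self._upload_fn = upload_fn  # callable: bytes -> file ID string
        self._file_ids = {}          # sha256 hex digest -> file ID

    def upload(self, payload: bytes) -> str:
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self._file_ids:
            # Only call the (expensive) uploader for unseen content.
            self._file_ids[digest] = self._upload_fn(payload)
        return self._file_ids[digest]


calls = []


def fake_upload(payload: bytes) -> str:
    """Stand-in uploader that records each call and returns a made-up ID."""
    calls.append(len(payload))
    return f"file-{len(calls)}"


cache = UploadCache(fake_upload)
first = cache.upload(b"src,dst\na,b\n")
second = cache.upload(b"src,dst\na,b\n")  # same bytes: served from the cache
third = cache.upload(b"src,dst\na,c\n")   # different bytes: uploaded again
print(first, second, third, len(calls))
```

Hashing costs a pass over the data, but that is usually far cheaper than resending it over the network.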