mapbox / mapboxgl-jupyter

Use Mapbox GL JS to visualize data in a Python Jupyter notebook
MIT License
665 stars, 137 forks

Streaming data architecture #24

Closed (ryanbaumann closed this 6 years ago)

ryanbaumann commented 6 years ago

Investigate creating a streaming data architecture from a Pandas dataframe to a map, using the approach in the smalltalk Python library: https://github.com/murphy214/smalltalk.

murphy214 commented 6 years ago

There are some interesting things you could do with it, e.g. Jupyter widgets can help you build a GUI for a dataset much more easily than in JS. Of course, you're paying for it by loading the dataset into memory every time.

You could also have something that passively renders vector tiles from a dataset (dataframe), so that every time it's updated the changes affect the rendering. A configuration like this would give you more control over different zoom levels and rendering density. Just some thoughts!

Also I'll probably have a read_geojson method for nlgeojson to read directly from a geojson data source to a nlgeojson dataframe easily and I think it should be pretty quick as well! Currently writing that now!

ryanbaumann commented 6 years ago

@murphy214 great notes.

> Also I'll probably have a read_geojson method for nlgeojson to read directly from a geojson data source to a nlgeojson dataframe easily and I think it should be pretty quick as well! Currently writing that now!

Can you share this when it's ready? Would love to incorporate this.

murphy214 commented 6 years ago

@ryanbaumann Here's the new repo; I gave an example in the README: https://github.com/murphy214/nlgeojson

There is a chance the install could fail. I tested on a VM, but I'm actually using compiled Go executables to read the GeoJSON in parallel, and I'm sure there are corner cases that aren't supported. However, it's like 30x faster reading in, and you get nlgeojson dfs in the process.

I'm essentially reading chunks and stitching GeoJSON feature pieces back together (see here), so I'm sure that if you have a field with '{' or '}' in it, it's gonna fail. But like I said, it's kind of just a hack right now.

EDIT: I lied, it's not 30x faster; that was a random guess. Here's the bench I just ran on 120k linestrings with about 50 feature properties.

import geopandas as gpd
import nlgeojson as nl
import time

# Time nlgeojson reading the file
s = time.time()
nl.read_geojson('wv.geojson')
print(time.time() - s)
# 10.4603159428 s

# Time geopandas reading the same file
s = time.time()
gpd.read_file('wv.geojson')
print(time.time() - s)
# 105.153828859 s

Of course, one should remember that these dataframes don't carry their spatial context with them. In other words, geometric operations aren't ready to be done on them, but they can be. In most instances (9/10) we don't need the shapely/GEOS implementation, as it's super heavy, and if we are doing a spatial operation it's more performant on a domain-specific data structure than on arbitrary features in memory (bringing them in and out of memory when needed).

ryanbaumann commented 6 years ago

@murphy214 just tested nlgeojson - it works great! A simple architecture, for now, is just to use nlgeojson to export a dataframe to a local GeoJSON file very quickly, and then load that local GeoJSON file in the mapboxgl viz.

Here's an example with 380k earthquake points as a heatmap from the USGS API:

murphy214 commented 6 years ago

@ryanbaumann Great! Glad it ended up working! It should be quite a bit faster than putting it through a normal JSON serializer/deserializer, and it shouldn't be so RAM intensive either! I really need to build out the read functionality a little more robustly to account for corner cases of "{}" in strings, like a real JSON parser, because I think it could be pretty useful to some people.
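To make the "{}" corner case concrete, here is a minimal sketch of brace matching that tracks whether it is inside a JSON string. It is not nlgeojson's actual reader (which uses compiled Go executables); the function name `split_features` and the sample data are hypothetical, and it assumes the input is the comma-separated contents of a `features` array.

```python
import json


def split_features(text):
    """Yield each top-level {...} object in text, ignoring braces inside JSON strings."""
    depth = 0          # current nesting depth of braces outside strings
    start = None       # index where the current top-level object began
    in_string = False  # True while scanning inside a "..." literal
    escaped = False    # True right after a backslash inside a string
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]
                start = None


# Hypothetical sample: both features contain braces inside string values.
sample = (
    r'{"type": "Feature", "properties": {"name": "a{b}"}, "geometry": null}, '
    r'{"type": "Feature", "properties": {"note": "say \"}\""}, "geometry": null}'
)
parts = list(split_features(sample))
features = [json.loads(p) for p in parts]
```

A splitter that only counts `{` and `}` would cut the second feature in the wrong place; tracking string state and backslash escapes is the minimum needed to avoid that.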

Any ideas for incorporating polygons and/or linestrings? You could always do some UI-side filtering with Jupyter widgets, and it should be entirely possible; however, it would probably be better to handle stuff like that natively in JS.

ryanbaumann commented 6 years ago

Agree it could be a very useful utility library @murphy214. A few things I'm thinking to add:

  1. Ability to pass a list of dataframe column names to include in an exported geojson.
  2. Remove latitude and longitude from geojson properties since they're already included in the geometry.
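The two items above could look roughly like the sketch below. This is not the library's df_to_geojson(); `rows_to_features` and its parameters are hypothetical names, and it works on plain dicts (with pandas, `df.to_dict("records")` would produce rows of this shape).

```python
def rows_to_features(rows, lat="lat", lon="lon", properties=None):
    """Build GeoJSON Point features from row dicts.

    properties: optional list of keys to export; defaults to every key.
    The lat/lon keys are never copied into properties, since the
    geometry already carries them.
    """
    features = []
    for row in rows:
        keys = properties if properties is not None else list(row)
        props = {k: row[k] for k in keys if k not in (lat, lon)}
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [row[lon], row[lat]]},
            "properties": props,
        })
    return features


# Hypothetical rows standing in for dataframe records
rows = [
    {"lat": 38.3, "lon": -81.6, "mag": 5.1, "place": "somewhere"},
    {"lat": 39.0, "lon": -80.2, "mag": 2.4, "place": "elsewhere"},
]
selected = rows_to_features(rows, properties=["mag"])
everything = rows_to_features(rows)
```

Passing `properties=["mag"]` keeps only that column, while the default exports every column except the coordinate ones.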

Re: Lines and Polygons - That's next on the agenda for this library: offering an easy Pythonic interface for analysts to create choropleth maps. As for sliders and filters - those should be easier to add as render-time filters and style changes with HTML objects. Mapboxgl is really good at changing data style each frame. Along with the new Mapboxgl expression syntax, we should be able to handle all the data updates in the viz alone after loading the data once and creating vector tiles.
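As a rough illustration of the render-time filter idea: Mapbox GL expressions are plain JSON arrays, so Python only needs to build and serialize one, and the JS side can apply it (for example with `map.setFilter`). The helper name `min_value_filter` and the property name `mag` are hypothetical; how mapboxgl-jupyter would pass this through is not shown here.

```python
import json


def min_value_filter(prop, minimum):
    """Return a Mapbox GL expression keeping features whose `prop` is >= minimum."""
    return [">=", ["get", prop], minimum]


# Hypothetical: keep only earthquakes of magnitude 5.0 and above
expr = min_value_filter("mag", 5.0)
expr_json = json.dumps(expr)
print(expr_json)
```

Because the filter is evaluated at render time, a slider only has to send a new `minimum`; the data itself is loaded once.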

murphy214 commented 6 years ago

@ryanbaumann This could be done with minimal refactoring. I'll take a look at implementing it tomorrow, as I'm currently off for the holidays; it shouldn't take too long! I'll keep you posted!

ryanbaumann commented 6 years ago

Modified df_to_geojson() utility function to output a line-delimited geojson file. https://github.com/mapbox/mapboxgl-jupyter/commit/4cb4ef3c1f32628525172ffeb555f3b46daaa602
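For readers unfamiliar with the format, one common reading of "line-delimited" is one feature per line. The sketch below assumes that layout; the exact output of df_to_geojson() is defined by the linked commit, and the function names here are hypothetical.

```python
import io
import json


def write_line_delimited(features, fp):
    """Write one GeoJSON feature per line to a text file-like object."""
    for feature in features:
        fp.write(json.dumps(feature) + "\n")


def read_line_delimited(fp):
    """Read the features back, skipping blank lines."""
    return [json.loads(line) for line in fp if line.strip()]


# Hypothetical feature; an in-memory buffer stands in for a local file
features = [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-81.6, 38.3]},
     "properties": {"mag": 5.1}},
]
buffer = io.StringIO()
write_line_delimited(features, buffer)
buffer.seek(0)
round_tripped = read_line_delimited(buffer)
```

Writing features one at a time means the writer never has to hold the whole serialized collection in memory, which fits the goal of exporting large dataframes quickly.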

Serving the exported geojson file locally using the Jupyter Notebook web server works great for all default visualizations. It can be optimized from here, but this is a solid starting point for v1.0.