mapbox / make-surface

Vector surfaces creation routines
MIT License

Efficient printing to stdout for fillfacets #67

Closed. dnomadb closed this 9 years ago

dnomadb commented 9 years ago

To stream output through a pipe to a tool that asynchronously updates a database, we need to print this format to stdout:

{"n0n3n1s3s2s2s1s3n0n1n2n": {"value": 1.0142107312820954}}
{"n0n3n1s3s2s2s1s3n0n1s2s": {"value": 1.330862652829468}}
{"n0n3n1s3s2s2s1s3n0n1s3n": {"value": 1.6207056185537347}}
{"n0n3n1s3s2s2s1s3n0n1s3s": {"value": 1.6780140406118687}}
{"n0n3n1s3s2s2s1s3n1n0s2n": {"value": 1.8823738029760659}}
{"n0n3n1s3s2s2s1s3n1n0s2s": {"value": 2.1052621344872242}}

That is, newline-separated JSON objects, one per line. Right now, I:

  1. Perform json.dumps on each object within a list: https://github.com/mapbox/make-surface/blob/point-sampler/makesurface/scripts/fill_facets.py#L59-L65
  2. Join this list with a newline character ('\n'.join(theList)) and print the result with click.echo: https://github.com/mapbox/make-surface/blob/point-sampler/makesurface/scripts/fill_facets.py#L129

This seems to perform fine on its own, but when piped to a streaming db update script, it is very slow.

Bottom line

We need to print this line-delimited set of JSON objects in a way that is suitable for streaming.

cc: @ian29 @sgillies @rclark
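For context, a minimal sketch of what the receiving end of such a pipe might look like, assuming the downstream script reads one JSON object per line. The function name `read_records` and the shortened keys are hypothetical and are not taken from the actual db update script; `io.StringIO` stands in for `sys.stdin` so the snippet runs on its own.

```python
import io
import json


def read_records(stream):
    """Yield one parsed object per non-empty line of a line-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# io.StringIO stands in for sys.stdin here; keys are shortened for illustration.
fake_stdin = io.StringIO(
    '{"n0n3n1": {"value": 1.0}}\n'
    '{"n0n3s1": {"value": 1.5}}\n'
)
records = list(read_records(fake_stdin))
```

Because each line is a complete JSON document, a consumer like this can act on a record as soon as its line arrives, without waiting for the producer to finish.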

sgillies commented 9 years ago

@dnomadb say you have an iterator over Python dicts like {"n0n3n1s3s2s2s1s3n0n1n2n": {"value": 1.0142107312820954}}... just do

for item in items:
    click.echo(json.dumps(item))

click.echo() adds a LF just like Python's print does. Don't echo the whole thing, just echo record-by-record or feature-by-feature.
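A self-contained sketch of the record-by-record approach, using only the standard library so it runs without click installed. Here `out.write(...)` plays the role of `click.echo(...)`; the helper name `emit_records`, the explicit `flush()`, and the shortened sample keys are assumptions made for illustration, not code from make-surface.

```python
import io
import json
import sys


def emit_records(items, out=sys.stdout):
    """Write each record as one JSON line, as soon as it is available."""
    count = 0
    for item in items:
        out.write(json.dumps(item) + "\n")
        out.flush()  # hand the line to the pipe now rather than when the buffer fills
        count += 1
    return count


# Sample records shaped like the ones in this issue (keys shortened for illustration).
sample = [{"n0n3n1": {"value": 1.0}}, {"n0n3s1": {"value": 1.5}}]
buffer = io.StringIO()
written = emit_records(sample, out=buffer)
lines = buffer.getvalue().splitlines()
```

Since `items` can be any iterable, the same loop works with a generator, in which case no full list of serialized strings is ever held in memory.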

dnomadb commented 9 years ago

Oh my that is so simple. Thanks @sgillies

dnomadb commented 9 years ago

One more thought @sgillies - let's say I need to do some work on each item in this same set, e.g.:

def doThis(thing):
    return doSomeWork(thing)

items = list(doThis(item) for item in items)

for item in items:
    click.echo(json.dumps(item))

Would it be faster overall to echo each item as soon as its task completes, rather than echoing per item only after all tasks are complete? E.g.:

for item in items:
    click.echo(json.dumps(doThis(item)))

sgillies commented 9 years ago

@dnomadb yes, that's faster. Cut out the middle (list) man!
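A small sketch of the difference between the two variants, assuming a hypothetical `do_this` that stands in for `doThis` (it just doubles each value). The logs record the order in which work and output happen. The total amount of work is the same in both cases; what changes is that the lazy version emits its first line before the remaining work is done and never builds the intermediate list.

```python
import json


def do_this(thing):
    """Hypothetical stand-in for doThis(); doubles the value."""
    return {key: {"value": rec["value"] * 2} for key, rec in thing.items()}


def processed(items, log):
    """Lazily yield processed items, recording when each unit of work runs."""
    for item in items:
        log.append("work")
        yield do_this(item)


items = [{"a": {"value": 1.0}}, {"b": {"value": 2.0}}]

# Eager: list() forces all the work to finish before the first line is emitted.
eager_log = []
for result in list(processed(items, eager_log)):
    eager_log.append("emit")

# Lazy: each line is emitted right after its own work completes.
lazy_log = []
lazy_lines = []
for result in processed(items, lazy_log):
    lazy_lines.append(json.dumps(result))
    lazy_log.append("emit")
```

In the eager run the log reads work, work, emit, emit; in the lazy run it alternates work, emit, work, emit, which is what lets a downstream consumer start while the producer is still working.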

dnomadb commented 9 years ago

This is all integrated.