sgillies closed this issue 8 years ago
@perrygeo @dnomadb Here's my sketch of the interface:
import zlib

def compute_valid_data_mask(window, nodata, mask_function, **kwargs):
    """Given a dataset window, a nodata value, a mask function and its
    keyword args, compute a valid data mask (array).

    Returns the dataset window and the deflated array bytes.
    Returning the input window enables async processing.
    Bytes are easy to serialize, which is why we return bytes and not
    an ndarray. The caller can turn these bytes back into an ndarray.
    """
    # Runs in the context of a global `src` dataset
    mask = mask_function(
        src.read(window=window, boundless=True), nodata, **kwargs)
    # chug chug chug
    return (window, zlib.compress(mask))
Yes? No?
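To illustrate why returning the window matters, here is a minimal sketch of the same interface driven by a thread pool. It is not the project's implementation: `BAND` is a hypothetical in-memory stand-in for the global `src` dataset, `nonzero_mask` and `run` are made-up names, and windows are assumed to be `((row_start, row_stop), (col_start, col_stop))` tuples.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor, as_completed

import numpy as np

# Hypothetical stand-in for the global `src` dataset: one small band.
BAND = np.arange(64, dtype="uint8").reshape(8, 8)


def nonzero_mask(arr, nodata):
    """Hypothetical mask function: 255 where valid, 0 where arr == nodata."""
    return np.where(arr == nodata, 0, 255).astype("uint8")


def compute_valid_data_mask(window, nodata, mask_function, **kwargs):
    """Return the window together with the deflated mask bytes."""
    (r0, r1), (c0, c1) = window
    mask = mask_function(BAND[r0:r1, c0:c1], nodata, **kwargs)
    return window, zlib.compress(mask.tobytes())


def run(windows, nodata=0, max_workers=2):
    """Compute masks concurrently; results may complete in any order."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(compute_valid_data_mask, w, nodata, nonzero_mask)
            for w in windows
        ]
        for future in as_completed(futures):
            # The returned window tells us where this result belongs.
            window, data = future.result()
            results[window] = data
    return results
```

Because each result carries its own window, the caller never has to track which future belongs to which part of the dataset.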
We might want to add a func argument to pass in the function that is actually applied to each window. That makes compute_valid_data_mask nothing more than a simple wrapper.
@perrygeo :+1:
See https://github.com/mapbox/nodata/blob/master/nodata/scripts/alpha.py#L25
Interface updated above :arrow_up:
And here's how you use the return values:
window, data = compute_valid_data_mask(...)
# frombuffer returns a read-only view; call .copy() if the mask must be writable
arr = numpy.frombuffer(zlib.decompress(data), 'uint8').reshape(
    rasterio.window_shape(window))
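The compressed bytes carry neither shape nor dtype, so both sides have to agree on them: the shape comes from the window and the dtype is fixed by convention (uint8 here). A small round-trip sketch under those assumptions follows. `pack`, `unpack`, and this `window_shape` are hypothetical helpers, with `window_shape` standing in for the rasterio function so the example runs with numpy alone.

```python
import zlib

import numpy as np


def window_shape(window):
    """Hypothetical helper: shape of a
    ((row_start, row_stop), (col_start, col_stop)) window."""
    (r0, r1), (c0, c1) = window
    return (r1 - r0, c1 - c0)


def pack(mask):
    """Serialize a mask as deflated uint8 bytes in C order."""
    return zlib.compress(np.ascontiguousarray(mask, dtype="uint8").tobytes())


def unpack(window, data):
    """Rebuild the mask; the window supplies the shape the bytes lack."""
    flat = np.frombuffer(zlib.decompress(data), dtype="uint8")
    # frombuffer gives a read-only view of the bytes, so copy for a
    # writable array.
    return flat.reshape(window_shape(window)).copy()


window = ((10, 13), (20, 25))
mask = np.full(window_shape(window), 255, dtype="uint8")
mask[1, 2] = 0
restored = unpack(window, pack(mask))
```

If masks ever need a dtype other than uint8, it would have to travel with the window and bytes.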
If we're using this with rio-mucho, the current design specifies that the function applied to each window returns the data to be written directly to disk. See https://github.com/mapbox/rio-mucho/blob/master/riomucho/__init__.py#L104
The implication is that compute_valid_data_mask would need to return the stacked RGBA data, uncompressed, to be usable as a mucho run function. This might be something we could address at the mucho level?
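For concreteness, "stacked RGBA data" could look something like the sketch below: the mask becomes a fourth band appended to the RGB bands, and the array is returned uncompressed. This does not reproduce rio-mucho's actual run-function signature. `stack_rgba`, `any_band_valid`, and `rgba_for_window` are hypothetical names, and the 0/255 alpha convention is an assumption.

```python
import numpy as np


def any_band_valid(rgb, nodata):
    """Hypothetical mask function: a pixel is valid (255) if any band
    differs from nodata, otherwise it is masked (0)."""
    return np.where(np.any(rgb != nodata, axis=0), 255, 0).astype("uint8")


def stack_rgba(rgb, mask):
    """Append a (rows, cols) mask to a (3, rows, cols) array as band 4."""
    if rgb.shape[1:] != mask.shape:
        raise ValueError("mask shape must match the RGB band shape")
    return np.concatenate([rgb, mask[np.newaxis, ...]], axis=0)


def rgba_for_window(rgb, nodata, mask_function, **kwargs):
    """Return write-ready RGBA data for one window, uncompressed."""
    return stack_rgba(rgb, mask_function(rgb, nodata, **kwargs))


rgb = np.zeros((3, 2, 2), dtype="uint8")
rgb[:, 0, 0] = 10  # one valid pixel, the rest equal nodata
rgba = rgba_for_window(rgb, 0, any_band_valid)
```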
Closure: we're not using rio-mucho right now so the questions above are no longer relevant.
Here's the interface, finalized:
def masking_function(arr, nodata, **kwargs):
    """Return an all-valid mask of the same shape and type as the
    given array"""


class NodataPoolMan:
    """Nodata processing pool manager

    This class encapsulates the execution of nodata algorithms on
    windows of a dataset.
    """

    def mask(self, windows, **kwargs):
        """Iterate over windows and compute mask arrays.

        The keyword arguments will be passed as keyword arguments to the
        manager's mask algorithm function.

        Yields window, ndarray pairs.
        """
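One way the stubs above might be filled in, as a rough sketch rather than the actual implementation. The constructor arguments (`read_window`, `mask_function`, `nodata`, `max_workers`) are assumptions, since the interface above does not define a constructor. The use of 255 for "valid" and of a thread pool are also assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def masking_function(arr, nodata, **kwargs):
    """Return an all-valid mask of the same shape and type as the
    given array (assumes 255 means valid)."""
    return np.full_like(arr, 255)


class NodataPoolMan:
    """Sketch of a pool manager; the constructor signature is hypothetical."""

    def __init__(self, read_window, mask_function, nodata, max_workers=2):
        self.read_window = read_window      # callable: window -> ndarray
        self.mask_function = mask_function  # callable: (arr, nodata, **kw)
        self.nodata = nodata
        self.max_workers = max_workers

    def _compute(self, window, kwargs):
        arr = self.read_window(window)
        return window, self.mask_function(arr, self.nodata, **kwargs)

    def mask(self, windows, **kwargs):
        """Yield (window, ndarray) pairs in the order of `windows`."""
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            jobs = pool.map(lambda w: self._compute(w, kwargs), windows)
            for window, arr in jobs:
                yield window, arr


# Usage with a hypothetical in-memory band instead of a real dataset.
band = np.zeros((4, 4), dtype="uint8")


def read_band(window):
    (r0, r1), (c0, c1) = window
    return band[r0:r1, c0:c1]


manager = NodataPoolMan(read_band, masking_function, nodata=0)
pairs = list(manager.mask([((0, 2), (0, 4)), ((2, 4), (0, 4))]))
```

Any algorithm matching the `masking_function` signature can be swapped in without changing the manager.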
@perrygeo @dnomadb the internal algorithms need an interface so they can be plugged into the "windower". Interface to be decided, but keep it in mind when experimenting.