mapbox / nodata

Because the pixels you can't see are harder than the ones you can.
MIT License

Develop an interface for internal nodata-alpha algorithm #9

Closed: sgillies closed this issue 8 years ago

sgillies commented 8 years ago

@perrygeo @dnomadb the internal algorithms need an interface so they can be plugged into the "windower". Interface to be decided, but keep it in mind when experimenting.

sgillies commented 8 years ago

@perrygeo @dnomadb Here's my sketch of the interface:

import zlib

def compute_valid_data_mask(window, nodata, mask_function, **kwargs):
    """Given a dataset window, a nodata value, a mask function, and the
    function's keyword args, compute a valid data mask (array).

    Returns the dataset window and deflated array bytes.

    Returning the input window enables async processing.

    Bytes are easy to serialize, which is why we return bytes and not
    an ndarray.

    These bytes can be turned back into ndarrays by the caller."""
    # Runs in the context of a global `src` dataset
    mask = mask_function(
        src.read(window=window, boundless=True), nodata, **kwargs)
    # chug chug chug
    return (window, zlib.compress(mask))

Yes? No?
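To make the proposed interface concrete, here is a minimal sketch that runs without a dataset. It assumes band-first uint8 arrays and windows given as ((row_start, row_stop), (col_start, col_stop)) tuples. `simple_mask` and `compute_from_array` are hypothetical names used only for illustration, not functions from this repository, and the 0/255 mask values are an assumption.

```python
import zlib

import numpy


def simple_mask(arr, nodata, **kwargs):
    """Hypothetical mask function: a pixel is valid (255) unless every
    band equals the nodata value, in which case it is masked (0)."""
    is_nodata = numpy.all(arr == nodata, axis=0)
    return numpy.where(is_nodata, 0, 255).astype('uint8')


def compute_from_array(window, arr, nodata, mask_function, **kwargs):
    """Like the sketch above, but takes the window's pixels directly
    instead of reading them from a global `src` dataset."""
    mask = mask_function(arr, nodata, **kwargs)
    return (window, zlib.compress(mask.tobytes()))


# A 3-band, 2x3 block in which only the top-left pixel is nodata in all bands.
block = numpy.ones((3, 2, 3), dtype='uint8')
block[:, 0, 0] = 0
window = ((0, 2), (0, 3))

window, data = compute_from_array(window, block, 0, simple_mask)

# The caller turns the deflated bytes back into a 2D mask.
restored = numpy.frombuffer(zlib.decompress(data), dtype='uint8').reshape(2, 3)
```

Because the window comes back alongside the bytes, a caller collecting results out of order can still tell which part of the dataset each mask belongs to.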

perrygeo commented 8 years ago

We might want to add a func argument to pass in the function that is actually applied to each window. That makes the compute_valid_data_mask function nothing more than a simple wrapper.

sgillies commented 8 years ago

@perrygeo :+1:

See https://github.com/mapbox/nodata/blob/master/nodata/scripts/alpha.py#L25

Interface updated above :arrow_up:

sgillies commented 8 years ago

And here's how you use the return values:

window, data = compute_valid_data_mask(...)
arr = numpy.frombuffer(zlib.decompress(data), dtype='uint8').reshape(
    rasterio.window_shape(window))

perrygeo commented 8 years ago

If we're using this with rio-mucho, the current design specifies that the function applied to each window returns the data to be written directly to disk. See https://github.com/mapbox/rio-mucho/blob/master/riomucho/__init__.py#L104

The implication is that compute_valid_data_mask would need to return the stacked RGBA data, uncompressed, to be usable as a mucho run function. This might be something we could address at the mucho level?
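For illustration, stacking RGBA data of this kind could look roughly like the following. This is a sketch only, assuming a band-first (3, rows, cols) RGB array and a 2D uint8 mask of matching size. `stack_rgba` is a hypothetical helper, not part of nodata or rio-mucho.

```python
import numpy


def stack_rgba(rgb, mask):
    """Append a 2D mask to a (3, rows, cols) RGB array as a fourth
    band, giving a (4, rows, cols) RGBA array."""
    return numpy.concatenate([rgb, mask[numpy.newaxis, :, :]], axis=0)


rgb = numpy.zeros((3, 2, 2), dtype='uint8')
mask = numpy.full((2, 2), 255, dtype='uint8')
rgba = stack_rgba(rgb, mask)
```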

sgillies commented 8 years ago

Closure: we're not using rio-mucho right now so the questions above are no longer relevant.

Here's the interface, finalized:

def masking_function(arr, nodata, **kwargs):
    """Return an all-valid mask of the same shape and type as the
    given array"""

class NodataPoolMan:
    """Nodata processing pool manager

    This class encapsulates the execution of nodata algorithms on
    windows of a dataset.
    """
    def mask(self, windows, **kwargs):
        """Iterate over windows and compute mask arrays.

        The keyword arguments will be passed as keyword arguments to the 
        manager's mask algorithm function.

        Yields window, ndarray pairs.
        """