mdbartos / pysheds

:earth_americas: Simple and fast watershed delineation in python.
GNU General Public License v3.0
722 stars 196 forks source link

Stream order to vector/GeoJSON (like `extract_river_network`)? #259

Closed JamesSample closed 1 week ago

JamesSample commented 1 week ago

I am using branches = grid.extract_river_network(fdir, acc > 100, apply_output_mask=False) to convert my stream network to GeoJSON. This works nicely and returns the stream segments that I'm interested in.

I have also used order = grid.stream_order(fdir, facc > facc_thresh) to get the Strahler stream order. This works too, but returns a raster object. Is there a straightforward way to convert this to vector too? I essentially want the same stream segments as returned by extract_river_network, but labelled in the attribute table with the stream order. Is this possible?

I have tried saving the stream order raster and extracting features from it using rasterio, but this returns the separate pixels as polygons. Ideally, I'd like lines that are compatible with the result from extract_river_network.

It feels like this shouldn't be too difficult, but I haven't managed so far. Any tips appreciated!

Thank you!.

JamesSample commented 1 week ago

Not the neatest solution, but I got this working in the end.

I copied the code from extract_river_network (here) and changed it to return the profiles object instead of converting to vector within the function (I basically commented out lines 1443 to 1450).

In my own code, I then added this "hacky" function:

def get_stream_segment_stats(profiles, fdir, values_grid):
    """Extract summary statistics from 'values_grid' for each stream
    segment in 'profiles'.

    Args
        profiles: List. Stream segment profiles returned by the modified version
            of sGrid.extract_river_network.
        fdir: Obj. Pysheds flow direction grid used to calculate 'profiles'.
        values_grid: Obj. Pysheds raster object with values to summarise. Must
            have the same shape, CRS etc. as 'fdir'.

    Returns
        Geodataframe of stream segments with summary statistics.

    Raises
        ValueError if 'fdir' and 'values_grid' have different shapes or projections.
    """
    if fdir.crs != values_grid.crs:
        raise ValueError("'fdir' and 'values_grid' must have the same CRS.")

    if fdir.shape != values_grid.shape:
        raise ValueError("'fdir' and 'values_grid' must have the same shape.")

    feature_list = []
    for index, profile in enumerate(profiles):
        yi, xi = np.unravel_index(list(profile), fdir.shape)
        x, y = View.affine_transform(values_grid.affine, xi, yi)
        line = geojson.LineString(np.column_stack([x, y]).tolist())

        # Extract raster values for the profile
        prof_values = values_grid[yi, xi].astype(float)
        prof_values[prof_values == values_grid.nodata] = np.nan

        # Summary stats
        prof_min = float(np.nanmin(prof_values))
        prof_mean = float(np.nanmean(prof_values))
        prof_max = float(np.nanmax(prof_values))
        prof_med = float(np.nanmedian(prof_values))
        prof_std = float(np.nanstd(prof_values))
        prof_count = np.count_nonzero(~np.isnan(prof_values))

        # Append the feature with attributes to the feature list
        feature_list.append(
            geojson.Feature(
                geometry=line,
                id=index,
                properties={
                    "count": prof_count,
                    "min": prof_min,
                    "mean": prof_mean,
                    "max": prof_max,
                    "median": prof_med,
                    "std": prof_std,
                },
            )
        )
    geo = geojson.FeatureCollection(feature_list)
    gdf = gpd.GeoDataFrame.from_features(geo["features"], crs=fdir.crs)

    return gdf

This does the same as the vector conversion at the end of the original extract_river_network function, but it also calculates summary statistics for each profile based on values in another grid.

For example:

# Use the modified pysheds function to get the "raw" profiles list
profiles = grid.extract_river_network(
    fdir, facc > facc_thresh, dirmap=(1, 2, 3, 4, 5, 6, 7, 8), apply_output_mask=False
)

# Calculate stream order
order = grid.stream_order(fdir, facc > facc_thresh, dirmap=(1, 2, 3, 4, 5, 6, 7, 8))

# Convert to vector, including stream order stats
order_gdf = get_stream_segment_stats(profiles, fdir, order)

# Also get elevation stats for stream segments
elev_gdf = get_stream_segment_stats(profiles, fdir, dem)

A bit messy, but it's solved my immediately problem and allows me to get summary stats for each profile line in a consistent way.

Better solutions welcome!

Thank you :-)