modflowpy / flopy

A Python package to create, run, and post-process MODFLOW-based models.
https://flopy.readthedocs.io

feature: universal get_dataframe() method #1969

Open aleaf opened 11 months ago

aleaf commented 11 months ago

Is your feature request related to a problem? Please describe.

Currently, users can export shapefile representations of the model grid and of package or variable objects. But the export paradigm is clunky, sometimes slow, and gives users little control over the output. With geopandas and other Python packages, most geospatial operations that previously required a GUI or the command line can now be done in memory. To take advantage of this, FloPy users currently have to export a shapefile and read it back in, or write their own code to build a GeoDataFrame from the model grid information.

Describe the solution you'd like

The proposed change would add a universal get_dataframe() method that returns the modelgrid, package, or variable contents as a tabular pandas.DataFrame. If geopandas is installed, the result would instead be a GeoDataFrame with a 'geometry' column containing a shapely polygon for the model grid cell represented by each row, and a .crs passed from the model grid. This would give users an easy gateway to all of the geospatial functionality of geopandas.
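To make the idea concrete, here is a minimal sketch of the behavior described above for a single-layer structured grid. It is not FloPy code: the standalone function, its `xedges`/`yedges`/`arrays` arguments, and the `i`/`j` column names are all assumptions chosen for illustration. It returns a plain DataFrame, and upgrades to a GeoDataFrame only when geopandas and shapely can be imported.

```python
import numpy as np
import pandas as pd

try:  # geopandas and shapely are treated as optional dependencies
    import geopandas as gpd
    from shapely.geometry import Polygon
    HAS_GEOPANDAS = True
except ImportError:
    HAS_GEOPANDAS = False


def get_dataframe(xedges, yedges, arrays=None, crs=None):
    """Hypothetical helper: one row per cell of a structured grid.

    xedges, yedges : 1D cell-edge coordinates (yedges ordered top to bottom)
    arrays : dict mapping a column name to a 2D array of shape (nrow, ncol)
    crs : passed through to the GeoDataFrame when geopandas is available
    """
    nrow, ncol = len(yedges) - 1, len(xedges) - 1
    i, j = np.indices((nrow, ncol))
    df = pd.DataFrame({"i": i.ravel(), "j": j.ravel()})
    for name, arr in (arrays or {}).items():
        df[name] = np.asarray(arr).reshape(nrow * ncol)

    if not HAS_GEOPANDAS:
        return df  # tabular output only

    # Build one rectangle per row from the cell's four corner coordinates.
    polygons = [
        Polygon([
            (xedges[c], yedges[r]),
            (xedges[c + 1], yedges[r]),
            (xedges[c + 1], yedges[r + 1]),
            (xedges[c], yedges[r + 1]),
        ])
        for r, c in zip(df["i"], df["j"])
    ]
    return gpd.GeoDataFrame(df, geometry=polygons, crs=crs)


# Usage: a 2 x 2 grid of 100 m cells with one made-up property array.
xedges = np.array([0.0, 100.0, 200.0])
yedges = np.array([200.0, 100.0, 0.0])
k = np.array([[1.0, 2.0], [3.0, 4.0]])
df = get_dataframe(xedges, yedges, {"k": k})
```

Either way the caller gets the same `i`, `j`, and data columns, so code that only summarizes values would not need to care whether geopandas is present.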

A few more details:

Describe alternatives you've considered

Existing alternatives are described above. We could call the method get_geodataframe() instead, but this would be inconsistent with a regular pandas DataFrame being returned if geopandas weren't installed (I don't think we want the clutter of both get_dataframe() and get_geodataframe() methods). Returning regular DataFrames without geopandas could still be advantageous in providing a tabular representation of the respective object that could then be summarized, etc.

I'm happy to work on this, but given the changes coming in #1955, I'm wondering if it wouldn't make sense to wait until that is merged.

langevin-usgs commented 11 months ago

Hey @aleaf, agreed that it is probably best to wait until #1955 is completed, which should be soon. This would be a really nice addition. Not sure of the best way to toggle between gpd and non-gpd returned dataframes, but it does seem like that should be a user-controlled option. There are still complications in some situations for installing geopandas, so keeping that isolated somehow would be good. Excited to see what you come up with.
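One possible way to make the gpd / non-gpd choice user-controlled while keeping geopandas isolated is sketched below. The `resolve_geo` name and the three-state `geo` argument are assumptions for illustration, not an agreed design. The check uses only the standard library, so geopandas is never imported unless the caller ends up on the geospatial path.

```python
import importlib.util


def resolve_geo(geo=None, module="geopandas"):
    """Hypothetical helper: decide whether to return a GeoDataFrame.

    geo=None  -> use the optional module if it is installed (auto-detect)
    geo=False -> always return a plain DataFrame
    geo=True  -> require the optional module; raise if it is missing
    """
    available = importlib.util.find_spec(module) is not None
    if geo is None:
        return available
    if geo and not available:
        raise ImportError(f"geo=True requires the optional package '{module}'")
    return bool(geo)
```

A method like get_dataframe(geo=None) could call this first and import geopandas only when the result is True, so installations without geopandas are unaffected.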