pysal / libpysal

Core components of Python Spatial Analysis Library
http://pysal.org/libpysal

Support for missing data and pandas dataframes #305

Open sjsrey opened 11 years ago

sjsrey commented 11 years ago

Original author: ada...@ucsc.edu (January 31, 2013 20:49:46)

It would be very useful for pysal to recognize and handle NaN values in NumPy arrays and/or pandas dataframes. Sometimes, it is not desirable to simply drop all observations with missing data, as these observations can be important when calculating spatial lags.
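To make the concern concrete, here is a minimal sketch using plain NumPy rather than any pysal API. The four-unit chain, the `neighbors` dictionary, and the values in `y` are all hypothetical, and the lag is taken as a simple mean of neighbor values (equivalent to row-standardized binary weights).

```python
import numpy as np

# Hypothetical chain of four units: 0 - 1 - 2 - 3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Unit 1 has a missing observation
y = np.array([1.0, np.nan, 3.0, 4.0])

# Plain lag: mean of each unit's neighbor values.
# Any NaN neighbor makes the whole lag NaN.
lag = np.array([y[nbrs].mean() for nbrs in neighbors.values()])
print(lag)
```

Units 0 and 2 end up with NaN lags because they border unit 1. Dropping unit 1 instead would leave unit 0 with no neighbors at all, which is why simply discarding incomplete observations can be undesirable.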

Related, it would also be helpful to use pandas indexing to align the spatial weights matrix or matrices with the variables. Again, this is primarily an issue because of missing data.
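One way this alignment could look, sketched with pandas only: the `id_order` list stands in for the ordering of IDs in a weights object, and the DataFrame and column name are made up for illustration. `DataFrame.reindex` puts the rows in the weights order and inserts NaN for IDs that have no observation.

```python
import numpy as np
import pandas as pd

# Hypothetical ID ordering of the spatial weights
id_order = ["a", "b", "c", "d"]

# Attribute table in a different order, with unit "b" missing
df = pd.DataFrame({"income": [10.0, 30.0, 20.0]}, index=["c", "a", "d"])

# Align rows to the weights order; missing IDs become NaN rows
aligned = df.reindex(id_order)
y = aligned["income"].to_numpy()
print(y)
```

The resulting array has one entry per unit in the weights, so it can be paired with the weights positionally, and the gap for "b" stays visible as NaN instead of silently shifting the other rows.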

Thanks!

Original issue: http://code.google.com/p/pysal/issues/detail?id=239

sjsrey commented 11 years ago

From jsseab...@gmail.com on February 01, 2013 13:47:24 To the pysal devs, we have this support already in statsmodels. We really need to find some time to think about combining the libraries, or at least making it so you can leverage our general "framework" code and we can provide support for the spatial weights in statsmodels to use all of your work. It doesn't make much sense for us to solve all of the same problems twice.

sjsrey commented 11 years ago

From sjsrey on February 01, 2013 15:58:05 Totally agreed. Do you have any time on Friday mornings at 9 MST? That is when we hold dev meetings via Google Hangouts. If so, we could dedicate an upcoming one to starting the discussion.

sjsrey commented 11 years ago

From jsseab...@gmail.com on February 05, 2013 22:12:40 I'm pretty thin until March most likely but would be interested to set something up then. I had a good look through a decent amount of pysal over the summer after we spoke to see where we could combine. I have some thoughts on this but not a lot of time to devote to it at the moment (busy dissertating and chasing a few measly dollars).

sjsrey commented 10 years ago

This might be something to link with the geopandas threads.

darribas commented 10 years ago

Agreed, ideally we'd like to offload these kinds of operations to a pandas-like library.

sjsrey commented 8 years ago

I think this is largely addressed in @ljwolf's GSoC work?

andy-esch commented 3 years ago

For what it's worth, I wrote my own version of spatial lags that takes NaNs into account, with the following logic:

  1. If a target geog has all nan neighbor vals, the lag is nan
  2. If a target geog has some nan neighbors, take the np.nanmean of those neighbors
  3. If a target geog has all non-nan neighbors, take the normal mean (which is same as np.nanmean in this case)

I also have a case for adding a fill value (e.g., replace nan with 0 or whatever), although this is probably better as a custom post-processing step.
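The rules above could be sketched roughly as follows. This is not the code from this comment and not part of `lag_spatial`; the function name `nan_lag`, the `neighbors` dictionary, and the `fill_value` parameter are hypothetical. It assumes integer IDs that match positions in `y` and equal weights for all neighbors.

```python
import numpy as np

def nan_lag(neighbors, y, fill_value=None):
    """NaN-aware lag: mean of the non-NaN neighbor values of each unit.

    neighbors: dict mapping unit position -> list of neighbor positions
    y: 1-D array of values, possibly containing NaN
    fill_value: optional replacement for lags that remain NaN
    """
    y = np.asarray(y, dtype=float)
    lag = np.full(len(y), np.nan)
    for i, nbrs in neighbors.items():
        vals = y[np.asarray(nbrs, dtype=int)]
        valid = vals[~np.isnan(vals)]
        # Rules 2 and 3: average whatever non-NaN neighbors exist.
        # Rule 1: if none exist, the lag stays NaN.
        if valid.size:
            lag[i] = valid.mean()
    if fill_value is not None:
        lag = np.where(np.isnan(lag), fill_value, lag)
    return lag

# Hypothetical chain of four units: 0 - 1 - 2 - 3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
y = [1.0, np.nan, 3.0, 4.0]

result = nan_lag(neighbors, y)
filled = nan_lag(neighbors, y, fill_value=0.0)
print(result)
print(filled)
```

Masking the NaNs before averaging gives the same value as `np.nanmean` but avoids the runtime warning it emits for an all-NaN slice. Supporting non-uniform weights would require renormalizing the weights over the non-NaN neighbors, which this sketch does not attempt.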

[Image: Screen Shot 2021-01-12 at 10 36 47 AM]

If this would be a useful feature for others, I'm happy to start contributing this logic to lag_spatial, so users don't get lags that are all NaN in cases where every row of the sparse weights matrix includes at least one neighbor with a NaN value.

Note: The screenshot above actually shows null cells filled with the NaN-aware lag, rather than a choropleth of the strict spatial lag.

sjsrey commented 3 years ago

This would be a good enhancement.