pysal / mapclassify

Classification schemes for choropleth mapping.
https://pysal.org/mapclassify
BSD 3-Clause "New" or "Revised" License

Consider handling NANs #212

Open sjsrey opened 1 month ago

sjsrey commented 1 month ago

Yeah, agree. I think the NaN-handling logic is only about those three lines. Without touching any of the classification code, we could maybe sneak it into the first step of the binning function so NaNs are ignored from the outset?

Originally posted by @knaaptime in https://github.com/pysal/mapclassify/issues/211#issuecomment-2112833967

sjsrey commented 1 month ago

As the current philosophy in mapclassify is to assume away NaNs, geopandas is doing the heavy lifting of dealing with NaNs for choropleths.

I've been exploring some approaches to handling NaNs in mapclassify. It isn't as simple as I initially thought, but it is certainly possible. Doing so fully would require discussions with @martinfleis to keep in sync with geopandas.

So this issue is a channel to flesh out the thinking on whether we should do this in mapclassify, or not.

knaaptime commented 1 month ago

I started looking at swapping in numpy's NaN-aware functions (e.g. nanmean instead of mean) to see about making the classifiers agnostic to NaNs, but decided that would probably be more trouble than it's worth. It's probably best to let the classifiers operate, conceptually, on 'pure arrays', use pandas indices to keep track of where the real observations live, then reinsert the NaNs on the other side.
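For context, a small sketch of what the NaN-aware swap would involve, using only numpy. The array `y` is made-up illustration data, not taken from mapclassify. Plain reductions propagate NaN, while their `nan*` counterparts skip it, so every reduction inside every classifier would need to be swapped:

```python
import numpy as np

# Made-up data with one missing observation
y = np.array([1.0, 2.0, np.nan, 4.0])

plain_mean = np.mean(y)             # NaN propagates into the result
nan_mean = np.nanmean(y)            # NaN skipped: (1 + 2 + 4) / 3

plain_median = np.percentile(y, 50)     # also NaN
nan_median = np.nanpercentile(y, 50)    # median of [1, 2, 4]
```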

The idea would be that if a classifier is given an array with NaNs, the resulting y and yb attributes would also include NaNs in the appropriate places, but the classifier would ignore them when assigning bins.

If we went that route, I think it would (a) not induce any breaking behavior here in mc and (b) probably drop in over at geopandas?
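A minimal sketch of the mask, classify, reinsert idea described above. It is not mapclassify's implementation: `classify_ignoring_nans` is a hypothetical helper, and a simple numpy quantile binning stands in for a real classifier so the example is self-contained. The pandas index records where the real observations live, and the bin labels are written back to those positions only:

```python
import numpy as np
import pandas as pd


def classify_ignoring_nans(values, k=4):
    """Hypothetical helper: bin the observed values, keep NaN elsewhere."""
    s = pd.Series(values, dtype="float64")
    observed = s.dropna()  # the 'pure array' the classifier sees

    # Stand-in classifier (assumption): upper bin edges at k quantiles
    edges = np.quantile(observed.to_numpy(), np.linspace(0, 1, k + 1)[1:])
    # side="left" places a value equal to an edge in that edge's bin
    labels = np.searchsorted(edges, observed.to_numpy(), side="left")

    # Start from all-NaN, then reinsert labels at the observed positions
    yb = pd.Series(np.nan, index=s.index)
    yb.loc[observed.index] = labels
    return yb, edges


yb, edges = classify_ignoring_nans([1.0, np.nan, 2.0, 3.0, np.nan, 4.0], k=2)
```

Here the bin labels come back as floats because NaN cannot live in a plain integer array; a nullable integer dtype such as pandas' `Int64` would be one alternative, at the cost of changing the type of the output.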

martinfleis commented 1 month ago

I'll have to take a dive into our plotting code to get a better understanding of how it could help geopandas. It's been a while since I touched that module.