pysal / mapclassify

Classification schemes for choropleth mapping.
https://pysal.org/mapclassify
BSD 3-Clause "New" or "Revised" License
139 stars 32 forks source link

WIP classify to rgba #211

Closed knaaptime closed 3 months ago

knaaptime commented 4 months ago

this adds a function to generate an array of colors from an array of values. I've been toying with pydeck lately, which, basically, requires a column of colors on the df. Looking a bit closer, that's how most (all?) of the viz libraries work, so this is a shot at refactoring geopandas's logic for generating colors.

I can add tests, but wanted to open first to see if folks find this valuable and if it should live here

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 91.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.6%. Comparing base (af76ab3) to head (0f64631).

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/pysal/mapclassify/pull/211/graphs/tree.svg?width=650&height=150&src=pr&token=QJnoJwxLuo&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal)](https://app.codecov.io/gh/pysal/mapclassify/pull/211?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal) ```diff @@ Coverage Diff @@ ## main #211 +/- ## ===================================== Coverage 89.6% 89.6% ===================================== Files 8 9 +1 Lines 1101 1125 +24 ===================================== + Hits 986 1008 +22 - Misses 115 117 +2 ``` | [Files](https://app.codecov.io/gh/pysal/mapclassify/pull/211?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal) | Coverage Δ | | |---|---|---| | [mapclassify/\_\_init\_\_.py](https://app.codecov.io/gh/pysal/mapclassify/pull/211?src=pr&el=tree&filepath=mapclassify%2F__init__.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal#diff-bWFwY2xhc3NpZnkvX19pbml0X18ucHk=) | `100.0% <100.0%> (ø)` | | | [mapclassify/util.py](https://app.codecov.io/gh/pysal/mapclassify/pull/211?src=pr&el=tree&filepath=mapclassify%2Futil.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal#diff-bWFwY2xhc3NpZnkvdXRpbC5weQ==) | `91.3% <91.3%> (ø)` | |
martinfleis commented 4 months ago

I am think whether this should not be a class or a method on MapClassifier (probably the latter). Because to use it with a legend, you will normally be interested in other information apart from the array of colours.

knaaptime commented 4 months ago

i was curious about that, but it seemed like classify was doing the heavy lifting in geopandas, even with the legend?

probably no real reason we couldn't do both. The difference is a classifier class doesn't require a camp (edit: though, I guess if its a method it just consumes the cmap)

knaaptime commented 4 months ago

ah. but classify gives back the fitted classifier whereas this gives back the array.

Agree this makes more sense as a method, will move around

knaaptime commented 4 months ago

nevermind. If this gets pushed inside the class, there's nowhere to include the nan logic

I am think whether this should not be a class or a method on MapClassifier (probably the latter). Because to use it with a legend, you will normally be interested in other information apart from the array of colours.

so I think I would just create a classifier and the rgbas if you need both at the moment

martinfleis commented 4 months ago

That is a bit unfortunate to do the classification twice. Especially given some of them can be quite costly. Maybe include NaN tracking in the class? Like your legit_indices here?

knaaptime commented 4 months ago

yeah, agree. I think the nan-handle logic is only like those three lines. Without touching any of the classification code, we could maybe sneak it into the first step of the binning function so nans are ignored from the outset?

sjsrey commented 4 months ago

this adds a function to generate an array of colors from an array of values. I've been toying with pydeck lately, which, basically, requires a column of colors on the df. Looking a bit closer, that's how most (all?) of the viz libraries work, so this is a shot at refactoring geopandas's logic for generating colors.

I can add tests, but wanted to open first to see if folks find this valuable and if it should live here

This is definitely useful functionality.

I'm not sure that mc is the place for this, for a couple of reasons. First, it makes matplotlib a hard dependency. Second, it seems to consume mc, rather than extend mc , for a particular use case.

I can see a few options:

Keep in it mc, but move it into a utility module that makes matplotlib a soft dependency.

Keep it as suggested with the hard dependency

Don't include it in mc but have it be part of a downstream package.

Other?

knaaptime commented 4 months ago

yeah agree. I'm happy to stick it downstream, but when @martinfleis and I discussed briefly on the last pysal call, we thought it might be useful more broadly (e.g. for geopandas to consume, since its coloring logic is currently a bit arcane).

would be easy to make matplotlib a module-level import if we want to go that route

martinfleis commented 3 months ago

I've discussed this with @sjsrey in Basel and came to a conclusion that it won't help geopandas at this point. We are planning on refactoring (or at least having a deep dive) of the plotting code for 1.1 or 1.2 though, so it may become more useful then. The issue is that we will always have mapclassify as optional dependency only and when no binning is applied we need to deal with missing values anyway.

knaaptime commented 3 months ago

cool. nice work on 1.0 btw!

i still think this has value here, e.g. for legendgram and such (would also give a logical place to live alongside #173), but lmk what you think. Happy to put elsewhere if you prefer

martinfleis commented 3 months ago

I am perfectly fine with keeping it here as I can see it can be useful. Just wanted to close the discussion about usage of this in geopandas.

knaaptime commented 3 months ago

I vote to put this into a util module with mc as a soft dep. It's not central to classification, but often useful alongside classification.

So it's really useful to keep around with the same group of tools, but shouldn't induce a new dependency