Investigate minority ratings

alecramsay commented 6 months ago

Understand why the "striping" happens. Make sure there's not a bug.

alecramsay commented 6 months ago

tl;dr -- by design.

Everything is working as designed. What you're seeing is the conjunction of how the metric was designed (on purpose) and the unique political & demographic geography of NC.

This is the NC background. The first screenshot is the district Statistics table for the root map. You can see many districts have a noteworthy percentage of "minority" VAP people -- where "minority" means all minorities combined -- but no individual minority group (like Blacks) has a significant percentage in any district.

The second screenshot shows that same information on the map. You see big areas with fairly significant overall minority VAP.

The way minority rating in DRA works is as follows:

The number of proportional seats for each individual minority demographic is computed as the statewide VAP % times the total number of seats. Those individual numbers are summed to get a total proportional number of "opportunity districts" where individual minority groups have the opportunity to elect a representative of their choice.
Then a somewhat analogous calculation is made for all minorities combined: the statewide VAP % is multiplied by the number of seats to get a proportional number of "coalition districts." These are called "coalition districts," because two or more minority groups have to vote in concert to elect a representative of their choice.
Then the likely number of opportunity districts is estimated, based on the district boundaries. Similarly the likely number of coalition districts is estimated, based on the district boundaries.
These estimates are converted to fractions [0-1] by dividing by the respective proportional numbers.
Then both raw measures are normalized to [0-100], so you have two [0-100] components.
The tricky part is how they are combined into a single [0-100] rating. Coalition districts are much harder to pull off, because by definition multiple minority groups have to vote together. So, that rating has a 1/2 weighting. The two weighted values are added together and capped at 100.
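
The steps above can be sketched roughly as follows. This is a minimal illustration of the description in this comment, not the DRA code itself: the function names, the linear clipped normalization, the implicit weight of 1 on the opportunity component, and all input numbers are assumptions made for the example.

```python
def proportional_seats(vap_share: float, n_seats: int) -> float:
    """Proportional number of seats: statewide VAP share times total seats."""
    return vap_share * n_seats


def component_score(estimated: float, proportional: float) -> float:
    """Scale estimated / proportional districts to [0, 100].

    ASSUMPTION: a simple linear scale clipped to [0, 1]; the real
    normalization may differ.
    """
    if proportional <= 0.0:
        return 0.0
    return 100.0 * min(max(estimated / proportional, 0.0), 1.0)


def combine(opportunity_score: float, coalition_score: float) -> float:
    """Add the opportunity score and the half-weighted coalition score, capped at 100."""
    coalition_weight = 0.5  # from the description above
    return min(opportunity_score + coalition_weight * coalition_score, 100.0)


# Hypothetical state: every number below is made up for illustration.
n_seats = 14
group_shares = {"group_a": 0.20, "group_b": 0.08}  # individual minority VAP shares
minority_share = 0.33                              # all minorities combined

prop_opportunity = sum(proportional_seats(s, n_seats) for s in group_shares.values())
prop_coalition = proportional_seats(minority_share, n_seats)

# Hypothetical plan: no likely opportunity districts, plenty of coalition districts.
est_opportunity = 0.0
est_coalition = 5.0

rating = combine(
    component_score(est_opportunity, prop_opportunity),
    component_score(est_coalition, prop_coalition),
)
print(rating)
```

Under these assumptions, a plan with no opportunity districts but a maxed-out coalition component lands at exactly half of the scale, which is the pattern discussed next.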

Sooo, when lots of maps in a state get a rating of 50 out of 100 on the minority dimension, it means:

Individual minority demographics are spread relatively uniformly in a state, so most maps don't have any opportunity districts.
But overall minority VAP % is relatively high in several areas which generally correspond to compact districts, so many maps get the max coalition district contribution (50) ... but that's all they get.

This is the case in NC which is why you're seeing so many 50's in the minority scatter plots.

Like I said: by design.

alecramsay commented 5 months ago

Re-opened this issue, so we can flesh out what's happening better.

Here's the key function:

from typing import Optional


def est_minority_opportunity(mf: float, demo: Optional[str] = None) -> float:
    """Estimate the opportunity for minority representation.

    NOTE - Shift minority proportions up, so 37% minority scores like 52% share,
      but use the uncompressed seat probability distribution. This makes a 37%
      district have a ~70% chance of winning, and a 50% district have a >99% chance.
      Below 37% has no chance.
    NOTE - Sam Wang suggests 90% probability for a 37% district. That seems a little
      too abrupt and all or nothing, so I backed off to the ~70%.
    """

    assert mf >= 0.0

    range: list[float] = [0.37, 0.50]

    shift: float = 0.15  # For Black VAP % (and Minority)
    dilution: float = 0.50  # For other demos, dilute the Black shift by half
    if demo and (demo not in ["black", "minority"]):
        shift *= dilution

    wip_num: float = mf + shift
    oppty: float = 0.0 if (mf < range[0]) else min(est_seat_probability(wip_num), 1.0)

    return oppty

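Note the hard break at 37%: below that VAP % the demographic has 0% opportunity; starting there it has a ~70% opportunity (for Black and combined-minority VAP). From 37% up, a probability function is applied--the same one as for partisan lean, just shifted by 15 percentage points (7.5 points for other demographics, since their shift is diluted by half).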
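
To make the hard break concrete, here is a small self-contained sketch. The body of `est_seat_probability` is not shown in this thread, so the version below is an assumption: a normal CDF centered on 50% with a standard deviation of 4 points, chosen because it reproduces the ~70% and >99% figures in the docstring. The opportunity function is a condensed copy of the one quoted above.

```python
import math
from typing import Optional


def est_seat_probability(share: float) -> float:
    """ASSUMED form: normal CDF centered on 0.50 with sigma = 0.04."""
    return 0.5 * (1.0 + math.erf((share - 0.50) / (0.04 * math.sqrt(2.0))))


def est_minority_opportunity(mf: float, demo: Optional[str] = None) -> float:
    """Condensed copy of the function quoted above."""
    threshold = 0.37  # hard break: no opportunity below this VAP share
    shift = 0.15      # for Black VAP % (and Minority)
    if demo and demo not in ["black", "minority"]:
        shift *= 0.50  # other demos get half the shift
    if mf < threshold:
        return 0.0
    return min(est_seat_probability(mf + shift), 1.0)


# Walk across the threshold to show the jump from zero.
for vap_share in (0.30, 0.36, 0.37, 0.40, 0.50):
    oppty = est_minority_opportunity(vap_share, "black")
    print(f"{vap_share:.2f} -> {oppty:.3f}")
```

With this assumed probability function, the estimate is zero up to just under 37% and then jumps to roughly 0.7, so small changes in a district's VAP share near the threshold can move its contribution a lot.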

alecramsay commented 5 months ago

This is out of order, but this is what I wrote in email the day before re-opening this issue:

I've got some downtime and a reasonably empty backlog, so I'll address the two comments you made in Slack on Sunday:

It might be interesting to add two more highlighted points to each of the scatterplots: the DRA winners on each of the axes. Also, sorry to beat this dead horse, but I really do believe that there is something very wrong with the Minority function that it produces those big gaps.

Re: the first, I've added the DRA Notable Maps to our scatter plots.

We're (basically) building a V2 of the trade-offs site we built first. But they've been completely disjoint until now. Your comment made me realize that there were a couple sizable chunks of previous work that I could leverage on this. One was the block-assignment files (BAF) for the DRA Notable Maps that I pulled/snapshotted whenever I did for V1. The other was their associated ratings which I also pulled from DRA. There's a CSV for each state that has a list of the notable maps with each being a dict of the ratings. I'm now pulling that in and plotting those points (including the official map) on our scatter plots.

At some point, we might want to add the ability to also plot points for other maps, like you imagined with your before & after comment. I've got all the machinery now to take an arbitrary number of maps with ratings and plot them on the scatter plots. I've added a tracking issue in GitHub for this.

Re: the second, you could mean a) that you think there's a bug in the (cloned) implementation in rdatools or b) that there's a problem with its design or c) some more amorphous "I don't like what it's doing." comment or something.

I'd previously written up my thoughts in this GitHub issue, though I don't know for sure whether you read it -- https://github.com/rdatools/rdabase/issues/19. I have some newfound understanding, but I believe all of that is still accurate and on point.

Since you brought this up again, I did some more digging and can say and conjecture more: The first thing I've done is replicate the > 50 minority rating for an ensemble plan (scored outside DRA) with the rating produced inside DRA. I looked at the ratings for the 10K ReCom plans, picked one with a 64 minority rating, generated the precinct-assignment file, imported it into DRA, and cross-checked the minority rating. So, there's no bug in the cloned implementation (i.e., a) is off the table).

Then I looked at the minority ratings for all 2020-cycle congressional maps published for NC. The search query in DRA is "state:nc and cycle:2020 and plantype:congress and minorityrights:50". (Terry is a god.) This yields 632 maps, 139 (22%) of which have minority ratings == 50. This compares to 67% of the ReCom ensemble plans (10,000 - 3,288). IOW, the ReCom ensemble produces a much higher percentage of plans that only max out the coalition districts component of the minority rating but don't get on the board wrto opportunity-to-elect / single-minority-demographic districts. The takeaway here is that human mapdrawers explore the space of opportunity-to-elect districts way more than pure randomness. That hypothesis / realization led me to posit the following conjecture: automatically generated ensembles will have the narrowest range of ratings => pushed frontiers will expand those ranges => but human-drawn plans/maps will, in general, exceed even those pushed frontiers.
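
The two percentages can be checked with a few lines of arithmetic. The counts come from the paragraph above; reading "10,000 - 3,288" as the number of ensemble plans rated exactly 50 is an assumption, and the `share` helper is just for illustration.

```python
def share(count: int, total: int) -> float:
    """Fraction of plans in a set that have a given property."""
    return count / total


# Published 2020-cycle NC congressional maps with a minority rating of exactly 50.
human_drawn = share(139, 632)

# ReCom ensemble plans, assuming 10,000 - 3,288 of them are rated exactly 50.
ensemble = share(10_000 - 3_288, 10_000)

print(f"human-drawn: {human_drawn:.0%}, ensemble: {ensemble:.0%}")
```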

The DRA Notable Maps confirm this hypothesis wrto the ensemble scores -- the Notable Maps are typically way beyond the frontiers.

I haven't yet confirmed the pushed-frontier part of the hypothesis, because I can't push frontiers in bulk yet (but soon). I believe what we'll see is that the pushed frontiers expand the range of the ratings for the ensemble plans quite a bit ... but not nearly as much as the Notable Maps intentionally drawn by people.

If true, I think this will be a testament to just how big the solution space is and how effectively/narrowly knowledgeable map drawers search in it.

If true, I think there may be some interesting implications for the whole "use ensembles to evaluate proposed or actual plans" paradigm.

The bottom line is that I remain confident that there's no bug in minority ratings and that in the face of expert map makers the metric produces reasonable results.

Lots to talk about, probably, once I can push frontiers en masse and run the pipeline end to end (and run another state through it that has different minority characteristics than NC).

alecramsay commented 5 months ago

FWIW, the DRA minority rating does not limit the contribution of each district to one. IOW, it's theoretically possible that the opportunity-to-elect district and coalition district fractional seat probabilities for a district sum to more than one.

I don't think that is very likely, but I also don't have any data to show that.
