mggg / GerryChain

Use MCMC to analyze districting plans and gerrymanders
https://mggg.github.io/GerryChain/

Save a partition as an assignment csv (& load) #277

Open maxhully opened 5 years ago

maxhully commented 5 years ago

An assignment CSV has a column of node indices and a column with the corresponding district assignments. This CSV format seems to be common in the redistricting world. JSON probably isn't the right tool for the job since it only supports string keys.
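The string-key limitation can be seen with a quick round trip. This is a minimal sketch using only the standard-library `json` module; the three-node assignment is made up for illustration.

```python
import json

# Hypothetical assignment: node index -> district number
assignment = {0: 1, 1: 1, 2: 2}

# JSON object keys are always strings, so the integer node indices
# come back as strings after a round trip.
round_tripped = json.loads(json.dumps(assignment))

print(round_tripped)                 # {'0': 1, '1': 1, '2': 2}
print(round_tripped == assignment)   # False
```

A CSV has the same "everything is text" problem on read, but it matches the two-column layout that redistricting tools already exchange.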

zschutzman commented 5 years ago

to_csv() is pretty straightforward. Something like this should be mostly what we want. It can be put anywhere, but the reference to self.parts would need to change.

In assignment.py:

...
import csv
...

class Assignment:
    ...

    def to_csv(self, filename=None):
        if filename is None:
            print("Please give a filename")
            return

        # newline='' keeps the csv module from writing blank rows on Windows
        with open(filename, 'w', newline='') as _outfile:
            _writer = csv.writer(_outfile)
            _writer.writerow(["unit_idx", "dist_no"])
            # self.parts maps each district to the units assigned to it
            for row in sorted((unit, dist) for dist in self.parts for unit in self.parts[dist]):
                _writer.writerow(row)
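If the writer lives outside the class, the same idea might look like the standalone sketch below. The function name `write_assignment_csv` and the sample `parts` dictionary are hypothetical; the only assumption about the data is that `parts` maps each district to an iterable of its units.

```python
import csv
import os
import tempfile


def write_assignment_csv(parts, filename):
    """Write one (unit, district) row per unit, sorted by unit.

    `parts` is assumed to map each district to an iterable of its units.
    """
    rows = sorted((unit, dist) for dist, units in parts.items() for unit in units)
    with open(filename, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["unit_idx", "dist_no"])
        writer.writerows(rows)


# Hypothetical two-district plan over five units
parts = {1: {0, 1, 4}, 2: {2, 3}}
out_path = os.path.join(tempfile.mkdtemp(), "assignment.csv")
write_assignment_csv(parts, out_path)
```

Sorting assumes the unit names are mutually comparable; a mix of integer and string names would need a sort key.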

from_csv() could look something like this. The following will take a csv and turn it into a dictionary of the format that Assignment.from_dict() wants: each key is a geographical unit and its value is the district it's assigned to. This can be a class method of Assignment or be put somewhere else, like in a utilities.py kind of file.

...

import csv
import collections

...

# checks if a string is really just an integer wearing a fun hat
def tryint(string):
   if not isinstance(string,str): return string
   return int(string) if string.isdigit() else string

def from_csv(filename, header = True, sep = ','):
    try:
        with open(filename, 'r') as _infile:
            _reader = csv.reader(_infile, delimiter = sep)
            # throw out the header
            if header: next(_reader)

            # dictionary comprehension over the rows
            new_assn = {tryint(row[0]): tryint(row[1]) for row in _reader}

        return new_assn

    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print("Could not read {}".format(filename))
        return

This returns an {int: int} dictionary if possible, otherwise an {int: string}, {string: int}, or {string: string} dictionary, to be passed to Assignment.from_dict(). This lets people rename their districts and graph nodes to GEOIDs or something human-readable if they want. The tryint() helper is needed because the csv module always reads values in as strings, so we need some check to see whether the names can be cast to integers.
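The mixed-type behaviour can be sketched as follows. This is a self-contained illustration, not the library's code: it reads from an in-memory string instead of a file, and the unit names are invented.

```python
import csv
import io


def tryint(value):
    """Return int(value) for digit-only strings, otherwise the value unchanged."""
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return value


# Hypothetical CSV contents: two GEOID-style unit names and one numeric one
text = "unit_idx,dist_no\n55001A,1\n55001B,2\n17,2\n"

reader = csv.reader(io.StringIO(text))
next(reader)  # skip the header row
assignment = {tryint(row[0]): tryint(row[1]) for row in reader}

print(assignment)  # {'55001A': 1, '55001B': 2, 17: 2}
```

One thing to watch: a digit-only name with leading zeros, such as "01001", is cast to the integer 1001, so it will no longer match a graph whose nodes are keyed by the original string.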

There is currently no check in Assignment.from_dict() that the graph's node names match the dictionary's keys, or that each unit is listed only once in the source, which may cause some safety issues if we're letting users bring in assignments from sources other than the .shp or .json containing the graph.
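A check of that kind could be as small as a set comparison between the graph's nodes and the units in the assignment. The sketch below is hypothetical (the name `check_assignment` is not part of GerryChain) and works on plain Python collections rather than a real graph object.

```python
def check_assignment(nodes, assignment):
    """Raise ValueError if the assigned units differ from the graph's nodes."""
    nodes = set(nodes)
    units = set(assignment)
    missing = nodes - units   # nodes with no district
    unknown = units - nodes   # assigned units that are not in the graph
    if missing or unknown:
        raise ValueError(
            "assignment does not match graph: "
            f"{len(missing)} unassigned node(s), {len(unknown)} unknown unit(s)"
        )


# Hypothetical graph nodes
nodes = [0, 1, 2]

check_assignment(nodes, {0: 1, 1: 1, 2: 2})  # passes silently

# String keys, as they would arrive from a CSV without integer casting
try:
    check_assignment(nodes, {"0": 1, "1": 1, "2": 2})
except ValueError as err:
    print(err)
```

Repeated units cannot be detected once the rows are in a dictionary, since later rows overwrite earlier ones, so that part of the check would have to happen while the CSV is being read.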

maxhully commented 5 years ago

You're right about the lack of a nodes check --- I ran into that issue a couple days ago. I'll make an issue.

carlschroedl commented 3 years ago

👍 for this -- ward/block assignment CSV is the only valid format for submitting proposals to the Wisconsin state legislature. Format details are here in the section "TECHNICAL SPECIFICATIONS (IF USING AN ALTERNATE TECHNOLOGY)".

I'm convening a redistricting workshop Sept 20th. The WI state legislature accepts maps up until Oct 15th.

Is there any chance the CSV export feature could be added before either of those times?