uscensusbureau / recon_replication

Create a script that actually reconstructs a sample Census Tract as a test and demonstration #18

Closed simsong closed 1 year ago

simsong commented 1 year ago

The current repo does not work, but we cannot test it other than doing a checkout and then a lot of manual trying-to-get-it-to-work stuff.

It would be useful to have a script that can be run when this repo is checked out on a clean VM running AWS Linux, or in a directory on a Mac, and that does the following:

This could be used both for testing and for demonstration purposes.

rrod515 commented 1 year ago

@simsong What Gurobi license file are you expecting to use here? @tbeggs-econo and I do not have access to the academic license for testing.

simsong commented 1 year ago

There is always a Gurobi file. If it is the academic license, it has a key. At Census, it had the hostname of the token server, I believe.

rrod515 commented 1 year ago

@tbeggs-econo would any of the CFT materials you had been working on be helpful here?

johnmabowd commented 1 year ago

@rrod515 Use one of the tracts in rhdf_bt_0solvar_extract.xlsx since those records are already in the archive. Run the test on our machines, comment at the appropriate places something like: user must insert Gurobi license key here.

rrod515 commented 1 year ago

@tbeggs-econo let's use census tract 01073005103.

larsvilhuber commented 1 year ago

Some quick thoughts/priorities

Reordering:

* [ ]  Installs all necessary packages (both system and Python)

This should come first. Rephrased:

* [ ]  Checks that all software is available (Python, packages, Gurobi, MySQL) or can be accessed (MySQL)

Automation of this is optional, as implementations can vary widely:

* [ ]  Either downloads Gurobi, or asks the user to download it and put their license key in a file
* [ ]  Installs MySQL locally

Then

* [ ]  Checks to make sure that the Gurobi license file is present.

All of this can be in a separate "setup.py", it does not need to be in a "run the reconstruction" script.
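A minimal sketch of what such a setup check could look like, using only the standard library. The function names, the default module name (`gurobipy`), and the default command names (`gurobi_cl`, `mysql`) are assumptions for illustration and are not taken from this repo. The license lookup only checks the `GRB_LICENSE_FILE` environment variable and a `gurobi.lic` file in the home directory; Gurobi may search other locations as well.

```python
import importlib.util
import os
import shutil
from pathlib import Path


def find_gurobi_license(env=None, home=None):
    """Return the path of the first Gurobi license file found, or None.

    Only two locations are checked here (an assumption, not the full
    Gurobi search order): $GRB_LICENSE_FILE, then ~/gurobi.lic.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    candidates = []
    if env.get("GRB_LICENSE_FILE"):
        candidates.append(Path(env["GRB_LICENSE_FILE"]))
    candidates.append(home / "gurobi.lic")
    for path in candidates:
        if path.is_file():
            return path
    return None


def check_environment(modules=("gurobipy",), commands=("gurobi_cl", "mysql")):
    """Collect every problem found instead of stopping at the first one."""
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not importable: {name}")
    for name in commands:
        if shutil.which(name) is None:
            problems.append(f"command not found on PATH: {name}")
    if find_gurobi_license() is None:
        problems.append("no Gurobi license file found; "
                        "user must put their license file in place")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("MISSING:", problem)
```

Reporting all problems at once, rather than failing on the first, saves the user from re-running the check after every fix.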

Then have the second script be "run_reconstruction.py" or something, with the remainder of @simsong's steps

I would make this a simple flag in some sort of configuration file, command line option, or interactive prompt defaulting to "test county" instead, since it also affects the "Downloads the relevant files"

* [ ]  Offers to run the reconstruction on the whole county or even the state
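As a rough sketch of the command-line variant of this flag: the option names (`--geoid`, `--scope`) and helper names below are hypothetical, and the default is the test tract suggested earlier in this thread rather than a county. The only fixed fact used is the layout of an 11-digit tract GEOID: 2 digits of state FIPS, 3 of county, 6 of tract.

```python
import argparse

DEFAULT_TRACT = "01073005103"  # test tract suggested earlier in this thread


def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into (state, county, tract) parts."""
    if len(geoid) != 11 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


def build_parser():
    """Hypothetical options; the real script may name these differently."""
    parser = argparse.ArgumentParser(description="Run a test reconstruction.")
    parser.add_argument("--geoid", default=DEFAULT_TRACT,
                        help="11-digit tract GEOID used to pick the geography")
    parser.add_argument("--scope", choices=("tract", "county", "state"),
                        default="tract",
                        help="how much of the geography around the tract to run")
    return parser


def selection(args):
    """Return the geography to download and reconstruct for the chosen scope."""
    state, county, tract = split_tract_geoid(args.geoid)
    if args.scope == "state":
        return {"state": state}
    if args.scope == "county":
        return {"state": state, "county": county}
    return {"state": state, "county": county, "tract": tract}


if __name__ == "__main__":
    print(selection(build_parser().parse_args()))
```

Because the download step and the reconstruction step would both read the same `selection` result, widening the scope changes both consistently.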

The rest then in order

* [ ]  Creates the necessary MySQL database

* [ ]  Downloads the relevant files from the Census bureau's website

* [ ]  Loads the MySQL table

* [ ]  Runs the reconstruction on the census tract.

tbeggs-econo commented 1 year ago

So for setting up MySQL, how much do we really want to do for the end user? I am not sure we should be setting up root passwords on a MySQL server and the like. The end user should probably be responsible for this. The dbrtool will already set up the MySQL tables after you provide your MySQL credentials and desired database in dbrecon_config.json.

I have a setup.py that can try to yum install MySQL currently.

larsvilhuber commented 1 year ago

I mentioned earlier that I think installing MySQL server is optional. For instance, yum is fine for the 10% of the internet that uses RH or compatible, but will fail on everything else, and installing a server is not expected of replication packages (or software).

I would check whether a MySQL server is accessible using provided connection information (address, login, and credentials), and whether the configured user can create tables.
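A minimal sketch of these two checks, assuming the caller opens the connection with whatever Python DB-API 2.0 driver they use for MySQL. The function names are hypothetical, and nothing here reads dbrecon_config.json, since its layout is not described in this thread. The table-creation probe creates and immediately drops a throwaway table.

```python
import socket
import uuid


def server_reachable(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    3306 is the usual MySQL port; this says nothing about credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_create_tables(conn):
    """Try to create and drop a throwaway table on a DB-API 2.0 connection.

    Returns False on any database error. Exception classes differ between
    drivers, so a broad except is used here on purpose.
    """
    table = "recon_probe_" + uuid.uuid4().hex[:8]
    cursor = conn.cursor()
    try:
        cursor.execute(f"CREATE TABLE {table} (id INTEGER)")
        cursor.execute(f"DROP TABLE {table}")
        return True
    except Exception:
        return False
    finally:
        cursor.close()
```

With a driver such as PyMySQL, the connection would come from something like `pymysql.connect(host=..., user=..., password=..., database=...)`; a failure at that step already shows that the address or credentials are wrong, before `can_create_tables` is reached.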

tbeggs-econo commented 1 year ago

OK, I am going to remove the yum install method. I think it is cleaner if we let the end user set up MySQL; there are too many options to account for. I will add a check to dbrtool.py --env to make sure the credentials actually work.

simsong commented 1 year ago

I concur, as they may run a MySQL server on another system.