uscensusbureau / recon_replication

Some code in replication repo does not run #6

Closed simsong closed 1 year ago

simsong commented 1 year ago

Is the code in the replication repo supposed to be able to run? If so:

(base) simsong@Seasons recon % grep dfxml *py */*py
dbrecon.py:from dfxml.writer import DFXMLWriter
dbrecon.py:global dfxml_writer
dbrecon.py:dfxml_writer = None
dbrecon.py:    global dfxml_writer
dbrecon.py:    dfxml_writer    = DFXMLWriter(filename=logfname.replace(".log",".dfxml"), prettyprint=True)
dbrecon.py:    dfxml_handler   = dfxml_writer.logHandler()
dbrecon.py:    logger.addHandler(dfxml_handler)
dbrecon.py:def add_dfxml_tag(tag,text=None,attrs={}):
dbrecon.py:    e = ET.SubElement(dfxml_writer.doc, tag, attrs)
dbrtool.py:    REQUIRED_FILES = glob.glob("*.py") + glob.glob("ctools/*.py") + glob.glob("dfxml/*py")
s3_pandas_synth_lp_files.py:            dbrecon.add_dfxml_tag('synth_lp_files','', {'success':'1'})
s3_pandas_synth_lp_files.py:    assert dbrecon.dfxml_writer is not None
scheduler.py:from dfxml.writer import DFXMLWriter
tests/test_logger.py:doc = ET.Element('dfxml')
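
The grep output above shows that `dbrecon.py` and `scheduler.py` both import `dfxml.writer`, so a missing `dfxml` directory stops them at import time. A minimal sketch of a pre-flight check follows; the helper name `missing_modules` is hypothetical and not part of the repo.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package (e.g. "dfxml") is absent
            # or when a loaded module has no usable spec.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "dfxml.writer" is the import used by dbrecon.py and scheduler.py.
    print(missing_modules(["dfxml.writer"]))
```

Run from the repository root, an empty list would suggest the submodule is in place.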

rrod515 commented 1 year ago

Missed dfxml when we pulled in submodules; have pushed it and set you as reviewer. Will remove remote.cgi (don't think we ever use it).

Tagging @tbeggs-econo for the MySQL questions. We have pointed the user to the MySQL documentation; if we want step-by-step instructions for getting MySQL working on AWS outside of GovCloud, that is something we'll need to add.

simsong commented 1 year ago

remote.cgi is the dashboard that shows you how far the reconstruction has progressed. It was certainly useful to be able to pull up this information on my cell phone during the database reconstruction. How did you monitor it?

I don't think that we need to tell people how to install MySQL. We need to tell people how the server needs to be set up, and what it is used for. Specifically, we need to tell people:

rrod515 commented 1 year ago

I used the --status option in dbrtool.py for monitoring.

@tbeggs-econo can you add in the information above?

simsong commented 1 year ago

--status is fine! Be sure to document it.


rrod515 commented 1 year ago

@tbeggs-econo Can you verify that config.ini is still used (my understanding is that it is) and add to the README the parts of it that need updating?

tbeggs-econo commented 1 year ago

Sorry, the coffee hasn't hit me yet. Yes, it is still in use. The JSON replaces only the items we used to pull from a secrets file, so it holds the items users must fill out before running. The config.ini, by contrast, can in theory be left at its defaults. There might be some overlap, which I can try to clean up.

simsong commented 1 year ago

> I do not believe config.ini is used anywhere that we still run. I think it might be referenced in an outdated test. I'll get rid of it.

I think that you are wrong here. It's used in dbrtool.py for drbtool_config. My quick review shows that:

rrod515 commented 1 year ago

Yes we need config.ini, but the user shouldn't have to edit it unless they need to turn down the parallelism.
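
For readers unfamiliar with INI files, here is a rough sketch of lowering a parallelism setting with Python's standard `configparser`. The section name `run` and option name `threads` are assumptions made for illustration; the repo's actual config.ini may use different names.

```python
import configparser

# Hypothetical section and option names, for illustration only;
# check the real config.ini for the names the repo actually uses.
SAMPLE = """
[run]
threads = 32
"""


def load_with_override(text, section, option, value=None):
    """Parse INI text and optionally override one option in memory."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if value is not None:
        # configparser stores every value as a string.
        parser.set(section, option, str(value))
    return parser


if __name__ == "__main__":
    cfg = load_with_override(SAMPLE, "run", "threads", value=4)
    print(cfg.getint("run", "threads"))
```

In practice the same change is usually made by editing the file directly; the sketch only shows which kind of value would be turned down.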

tbeggs-econo commented 1 year ago

Yes, sorry, I was logged into the wrong repo while my mind was switching to work mode. I think I was editing my comment at the same time you left yours (see edit).

rrod515 commented 1 year ago

Leaving this open in case @tbeggs-econo or @simsong want to do more testing, but steps 0-2 work fine for me. Verified that step 1 needs --latin1 to work properly with the original SF1 files.
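
As a sketch of why an encoding flag like this can matter, the snippet below decodes the same bytes as UTF-8 and as Latin-1. It assumes the original SF1 files contain Latin-1 encoded non-ASCII characters such as accented place names; how `--latin1` is actually implemented in the repo is not shown in this thread, and the `decode_line` helper is hypothetical.

```python
# Bytes as a Latin-1 encoded file would store a name containing "ñ".
raw = "Cañon City".encode("latin-1")


def decode_line(data, latin1=False):
    """Decode bytes as Latin-1 when requested, otherwise as UTF-8."""
    return data.decode("latin-1" if latin1 else "utf-8")


if __name__ == "__main__":
    try:
        decode_line(raw)
    except UnicodeDecodeError as err:
        # The lone 0xF1 byte is not valid UTF-8 on its own.
        print("utf-8 decode failed:", err.reason)
    print(decode_line(raw, latin1=True))
```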

tbeggs-econo commented 1 year ago

Spun up an Ubuntu VM to test and was able to get recon running up to step 3, where I started running into memory allocation and Gurobi limitations. Both would be expected on a machine as small as the one I spun up.