ual / ual_model_workspace

Repo for building a clean UrbanSim model for the Bay Area
BSD 3-Clause "New" or "Revised" License
2 stars 4 forks source link

fit method for OLSRegressionStep needs update to get_data #27

Closed waddell closed 6 years ago

waddell commented 6 years ago

Right now OLSRegressionStep is using get_data in its fit method, which fails with this message: UnboundLocalError: local variable 'expr_cols' referenced before assignment

shared.py indicates that get_data is deprecated and should use get_df instead, but that throws a different error if I change get_data to get_df.

I think the root of the problem is in parsing the model_expression into columns , and urbansim.models.utils.py is expecting formula, not model_expression.

You can replicate this problem by running this notebook: https://github.com/ual/urbansim_parcel_bayarea/blob/master/notebooks/REPM/REPM%20owner.ipynb

smmaurer commented 6 years ago

I think the estimation code in that notebook is coming from the very first template demo (in urbansim/parcel_template_sandbox), and the the API has changed a bit since then. If you replace the model setup code with this, it will run:

m = OLSRegressionStep()
m.tables = ['buildings', 'parcels']
m.model_expression = 'np.log1p(res_price_per_sqft) ~ year_built'
m.fit()

Or equivalently:

m = OLSRegressionStep(tables=['buildings','parcels'], 
                      model_expression='np.log1p(res_price_per_sqft) ~ year_built')
m.fit()

The notebooks in ual/urbansim_parcel_bayarea are a better source for example code -- they should all either be up to date or clearly marked as archival. I'll mark the other repo as archival too so people don't get confused.

get_data() was the original helper method I wrote for the OLSRegressionStep template, and then later I wrote get_df() to be cleaner and more general. I do still need to refactor the OLS template to use the newer one, so let's leave this issue open as a reminder!

waddell commented 6 years ago

Ah, I missed the REPM notebook in your branch. Yes that was the issue. Thanks.