zestai / zrp

Zest Race Predictor
Apache License 2.0

Cannot opt out of geocoding in ZRP #21

Open jcuriel-unc opened 2 years ago

jcuriel-unc commented 2 years ago

What happened?

I attempted to run ZRP with only surname and zip code by setting geocode = False in the ZRP() call within the ZRP Example script. However, this does not appear to be possible: the run still attempts the geocoding step and fails when it tries to match on block group. Note that before I set the values to empty within the zrp_sample, the ZRP commands ran successfully. The following errors were reported:

AttributeError                            Traceback (most recent call last)
File :3, in

File ~\anaconda3\lib\site-packages\zrp\zrp.py:116, in ZRP.transform(self, input_data)
    114 z_prepare = ZRP_Prepare(file_path=self.file_path, **self.params_dict)
    115 z_prepare.fit(data)
--> 116 prepared_data = z_prepare.transform(data)
    118 curpath = dirname(__file__)
    119 if self.pipe_path is None:

File ~\anaconda3\lib\site-packages\zrp\prepare\prepare.py:170, in ZRP_Prepare.transform(self, input_data)
    168 validate = ValidateGeocoded()
    169 validate.fit()
--> 170 acs_validator = validate.transform(geo_coded)
    171 save_json(acs_validator, self.out_path, "input_acs_validator.json")
    172 print(" [Completed] Validating ACS input data")

File ~\anaconda3\lib\site-packages\zrp\validate.py:441, in ValidateGeocoded.transform(self, data)
    439 except (KeyError, ValueError) as e:
    440     pass
--> 441 validator["is_geocoded"] = self.is_geocoded(data)
    442 return(validator)

File ~\anaconda3\lib\site-packages\zrp\validate.py:134, in BaseValidate.is_geocoded(self, data)
    130 geocoded_cts["count"] = {}
    132 geocoded_cts["count"]["GEOID"] = data[(data["GEOID"].str.len()>4)
    133     & (data.index.duplicated(keep = "first"))].shape[0]
--> 134 geocoded_cts["count"]["Block Group"] = data[(data["GEOID_BG"].str.len()>11)
    135     & (data["GEOID_BG"] == data["GEOID"])
    136     & (data["GEOID_BG"].notna())].shape[0]
    137 geocoded_cts["count"]["Census Tract"] = data[(data["GEOID_CT"].str.len()>10)
    138     & (data["GEOID_CT"] == data["GEOID"])
    139     & (data["GEOID_CT"].notna())].shape[0]
    140 geocoded_cts["count"]["Zip Code"] = data[(data["GEOID_ZIP"].str.len() == 5)
    141     & (data["GEOID_ZIP"] == data["GEOID"])
    142     & (data["GEOID_ZIP"].notna())].shape[0]

File ~\anaconda3\lib\site-packages\pandas\core\generic.py:5461, in NDFrame.__getattr__(self, name)
   5454 # Note: obj.x will always call obj.__getattribute__('x') prior to
   5455 # calling obj.__getattr__('x').
   5456 if (
   5457     name in self._internal_names_set
   5458     or name in self._metadata
   5459     or name in self._accessors
   5460 ):
-> 5461     return object.__getattribute__(self, name)
   5462 else:
   5463     if self._info_axis._can_hold_identifiers_and_holds_name(name):

File ~\anaconda3\lib\site-packages\pandas\core\accessor.py:180, in CachedAccessor.__get__(self, obj, cls)
    177 if obj is None:
    178     # we're accessing the attribute of the class, i.e., Dataset.geo
    179     return self._accessor
--> 180 accessor_obj = self._accessor(obj)
    181 # Replace the property with the accessor object. Inspired by:
    182 # https://www.pydanny.com/cached-property.html
    183 # We need to use object.__setattr__ because we overwrite __setattr__ on
    184 # NDFrame
    185 object.__setattr__(obj, self._name, accessor_obj)

File ~\anaconda3\lib\site-packages\pandas\core\strings\accessor.py:154, in StringMethods.__init__(self, data)
    151 def __init__(self, data):
    152     from pandas.core.arrays.string_ import StringDtype
--> 154     self._inferred_dtype = self._validate(data)
    155     self._is_categorical = is_categorical_dtype(data.dtype)
    156     self._is_string = isinstance(data.dtype, StringDtype)

File ~\anaconda3\lib\site-packages\pandas\core\strings\accessor.py:217, in StringMethods._validate(data)
    214 inferred_dtype = None
    216 if inferred_dtype not in allowed_types:
--> 217     raise AttributeError("Can only use .str accessor with string values!")
    218 return inferred_dtype

AttributeError: Can only use .str accessor with string values!
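For anyone reading the traceback, the last zrp frame calls `.str.len()` on `data["GEOID_BG"]`. A plausible cause (an assumption on my part, not confirmed against the zrp source) is that the column holds only missing values when no geocoding result exists, so pandas stores it as float64 and refuses the `.str` accessor. The sketch below reproduces that pandas behavior on toy data and shows one possible defensive rewrite of the block group count. The helper name `count_block_group_matches` and the sample GEOID values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values): GEOID_BG is entirely missing,
# so pandas infers float64 for that column.
ungeocoded = pd.DataFrame({
    "GEOID": ["27514", "10001"],
    "GEOID_BG": [np.nan, np.nan],
})

# The .str accessor rejects a float64 column with an AttributeError.
try:
    ungeocoded["GEOID_BG"].str.len()
    raised = False
except AttributeError:
    raised = True


def count_block_group_matches(data):
    """Hypothetical helper: count rows whose GEOID is a block group ID.

    Casting to the pandas "string" dtype first keeps .str usable even
    when the column contains only missing values.
    """
    bg = data["GEOID_BG"].astype("string")
    geoid = data["GEOID"].astype("string")
    mask = (bg.str.len() > 11) & (bg == geoid) & bg.notna()
    # Comparisons against missing values yield <NA>; treat those as no match.
    return int(mask.fillna(False).sum())


# Toy frame where the first row carries a 12-character block group ID.
geocoded = pd.DataFrame({
    "GEOID": ["371350107011", "27514"],
    "GEOID_BG": ["371350107011", np.nan],
})

print(raised)
print(count_block_group_matches(ungeocoded))
print(count_block_group_matches(geocoded))
```

The cast only avoids the exception; whether zrp should count zero block group matches or skip this validation entirely when geocode=False is a design question for the maintainers.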

Steps To Reproduce

[Screenshot: zrp_geocoding_error] (skipping over the previous steps within the example).

1. Within a Jupyter notebook, set all fields but zip_code and last_name to ""
2. Set geocode=False in the ZRP command
3. Run the example ZRP command, as seen in the image
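The input preparation in these steps can be sketched with pandas alone. This is a minimal illustration, assuming the sample is a DataFrame with `last_name` and `zip_code` columns; the other column names, the toy row, and the helper `blank_all_but` are hypothetical and not taken from the example script. The ZRP call itself is left as a comment because it is not executed here.

```python
import pandas as pd

# Fields the report leaves populated; everything else is set to "".
KEEP = ["last_name", "zip_code"]


def blank_all_but(df, keep):
    """Return a copy of df with every column not listed in keep set to ""."""
    out = df.copy()
    for col in out.columns:
        if col not in keep:
            out[col] = ""
    return out


# Toy stand-in for the sample data (made-up row and column set).
sample = pd.DataFrame({
    "first_name": ["Jane"],
    "last_name": ["Doe"],
    "street_address": ["1 Main St"],
    "city": ["Chapel Hill"],
    "state": ["NC"],
    "zip_code": ["27514"],
})

blanked = blank_all_but(sample, KEEP)
print(blanked)

# The report then passes this frame to ZRP with geocode=False and calls
# transform(), which is where the AttributeError surfaces. Not run here.
```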

Environment

- OS: Windows 10 Pro 

kasey-zest commented 2 years ago

Thanks for reporting this issue, @jcuriel-unc. We are currently investigating.

kasey-zest commented 2 years ago

This feature was intended to let a user skip geocoding when they supply block group or census tract themselves. We will work on updating the geocoding functionality and documentation.

In the meantime, please refer to this notebook https://github.com/zestai/zrp/blob/main/examples/modeling/generate_proxies_zip_only.ipynb to generate race and ethnicity proxies based on zip code alone.