mitre / data-owner-tools

Tools for the Childhood Obesity Data Initiative (CODI) data owners and partners to use in record linkage
Apache License 2.0

Concat address #40

Closed: keithjmiller closed this pull request 2 years ago

keithjmiller commented 2 years ago

@keithjmiller: Re-pushed after running black and flake8. This could be done more elegantly by allowing a mapping in the translation map to be a list of columns, which can be added in the future. This PR handles the two specific address columns mentioned in https://www.pivotaltracker.com/story/show/182872990 and adds some data cleanup for both address fields.
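The list-valued mapping suggested above might look roughly like the following. This is a minimal sketch, not the project's code: `TRANSLATION_MAP`, `extract_element`, and the column names are hypothetical. Skipping empty parts is also an assumption, made so that a blank second address line doesn't leave a trailing space.

```python
# Hypothetical translation map: each PII element maps either to one source
# column or to a list of columns whose values are concatenated.
TRANSLATION_MAP = {
    "given_name": "first_name",
    "household_street_address": ["address_line_1", "address_line_2"],
}


def extract_element(row: dict, mapping) -> str:
    """Return the value of one PII element from a CSV row (a dict).

    A string mapping reads a single column. A list mapping joins the listed
    columns with a single space, skipping empty or missing values
    (an assumption of this sketch).
    """
    if isinstance(mapping, list):
        parts = [str(row.get(col) or "").strip() for col in mapping]
        return " ".join(p for p in parts if p)
    return str(row.get(mapping) or "").strip()


row = {"first_name": "Ana", "address_line_1": "12 Main St", "address_line_2": "Apt 4"}
print(extract_element(row, TRANSLATION_MAP["household_street_address"]))
```

Dispatching on `isinstance(mapping, list)` means existing single-column entries keep working unchanged.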


@jsrockhill: data_reader.py now accepts lists as values in the config JSON (as Keith suggested above). When a list is encountered, the values of the listed columns in each PII row are concatenated, separated by a single space, to form the extracted element. I also moved the generic string-cleaning functions and the default-value checking from extract.py to data_reader.py, so the data is cleaned where it is read. For database extraction, the same logic is implemented in data_reader.py's case_insensitive_lookup(), and DATA_DICTIONARY has been updated to use it.
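For database rows, a case-insensitive lookup that also handles list keys, string cleaning, and default-value checking could be sketched as follows. This is an illustration under stated assumptions, not the actual `case_insensitive_lookup()` in data_reader.py. The function name `ci_lookup`, the `clean_string` helper, and the `DEFAULT_VALUES` placeholder set are all hypothetical.

```python
import re

# Hypothetical placeholder values treated as "missing" (illustrative only).
DEFAULT_VALUES = {"", "NULL", "UNKNOWN", "N/A"}


def clean_string(value) -> str:
    """Generic cleanup: collapse runs of whitespace and strip the ends."""
    if value is None:
        return ""
    return re.sub(r"\s+", " ", str(value)).strip()


def ci_lookup(row: dict, key) -> str:
    """Case-insensitive column lookup for a DB result row.

    `key` may be one column name or a list of names. Each value is cleaned,
    placeholder defaults are dropped, and the remaining parts are joined
    with a single space.
    """
    lowered = {k.lower(): v for k, v in row.items()}
    keys = key if isinstance(key, list) else [key]
    parts = []
    for k in keys:
        value = clean_string(lowered.get(k.lower()))
        if value.upper() not in DEFAULT_VALUES:
            parts.append(value)
    return " ".join(parts)


db_row = {"ADDRESS_LINE_1": " 12  Main St ", "Address_Line_2": "null"}
print(ci_lookup(db_row, ["address_line_1", "address_line_2"]))
```

Here the second address column holds a placeholder, so only the cleaned first line is returned.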