outcomesinsights / generalized_data_model

Outcomes Insights' Data Model for Clinical Research
MIT License

consider location and region tables #111

Open markdanese opened 6 years ago

markdanese commented 6 years ago

See OHDSI forum issue http://forums.ohdsi.org/t/themis-topic-location-table-non-u-s-address-locations/3828/16

A location is a place, represented by a point, whereas a region is an area, represented by a polygon. Trying to store and reference the two as if they were equivalent creates unnecessary issues.

First, the biggest obstacle in designing this was that (I believe) a majority of the OHDSI community uses the location table to refer to regions. In other words, their source data does not contain street-level address data. I have encountered ETLs where the location table only stores unique zip codes and maps thousands of individuals to the same 'location'. Initially I attempted to reconcile this by including a 'location_type' field in the location table which designated the location as either a place or a region. Building on this, however, queries and attribute assignment get messy and inefficient.

I'm proposing a different model in which every record in the location table refers to a specific location, never a region. If the source data only contains zip codes, then a record in the location table should not refer to the region itself but to 'a location somewhere within the region'. It seems like a trivial distinction, but its effects echo throughout the rest of the design.

If you have a location table that contains only locations, a region table that contains only regions and a mapping table to go between the two, it significantly simplifies things (we then don't have to check the location table for regions, store varying data types, reference polygons vs. coordinates, etc).
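
To make the three-table idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. All table names, column names, and sample values (`location`, `region`, `location_region`, `boundary`, the two helper functions, the coordinates) are assumptions made for illustration; they are not taken from the OHDSI proposal or from this data model. Location 2 stands for the zip-only case: it is 'a location somewhere within the region', so its coordinates and address are simply left NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (            -- points only, never areas
    location_id INTEGER PRIMARY KEY,
    address_1   TEXT,              -- NULL when the source has no address
    latitude    REAL,              -- NULL when the exact point is unknown
    longitude   REAL
);
CREATE TABLE region (              -- areas only, never points
    region_id   INTEGER PRIMARY KEY,
    region_type TEXT NOT NULL,     -- hypothetical values: 'zip', 'state'
    region_name TEXT NOT NULL,
    boundary    TEXT               -- polygon (e.g. as WKT); unused in this sketch
);
CREATE TABLE location_region (     -- many-to-many mapping between the two
    location_id INTEGER NOT NULL REFERENCES location(location_id),
    region_id   INTEGER NOT NULL REFERENCES region(region_id),
    PRIMARY KEY (location_id, region_id)
);
""")

# Illustrative rows only. Location 1 has a full address; location 2 is
# known only by its zip code, so it carries no point data of its own.
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?, ?)",
    [(1, "123 Example St", 34.09, -118.41), (2, None, None, None)],
)
conn.executemany(
    "INSERT INTO region VALUES (?, ?, ?, ?)",
    [(10, "zip", "90210", None), (20, "state", "CA", None)],
)
conn.executemany(
    "INSERT INTO location_region VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10), (2, 20)],
)


def locations_in_region(conn, region_type, region_name):
    """Return the ids of all locations mapped to the named region."""
    rows = conn.execute(
        """
        SELECT lr.location_id
        FROM location_region AS lr
        JOIN region AS r ON r.region_id = lr.region_id
        WHERE r.region_type = ? AND r.region_name = ?
        ORDER BY lr.location_id
        """,
        (region_type, region_name),
    )
    return [row[0] for row in rows]


def regions_of_location(conn, location_id):
    """Return (region_type, region_name) pairs for one location."""
    rows = conn.execute(
        """
        SELECT r.region_type, r.region_name
        FROM location_region AS lr
        JOIN region AS r ON r.region_id = lr.region_id
        WHERE lr.location_id = ?
        ORDER BY r.region_type
        """,
        (location_id,),
    )
    return list(rows)
```

With this layout, neither query has to inspect a `location_type` flag or branch on point versus polygon data: region lookups always go through the mapping table, and the location table always holds one kind of thing.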