popolo-project / popolo-spec

International legislative data specifications
http://www.popoloproject.com/

gender over time? #3

Closed jpmckinney closed 11 years ago

jpmckinney commented 11 years ago

It's possible for a person's gender to change over time.

  1. Should we track the changes over time, as we do for a person's name?
  2. How do we implement this?

Implementation

New fields strategy

PopIt gives start and end dates for each other name a person has. We can do the same for gender. PopIt also uses other_names to represent alternative names that may not have a start and end date, and allows tagging of the name.
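As a rough illustration of the "new fields" strategy, a person document might look like the sketch below. The `other_genders` field mirrors `other_names`; the `start_date`, `end_date` and `note` keys, and all the values, are assumptions made up for this example rather than PopIt's actual schema.

```python
# Hypothetical person document for the "new fields" strategy.
# Key names other than other_names/other_genders are assumptions,
# and every value is illustrative only.
person = {
    "name": "A. Example",
    "gender": "female",  # current value
    "other_names": [
        # A former name with a known validity period.
        {"name": "B. Example", "start_date": "1970-01-01", "end_date": "1995-06-30"},
        # An alternative name with no dates, tagged with a note.
        {"name": "Al Example", "note": "nickname"},
    ],
    "other_genders": [
        {"gender": "male", "start_date": "1970-01-01", "end_date": "2001-03-14"},
    ],
}

# Entries that carry both dates can be located by date; the rest cannot.
dated_names = [n for n in person["other_names"]
               if "start_date" in n and "end_date" in n]
```

The current value stays in the top-level field, so consumers that don't care about history never have to look at the embedded lists.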

Versioning strategy

We can alternatively version the entire person document, and handle both types of changes using the same mechanism. In Mongoid, versions are embedded documents, in much the same way other_names is an embedded document in PopIt. The only issue with embedding full versions is staying within the maximum MongoDB document size of 16MB (which should be easy).
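To make the versioning idea concrete, here is a minimal Python sketch of embedding snapshots in the document itself. It is not Mongoid's API; the `save_new_version` helper and the `versions`, `start_date` and `end_date` keys are hypothetical names chosen for this example.

```python
import copy


def save_new_version(person, changes, effective_date):
    """Snapshot the current fields into person["versions"], then apply changes.

    Hypothetical helper: the snapshot excludes the versions list itself so
    that versions do not nest inside each other.
    """
    snapshot = {k: copy.deepcopy(v) for k, v in person.items() if k != "versions"}
    snapshot["end_date"] = effective_date  # the old values stop applying here
    person.setdefault("versions", []).append(snapshot)
    person.update(changes)
    person["start_date"] = effective_date  # the new values apply from here
    return person


# Illustrative data only.
person = {"name": "A. Example", "gender": "male", "start_date": "1970-01-01"}
save_new_version(person, {"gender": "female"}, "2001-03-15")
```

Every field is copied on every change, which is why document size is the thing to watch with this strategy.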

Lookups

Lookups for previous names and genders would work the same way in either strategy: use the start and end dates on the embedded documents to locate the one that covers the date of interest, then read its name or gender.
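A date-based lookup along these lines could be sketched as follows. The `value_on` helper and the `start_date`/`end_date` keys are assumptions for illustration, and dates are assumed to be full ISO 8601 strings (YYYY-MM-DD), which compare correctly as plain strings.

```python
def value_on(entries, key, on, default=None):
    """Return entry[key] from the first entry whose date range contains `on`.

    Entries with neither a start nor an end date are skipped, since they
    cannot be located by date. A missing bound is treated as open-ended.
    """
    for entry in entries:
        start = entry.get("start_date")
        end = entry.get("end_date")
        if start is None and end is None:
            continue
        if (start is None or start <= on) and (end is None or on <= end):
            return entry[key]
    return default


# Illustrative data only; works the same on other_genders or on versions.
current_gender = "female"
other_genders = [
    {"gender": "male", "start_date": "1970-01-01", "end_date": "2001-03-14"},
]

past = value_on(other_genders, "gender", "1985-05-01", default=current_gender)
recent = value_on(other_genders, "gender", "2010-01-01", default=current_gender)
```

Falling back to the current top-level value when no embedded document matches keeps the common case (no recorded history) simple.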

Entering historical data

Backfilling data is likely easier to implement using other_names and other_genders fields than using versions. Versions require careful management: you would not want to create a new version just to correct an error or complete a document, but you would want one when a person's name changes due to an event such as knighting or marriage. Using versions may also conflict with edit histories in certain implementations.

Conclusion

Unless we can identify a large number of fields whose values we want to track over time, the "new fields" strategy seems to be preferable in terms of implementation.

jpmckinney commented 11 years ago

Punting for now.