opencivicdata / docs.opencivicdata.org

Open Civic Data project documentation
https://open-civic-data.readthedocs.io

modified flags #82

Open fgregg opened 7 years ago

fgregg commented 7 years ago

Sometimes the source text for motions and bills is incomprehensible. It has been Open States practice to rewrite it for human readability.

As a data user, I want to know the provenance of the information. If a motion or bill title has been rewritten for clarity, I want to know that, and I'd also like to know the original text.

@showerst proposed adding an attribute like this to objects with modified text (in his example, a motion): { "modified": {"motion": "cp/h lwr"} }

@jamesturk suggested that we use the existing extras field for this, since we may not be ready to standardize on this practice.

I'm curious to hear @jpmckinney's thoughts, as this would affect objects that are part of Popolo (which we strive to maintain compatibility with).
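To make the two placements concrete, here is a minimal sketch of the same marker stored either on the object itself (@showerst's proposal) or under the existing extras field (@jamesturk's suggestion). It reads the example as mapping a field name to its original source text, which is an assumption; the helper name `record_original` and the rewritten motion text are hypothetical and not part of any Open Civic Data library.

```python
import copy


def record_original(record, field, original_text, use_extras=False):
    """Return a copy of `record` noting the source text that `field` replaced."""
    updated = copy.deepcopy(record)
    # Either attach the marker to the object itself or tuck it under "extras".
    container = updated.setdefault("extras", {}) if use_extras else updated
    container.setdefault("modified", {})[field] = original_text
    return updated


# Placeholder rewritten text; only "cp/h lwr" comes from the thread.
vote = {"motion": "<rewritten, human-readable motion>", "extras": {}}

top_level = record_original(vote, "motion", "cp/h lwr")
in_extras = record_original(vote, "motion", "cp/h lwr", use_extras=True)

print(top_level["modified"])            # {'motion': 'cp/h lwr'}
print(in_extras["extras"]["modified"])  # {'motion': 'cp/h lwr'}
```

Either way the data user can recover the original text; the difference is only whether the key is part of the standardized schema or lives in the free-form extras bag.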

showerst commented 7 years ago

Just brainstorming here. We need to know:

  1. That something was modified
  2. What field was modified
  3. Do we care why ("readability", "deduplication", etc.)?
  4. Do we need to datestamp when we changed the code to start modifying? We don't really do anything like that elsewhere so gut feel is no.

Ideally we'd be able to add this to anything, including nested objects. I think the biggest culprits are action names, bill version titles, and vote motions.
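As a rough illustration of the "anything, including nested objects" requirement, the sketch below walks a record and reports every marker it finds, along with where it sits. The `modified` key follows the example earlier in the thread; the function name `find_modified`, the bill layout, and all placeholder strings are assumptions made up for illustration.

```python
def find_modified(obj, path=""):
    """Yield (path, field, original) for every "modified" marker in a nested record."""
    if isinstance(obj, dict):
        # Report markers attached directly to this object.
        for field, original in obj.get("modified", {}).items():
            yield (path, field, original)
        # Then descend into every other value.
        for key, value in obj.items():
            if key != "modified":
                yield from find_modified(value, f"{path}/{key}")
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_modified(item, f"{path}/{index}")


# Hypothetical bill with one rewritten action description.
bill = {
    "title": "<bill title>",
    "actions": [
        {"description": "<readable action>", "modified": {"description": "<raw action>"}},
        {"description": "<unchanged action>"},
    ],
}

for location, field, original in find_modified(bill):
    print(location, field, original)  # /actions/0 description <raw action>
```

This covers points 1 and 2 of the list (that something changed, and which field); a reason or date would need a richer value than a bare string.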

jamesturk commented 7 years ago

If we're moving towards standardizing this, I'd like the ability to add a note of some type explaining the change, and maybe also an optional source URL for the change. (I assume these would be used when this field is used for manual changes.) We had this in billy; honestly, it wasn't used much, but it was handy when manually retiring someone, for example, to link to a news article.
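One possible shape for such an entry, sketched as a small dataclass: the note and source URL are optional, so scraper-driven rewrites can leave them out and manual changes can fill them in. The class name `Modification` and the field names `note` and `source_url` are hypothetical, and the URL is a placeholder.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Modification:
    field: str                        # which field was rewritten
    original: str                     # the text as it appeared in the source
    note: Optional[str] = None        # free-text explanation of the change
    source_url: Optional[str] = None  # e.g. a news article backing a manual change

    def to_dict(self):
        """Serialize, omitting optional values that were not supplied."""
        return {key: value for key, value in asdict(self).items() if value is not None}


automated = Modification(field="motion", original="cp/h lwr")
manual = Modification(
    field="motion",
    original="cp/h lwr",
    note="Rewritten for readability",
    source_url="https://example.com/article",  # placeholder URL
)

print(automated.to_dict())  # {'field': 'motion', 'original': 'cp/h lwr'}
```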

jpmckinney commented 7 years ago

It depends on what we're trying to track. If we just want edit histories (as with a wiki article, in which user X makes change Y for reason Z), then we'd probably want a log of changes along the lines of the many existing implementations of edit histories. This sort of log is not in the same scope as "how to model a person". An API might provide a person's properties along with a change log, but these are really two kinds of information.

If we're trying to track real world changes over time (like a person retiring), that falls into the modeling discussion. With Popolo, we try to keep to an "append-only" model, so that we're not editing or deleting values. In the case of a retirement, we'd just append an end date to the person's membership. There's some discussion about adding an end_event field to Membership to track what caused the end date (retirement) and its source (news article).
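A minimal sketch of what "append-only" could look like for the retirement case: the membership gains an end date, and nothing already recorded is edited or deleted. The helper `end_membership`, the identifiers, and the dates are hypothetical; the field names follow Popolo's Membership class, and the `end_event` field is left out because the thread describes it as still under discussion.

```python
def end_membership(membership, end_date):
    """Return a copy of `membership` with an end date appended.

    Refuses to overwrite an existing end date, in keeping with the
    append-only idea of never editing or deleting recorded values.
    """
    if membership.get("end_date") is not None:
        raise ValueError("membership already has an end_date; refusing to overwrite")
    return {**membership, "end_date": end_date}


# Placeholder identifiers and dates, for illustration only.
membership = {
    "person_id": "person-1",
    "organization_id": "org-1",
    "role": "member",
    "start_date": "2015-01-01",
}

ended = end_membership(membership, "2017-03-01")
print(ended["end_date"])  # 2017-03-01
```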

If we're trying to add editorial or informative information to a record, that also falls into the modeling discussion, but in this case we need to acknowledge that we're not modeling some real world observable fact (like the bizarre abbreviations and jargon used by legislatures); rather, we're modeling an interpretation or translation of the observable fact. That's a semantic difference, and therefore the translation/interpretation needs to be modeled with new properties.

fgregg commented 7 years ago

> If we're trying to add editorial or informative information to a record, that also falls into the modeling discussion, but in this case we need to acknowledge that we're not modeling some real world observable fact (like the bizarre abbreviations and jargon used by legislatures); rather, we're modeling an interpretation or translation of the observable fact. That's a semantic difference, and therefore the translation/interpretation needs to be modeled with new properties.

Could you expand on what you mean by "translation/interpretation needs to be modeled with new properties"?

jpmckinney commented 7 years ago

For example, if you were to summarize the text of a bill, I would create a new property for that, like dcterms:abstract.
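Read that way, the observed value and the interpretation sit side by side under different property names, instead of one replacing the other. A small sketch, assuming a plain `abstract` key stands in for dcterms:abstract; the helper name and all text values are placeholders.

```python
def add_abstract(record, abstract):
    """Return a copy of `record` with an interpretive summary added as a new property."""
    if "abstract" in record:
        raise ValueError("record already has an abstract")
    # The observed fields are carried over untouched.
    return {**record, "abstract": abstract}


# Placeholder values; "title" stands for the text exactly as published.
bill = {"title": "<title as published by the legislature>"}
summarized = add_abstract(bill, "<editor-written summary>")

print(sorted(summarized))  # ['abstract', 'title']
```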