juliema / label_reconciliations

Code for reconciling multiple transcriptions for a label
MIT License

reconciling label and/or value #30

Closed: denslowm closed this issue 6 years ago

denslowm commented 7 years ago

This is a follow-up to a few different discussions that several of us had recently.

Dropdown tasks have labels and values. Labels are what a user sees in the actual dropdown menu and the value is something that is also recorded. For example, the label could be California and the value could be CA. If a new dropdown is created without values then the value is given a random number such as 8cb63930b159f.

At present I believe that the reconciliation script only uses the label. Do we want to be able to access the values as well? I could see a lot of utility in this for dates and beyond.

The random number would seem to add a complication.

cc @robgur @mcbouslog

------ An example of the output ------

[{"task":"T1","value":[{"select_label":"Country","option":true,"value":"840","label":"United States of America"},{"select_label":"State/Province","option":true,"value":"CA","label":"California"},{"select_label":"County","option":true,"value":"8cb63930b159f","label":"Los Angeles"}]},{"task":"T3","value":"W.Fork San Gabriel Riv.","task_label":"Location"},{"task":"T19","task_label":null,"value":[{"task":"T16","value":"","task_label":"Latitude"},{"task":"T17","value":"","task_label":"Longitude"},{"task":"T18","value":"","task_label":"Altitude/Elevation"}]},{"task":"T11","task_label":null,"value":[{"task":"T8","value":[{"select_label":"Month","option":true,"value":"4cfa5e70706ac8","label":"7 - July - VII"}]},{"task":"T9","value":[{"select_label":"Day","option":true,"value":18,"label":"18"}]},{"task":"T10","value":[{"select_label":"Year","option":true,"value":1973,"label":"1973"}]},{"task":"T21","value":[{"select_label":"Month"}]},{"task":"T22","value":[{"select_label":"Day"}]},{"task":"T23","value":[{"select_label":"Year"}]}]},{"task":"T27","task_label":null,"value":[{"task":"T24","task_label":"Rearing Information","value":"No"},{"task":"T25","task_label":"Host plant information","value":"No"},{"task":"T26","value":[{"select_label":"Type","option":true,"value":"5ec52887576cf","label":"Not shown"}]}]},{"task":"T6","value":"","task_label":"Collected By"},{"task":"T30","task_label":null,"value":[{"task":"T28","value":"Catocala","task_label":"Genus"},{"task":"T29","value":"myristic","task_label":"Species"},{"task":"T2","value":"","task_label":"Subspecies"}]},{"task":"T13","value":[{"select_label":"Sex","option":true,"value":"78eeb18268e2e","label":"Male"}]}]
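To make the label/value distinction concrete, here is a minimal Python sketch that walks an annotation list shaped like the example above and collects the label and value of every answered dropdown. The function name `iter_dropdown_answers` and the trimmed sample are illustrative assumptions, not part of the reconciliation script.

```python
import json

# Trimmed copy of the example output above (only three top-level tasks kept).
ANNOTATIONS = json.loads("""
[{"task":"T1","value":[
    {"select_label":"Country","option":true,"value":"840","label":"United States of America"},
    {"select_label":"State/Province","option":true,"value":"CA","label":"California"},
    {"select_label":"County","option":true,"value":"8cb63930b159f","label":"Los Angeles"}]},
 {"task":"T3","value":"W.Fork San Gabriel Riv.","task_label":"Location"},
 {"task":"T11","task_label":null,"value":[
    {"task":"T8","value":[{"select_label":"Month","option":true,"value":"4cfa5e70706ac8","label":"7 - July - VII"}]},
    {"task":"T21","value":[{"select_label":"Month"}]}]}]
""")


def iter_dropdown_answers(node):
    """Yield (select_label, value, label) for every answered dropdown."""
    if isinstance(node, list):
        for item in node:
            yield from iter_dropdown_answers(item)
    elif isinstance(node, dict):
        if "select_label" in node:
            # Unanswered dropdowns carry only "select_label"; skip them.
            if "label" in node:
                yield node["select_label"], node.get("value"), node["label"]
        else:
            # Combo tasks nest further tasks inside "value".
            yield from iter_dropdown_answers(node.get("value"))


ANSWERS = list(iter_dropdown_answers(ANNOTATIONS))
for select_label, value, label in ANSWERS:
    print(f"{select_label}: label={label!r} value={value!r}")
```

In this sample, State/Province carries a meaningful value (`CA`), while County and Month carry generated identifiers, which is the complication mentioned above.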

juliema commented 7 years ago

Yes, I think the idea here is that providers might want dates formatted a specific way, but that format might not be the best way to present them to the transcribers. Would it be possible for providers to enter the value formatted the way they want when uploading an expedition, and then for us to use the value (rather than the label) of the date in the reconciliation process?

mcbouslog commented 7 years ago

I think we specifically designed the dropdown task editor so that an option's value cannot be set by hand, for a few reasons. Mostly, we wanted to keep the values as concise and machine-readable as possible, because the value is what is stored in the database for each classification. The human-readable label is stored with the workflow_contents (once for each workflow version), not with each classification (potentially hundreds of thousands of times). The classification export CSV puts the value and label together to make each classification more human-readable, but that is not how the classification is actually stored in the database. I'm not sure I'm explaining this well, or even remembering it well, but that's what immediately comes to mind. Let me know if this needs to be explained better.

rafelafrance commented 7 years ago

Now that we have reconciliation data types as plug-ins, we could add a new "monthnum" column type that extracts the month number during reconciliation and puts that value into the reconciled CSV and the summary (not the unreconciled CSV). The state and country data are trickier but also doable with a table look-up.
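As a rough sketch of what those two conversions could look like, assuming month labels follow the "7 - July - VII" pattern from the example output: the helper names `month_number` and `state_code` and the two-entry `STATE_CODES` table are hypothetical, and the real plug-in interface is not shown here.

```python
import re

# Hypothetical look-up table; a real one would cover every state and province.
STATE_CODES = {"California": "CA", "Oregon": "OR"}


def month_number(label):
    """Return the month number from a label like '7 - July - VII', or None."""
    match = re.match(r"\s*(\d{1,2})\b", label or "")
    if match is None:
        return None
    number = int(match.group(1))
    # Reject leading numbers that cannot be a month.
    return number if 1 <= number <= 12 else None


def state_code(label):
    """Map a state label to its code via the table; None when unknown."""
    return STATE_CODES.get((label or "").strip())


print(month_number("7 - July - VII"))
print(state_code("California"))
```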

The downside is that this would put a burden on the person running the reconciliation scripts, who would have to identify the columns that need this treatment and add the appropriate arguments to the script:

reconcile.py --column-types="My Month:monthnum,My State:state" ...
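For illustration only, an argument in that `column:type` form could be parsed along these lines. This is a sketch using `argparse`, not the script's actual parser, and the function name `parse_column_types` is an assumption.

```python
import argparse


def parse_column_types(spec):
    """Turn 'My Month:monthnum,My State:state' into a column-to-type dict."""
    column_types = {}
    for pair in spec.split(","):
        if not pair.strip():
            continue
        # Split on the last colon so column names may themselves contain colons.
        name, sep, type_name = pair.rpartition(":")
        if not sep or not name.strip() or not type_name.strip():
            raise ValueError(f"expected 'column:type', got {pair!r}")
        column_types[name.strip()] = type_name.strip()
    return column_types


parser = argparse.ArgumentParser()
parser.add_argument("--column-types", type=parse_column_types, default={})

ARGS = parser.parse_args(["--column-types=My Month:monthnum,My State:state"])
print(ARGS.column_types)
```

Columns that are not listed would presumably keep whatever type the script assigns by default, so only the columns that need conversion have to be named.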