zooniverse / panoptes

Zooniverse API to support user defined volunteer research projects
Apache License 2.0

Annotation data discrepancy in python client and csv dump #2660

Closed miclaraia closed 6 years ago

miclaraia commented 6 years ago

Similar to #2649. There is a difference between the classification data presented by the python client and the data in the classification dump, specifically in the value of a question task in the annotation data. The csv dump gives the text that was presented as an option in the question: [{"task":"T0",...,"value":"All Muons"}]. The python client, on the other hand, presents an integer, which I assume is the index of the option in the question, but honestly I'm not sure: [{'task': 'T0', 'value': 0}]

I personally don't have a specific need or preference for which version is delivered, although consistency across the different methods of accessing classification data in Panoptes would be much appreciated. I'm not arguing that data from the api should be exactly the same as the data in the csv dump, but in this example I can't go from the api to the csv dump without maintaining my own external mapping of the expected answer text and their indices in Panoptes.

{'id': '88474114', 'annotations': [{'task': 'T0', 'value': 0}], 'created_at': '2018-02-03T01:38:05.235Z', 'updated_at': '2018-02-03T01:38:11.037Z', 'metadata': {'session': '9440df0368e7819e53e5ba72b3db94133d9c4a4a4d75014fce12910e4ed968a7', 'viewport': {'width': 1918, 'height': 869}, 'started_at': '2018-02-03T01:37:59.739Z', 'user_agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36', 'utc_offset': '21600', 'finished_at': '2018-02-03T01:38:05.116Z', 'live_project': False, 'user_language': 'en', 'subject_dimensions': [{'clientWidth': 782, 'clientHeight': 782, 'naturalWidth': 2000, 'naturalHeight': 2000}], 'workflow_version': '32.38'}, 'href': '/classifications/88474114', 'links': {'project': '5918', 'workflow': '5734', 'subjects': ['17511849']}}

Same classification data in csv dump: {'classification_id': '88474114', 'user_name': 'not-logged-in-84e52d3d29185275d480', 'user_id': '', 'user_ip': '84e52d3d29185275d480', 'workflow_id': '5734', 'workflow_name': 'cluster classifiation', 'workflow_version': '32.38', 'created_at': '2018-02-03 01:38:05 UTC', 'gold_standard': '', 'expert': '', 'metadata': '{"session":"9440df0368e7819e53e5ba72b3db94133d9c4a4a4d75014fce12910e4ed968a7","viewport":{"width":1918,"height":869},"started_at":"2018-02-03T01:37:59.739Z","user_agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36","utc_offset":"21600","finished_at":"2018-02-03T01:38:05.116Z","live_project":false,"user_language":"en","subject_dimensions":[{"clientWidth":782,"clientHeight":782,"naturalWidth":2000,"naturalHeight":2000}]}', 'annotations': '[{"task":"T0","task_label":"Which class dominates this cluster?","value":"All Muons"}]', ... , 'subject_ids': '17511849'}
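To make the mismatch concrete, here is a minimal Python sketch built only from the two records above, trimmed to the relevant fields. It assumes the CSV row has already been read into a dict (for example with `csv.DictReader`), so the `annotations` column is still a JSON string. The names `api_record` and `csv_row` are illustrative and not part of the python client.

```python
import json

# Trimmed copies of the two records above; only the fields needed here.
api_record = {
    "id": "88474114",
    "annotations": [{"task": "T0", "value": 0}],
}
csv_row = {
    "classification_id": "88474114",
    "annotations": '[{"task":"T0","task_label":"Which class dominates this cluster?","value":"All Muons"}]',
}

# The CSV column holds a JSON string, so it has to be decoded first.
csv_annotations = json.loads(csv_row["annotations"])

# Pair annotations by position and show the differing "value" fields.
for api_ann, csv_ann in zip(api_record["annotations"], csv_annotations):
    assert api_ann["task"] == csv_ann["task"]
    print(api_ann["task"], repr(api_ann["value"]), "vs", repr(csv_ann["value"]))
# prints: T0 0 vs 'All Muons'
```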

marten commented 6 years ago

I don't think we can change either of them at this point. The API results are relied upon by the frontends, which need the raw index of the option (correctly guessed!). And it would also be strange for an API client to get back a different result than what you sent into the API in the first place.

This transformation to readable strings happens in the backend upon exporting, to ease pains that users of those CSV dumps were having. On reflection, we probably should have only extended the JSON, not replaced values. But if we change that now we'll break everyone's data flow. :(
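As an illustration of what "extending the JSON rather than replacing values" could look like, here is a minimal Python sketch. It is not the Panoptes export code: the `extend_annotation` helper, the `answer_labels` list, and the `value_label` key are all hypothetical, and only the label "All Muons" comes from this thread.

```python
def extend_annotation(annotation, answer_labels):
    """Return a copy of an API-style annotation with the answer text added.

    The raw index in "value" is left untouched; the readable text goes
    into a separate, hypothetical "value_label" key.
    """
    extended = dict(annotation)
    index = annotation["value"]
    is_index = isinstance(index, int) and not isinstance(index, bool)
    if is_index and 0 <= index < len(answer_labels):
        extended["value_label"] = answer_labels[index]
    return extended


# Placeholder labels: only "All Muons" appears in this issue.
answer_labels = ["All Muons", "<second option>", "<third option>"]
print(extend_annotation({"task": "T0", "value": 0}, answer_labels))
# prints: {'task': 'T0', 'value': 0, 'value_label': 'All Muons'}
```

With this shape, a frontend that needs the index and a CSV user who wants the text could read the same record.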

miclaraia commented 6 years ago

Ok, that makes sense. Since both current formats need continued support, would it instead be sensible to add a flag that lets a client request a different format for the annotation data, either one that includes both the index and the text of an answer, or one where the client specifies which is preferred?

Otherwise, is there a way to request the index-to-text mapping from Panoptes?

camallen commented 6 years ago

@miclaraia - apologies for the tardy reply.

You can request the workflow task state via the version endpoint https://panoptes.docs.apiary.io/#reference/workflows/workflow-version/retrieve-a-single-version (where the id should match the classification version number) and marry the tasks -> annotations that way.

Also there is https://panoptes.docs.apiary.io/#reference/workflowcontents/workflowcontent-version/retrieve-a-single-version

You will be the first person to use that version history and it may be buggy, so please do report any issues.
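A rough Python sketch of marrying tasks to annotations once a version has been retrieved. The response shapes here are assumptions and not confirmed in this thread: `tasks` is taken to be a dict keyed by task id with an ordered `answers` list, and each answer's `label` is taken to be either the text itself or a key into a `strings` dict from the workflow content version. The helper name `label_for` is made up for illustration, and fetching the version is left out.

```python
def label_for(annotation, tasks, strings=None):
    """Look up the answer text for an index-valued question annotation.

    Returns None when the task is unknown or the value is not a valid
    index into that task's answers.
    """
    strings = strings or {}
    answers = tasks.get(annotation["task"], {}).get("answers", [])
    index = annotation["value"]
    is_index = isinstance(index, int) and not isinstance(index, bool)
    if not is_index or not 0 <= index < len(answers):
        return None
    label = answers[index].get("label")
    # If the label is a key into the content strings, resolve it;
    # otherwise assume it is already the readable text.
    return strings.get(label, label)


# Assumed shapes, filled with the one task and answer quoted in this issue.
tasks = {"T0": {"question": "T0.question",
                "answers": [{"label": "T0.answers.0.label"}]}}
strings = {"T0.question": "Which class dominates this cluster?",
           "T0.answers.0.label": "All Muons"}

print(label_for({"task": "T0", "value": 0}, tasks, strings))
# prints: All Muons
```

The version used for the lookup should be the one recorded on the classification (`workflow_version`), since answer order can change between versions.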
