Closed rafelafrance closed 6 years ago
I can't see any tasks above T11 for that workflow. Annotations for tasks like T28 have no reference among the workflow's tasks, so we can't cross-reference them; when this happens we return the raw annotation in its place.
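The fallback described above can be sketched in a few lines. This is illustrative only; the function and field names are hypothetical, not the actual reconciliation code:

```python
# Minimal sketch of the fallback described above: if an annotation's task ID
# has no matching entry in the workflow's task definitions, return the raw
# annotation unchanged. Names here are made up, not the real pipeline code.

def resolve_annotation(annotation, workflow_tasks):
    """Cross-reference an annotation against the workflow's task definitions."""
    task_id = annotation.get("task")          # e.g. "T28"
    task_def = workflow_tasks.get(task_id)    # None if the workflow lacks this task
    if task_def is None:
        return annotation                     # fall back to the raw annotation
    return {"task": task_id,
            "label": task_def["label"],
            "value": annotation.get("value")}

workflow_tasks = {"T1": {"label": "Country"}}
print(resolve_annotation({"task": "T28", "value": "x"}, workflow_tasks))
```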
Not sure where those annotation task IDs are coming from. Do you have other workflows with those task IDs? It may be a bad client submitting incorrect data. How common is this error in your dump?
So far it's a single bad record out of the 10K-20K records processed. We have another 50K+ records to process.
There are other workflows with these task numbers, and in this order. However, they all have two or more task_labels, whereas this record has only one, so the forms don't match exactly.
My question to you is: does this record look like this in the DB? I want to narrow down where in the pipeline the error occurred: pre- or post-data-entry, or possibly even post-delivery.
FYI:
~/notesFromNature/label_reconciliations/temp$ grep -P '""T3"".+""T28"".+""T1"".+""T2"".+""T29"".+""T5"".+""T6"".+""T30"".+""T8"".+""T9"".+""T31"".+""T13"".+""T14"".+""T16"".+""T32"".+""T17"".+""T18""' notes-from-nature-classifications.csv |wc
1949 146376 5256187
~/notesFromNature/label_reconciliations/temp$ grep -P '""T3"".+""T28"".+""T1"".+""T2"".+""T29"".+""T5"".+""T6"".+""T30"".+""T8"".+""T9"".+""T31"".+""T13"".+""T14"".+""T16"".+""T32"".+""T17"".+""T18""' notes-from-nature-classifications.csv | grep -P '""task_label"".+""task_label""' | wc
1948 146335 5253622
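The two grep counts above can be mimicked in miniature: of the rows matching the expected task sequence, all but one also contain two or more "task_label" keys, and the remainder is the single malformed record. The rows below are toy data, not lines from the real dump:

```python
# Toy reproduction of the two grep counts above. Row 0 is well-formed (two
# task_labels); row 1 stands in for the single malformed record. The real
# patterns and CSV quoting are in the grep commands quoted in the thread.
import re

rows = [
    '"T3"..."T28"..."task_label"..."task_label"',   # well-formed
    '"T3"..."T28"..."task_label"',                  # malformed: one task_label
]

task_seq = re.compile(r'"T3".+"T28"')
two_labels = re.compile(r'"task_label".+"task_label"')

matched = [r for r in rows if task_seq.search(r)]
well_formed = [r for r in matched if two_labels.search(r)]
print(len(matched) - len(well_formed))  # → 1 malformed record
```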
@rafelafrance as the project owner / collaborator, you should be able to get the classification resource representation via the API: http://docs.panoptes.apiary.io/#reference/classification/classification/retrieve-a-single-classification
We also have a Python client: https://github.com/zooniverse/panoptes-python-client/
classification = Classification.find("17187451")
So the data is in Panoptes in the wrong format and your data dumps are faithful. My data reconciliation programs will definitely pick up cases where the data is obviously wrong (like this one), but if there are more subtle data issues we are in a very bad place.
So it sounds like we can delete this one record and proceed with the reconciliation this time. If this comes up again in the future then maybe it will require a deeper dive into the data. Rafe, if that seems reasonable to you then I am fine with it as well.
A while back I already put code in reconciler.py to skip records like this (with an error message). This issue is not about fixing reconciler.py; it's about Panoptes.
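The skip-with-an-error-message behaviour mentioned above could look roughly like this. This is a sketch under assumed record shapes, not the actual reconciler.py code:

```python
# Sketch of the skip behaviour described above: records whose annotations
# reference tasks unknown to the workflow are logged and dropped rather than
# crashing the reconciliation run. Record shapes here are hypothetical.
import logging

logging.basicConfig(level=logging.WARNING)

def reconcile(records, workflow_task_ids):
    kept = []
    for rec in records:
        tasks = {a["task"] for a in rec["annotations"]}
        if not tasks <= workflow_task_ids:
            logging.warning("Skipping classification %s: unknown tasks %s",
                            rec["id"], sorted(tasks - workflow_task_ids))
            continue
        kept.append(rec)
    return kept

good = {"id": 1, "annotations": [{"task": "T1"}]}
bad = {"id": 17187451, "annotations": [{"task": "T28"}]}
print([r["id"] for r in reconcile([good, bad], {"T1", "T2"})])  # → [1]
```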
@rafelafrance the API stores the annotation data as-is upon receipt from the client. Could this be an issue with the front end submitting malformed annotations?
Possibly. However, given the shape of the bad data, it's unlikely to be directly related to that. It could be in the panoptes-client code too. Maybe it's a race condition with multiple tabs... we don't know, and I'm still trying to gather information.
All we know for sure is that it is not in the data distribution end.
front-end --> panoptes-client --> panoptes --> data dump etc.
front-end: possibly; panoptes-client: possibly; panoptes: possibly; data dump onward: no.
Just to reiterate: in the Panoptes API, classifications are not touched after the metadata updates here. It's write-once and read-only from then on; we never touch the task annotations, and they are stored exactly as they are received from the client.
If it's impossible for the bug to be in Panoptes, then where in the system did it go wrong? Where does this issue get posted? Panoptes-Front-End?
Not saying it's impossible, just highly unlikely to be an API issue. As I stated before:
This could be an issue with the front end submitting malformed annotations?
Try reporting on the front-end repo and see if you get some traction there. Do you have any frequency stats on this type of malformed classification event? And is there a coherent time window for them? If it was a bad deploy / bug, then you should see the offending classifications only in the time window between the bad deploy and the fix going out.
Could this classification have been generated by a staged version of the PFE dropdown tool, testing against the production database? option and value are keys generated by the react-select component.
Anyway, as @camallen says, this is more than likely a bug in zooniverse/Panoptes-Front-End, since the annotations are generated there and not modified by Panoptes.
Certainly could have (I'm not sure we track metadata about the URL src for an old event like this); however, I checked the user and it doesn't seem like someone on the dev team... but they may have got a link from GitHub, shared, etc.
@eatyourgreens any thoughts on storing some PFE source-origin indicator in the classification metadata? We could use the Referer header API-side to mark it as well.
@camallen PFE records the user agent string here: https://github.com/zooniverse/Panoptes-Front-End/blob/master/app/pages/project/classify.cjsx#L155. It doesn't know anything about the environment (production vs. staging), as that's all handled by the API client and hidden from the classification. The only other variable I can think of tracking would be the version of PFE, maybe via a git commit hash or something (e.g. https://github.com/zooniverse/Panoptes-Front-End/blob/master/views/index.ejs#L22).
This could well be a bug in the react-select module or the dropdown task. I did a bunch of work, with Mark, updating those before Christmas, which might also have resolved this. See zooniverse/Panoptes-Front-End#3233
Thanks @eatyourgreens, I'll have a think about this API-side instead.
Closing this for now, for the same reasons as on this post: https://github.com/zooniverse/Panoptes/issues/2491#issuecomment-342492355
Project: "Notes from Nature"
Workflow: 2563, "Herbarium_Arkansas Dendrology: Part 2: Magnolias, pawpaws, sassafras, and Dutchman's pipe -- 19 September 2016"
Classification ID: 17187451
In the data dump this annotation is completely off. It has the form of:
when it really should look similar to: