CKrawczyk commented 3 years ago

Package

lib-classifier

Describe the bug

Sometimes the transcription task creates a classification that contains a line mark with no associated text task (i.e. the mark has details: []). In the classification data, the lines with no text appear to be duplicates of lines that do have text (see example below).

To Reproduce

I am unsure how to reproduce this bug, but some example classifications are:

"id": 310540545
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"
"subject_id": 55414222
"user_id": None
"workflow_id": 13898
---
"id": 310580237
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
"subject_id": 55414317
"user_id": None
"workflow_id": 13898
---
"id": 310607565
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15"
"subject_id": 55414433
"user_id": 2158146
"workflow_id": 13898
---
"id": 310624327
"user_agent": "Mozilla/5.0 (Linux; Android 11; SM-G998B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.181 Mobile Safari/537.36"
"subject_id": 55414216
"user_id": 1654808
"workflow_id": 13898
---
"id": 310641679
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Safari/537.36"
"subject_id": 55414220
"user_id": 1988515
"workflow_id": 13898
---
"id": 311198050
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0"
"subject_id": 55414309
"user_id": 2237331
"workflow_id": 13898
---
"id": 311199130
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0"
"subject_id": 55414445
"user_id": 2237331
"workflow_id": 13898

These are all from project 11300.

Expected behavior

Every line should have text associated with it. Although exact duplicates do not cause a problem on the aggregation side, I expect they should not happen at all given the accuracy of the drawing interface.

Example classification with this issue

{
  "annotations": [
    {
      "task": "T1",
      "taskType": "transcription",
      "value": [
        {
          "details": "[{'task': 'T1.0.0'}]",
          "frame": "0",
          "toolIndex": "0",
          "toolType": "'transcriptionLine'",
          "x1": "93.51045989990234",
          "x2": "1449.473876953125",
          "y1": "756.28759765625",
          "y2": "725.5006713867188"
        },
        {
          "details": "[]",
          "frame": "0",
          "toolIndex": "0",
          "toolType": "'transcriptionLine'",
          "x1": "93.51045989990234",
          "x2": "1449.473876953125",
          "y1": "756.28759765625",
          "y2": "725.5006713867188"
        },
        {
          "details": "[]",
          "frame": "0",
          "toolIndex": "0",
          "toolType": "'transcriptionLine'",
          "x1": "93.51045989990234",
          "x2": "1449.473876953125",
          "y1": "756.28759765625",
          "y2": "725.5006713867188"
        }
      ]
    },
    {
      "markIndex": 0,
      "task": "T1.0.0",
      "taskType": "text",
      "value": "is gratefully appreciated by our society"
    }
  ],
  "created_at": "2021-02-19T22:21:18.180Z",
  "id": 310607565,
  "metadata": {
    "feedback": {},
    "finished_at": "2021-02-19T22:21:17.515Z",
    "live_project": True,
    "session": "a08726ab3fd3babae9c233f2cbb13bf15f20694c1b098d8ace47f7b8f83a0f1b",
    "source": "API",
    "started_at": "2021-02-19T22:16:04.497Z",
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15",
    "user_language": "en",
    "utc_offset": "-3600",
    "viewport": {
      "height": 430,
      "width": 834
    }
  },
  "project_id": 11300,
  ...
}
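
For anyone triaging similar classifications, here is a minimal sketch of how the textless line marks could be flagged and separated into duplicates of a mark that has text versus marks with no text at all. It assumes the annotation has the shape shown in the example above, where `details` may be either a real list or its string representation. The helper names `parse_details` and `find_textless_lines` are hypothetical and are not part of lib-classifier or the aggregation code.

```python
import ast
from collections import defaultdict


def parse_details(details):
    """Return details as a list, accepting a real list or its string repr."""
    if isinstance(details, str):
        return ast.literal_eval(details)
    return details or []


def find_textless_lines(annotation):
    """Split textless marks into duplicates of a mark with text, and orphans."""
    # Group marks by frame and endpoint coordinates (exact match only).
    by_geometry = defaultdict(list)
    for mark in annotation["value"]:
        key = (mark["frame"], mark["x1"], mark["y1"], mark["x2"], mark["y2"])
        by_geometry[key].append(mark)

    duplicates, orphans = [], []
    for marks in by_geometry.values():
        textless = [m for m in marks if not parse_details(m["details"])]
        # If at least one mark in the group has details, the rest are duplicates.
        has_text = len(textless) < len(marks)
        (duplicates if has_text else orphans).extend(textless)
    return duplicates, orphans


# Shortened, illustrative coordinates; the structure mirrors the example above.
line = {"frame": "0", "x1": "93.5", "x2": "1449.4", "y1": "756.2", "y2": "725.5"}
annotation = {
    "task": "T1",
    "value": [
        {**line, "details": "[{'task': 'T1.0.0'}]"},
        {**line, "details": "[]"},
        {**line, "details": "[]"},
    ],
}
duplicates, orphans = find_textless_lines(annotation)
print(len(duplicates), len(orphans))  # 2 0
```

Matching on exact coordinates is deliberate here, since the duplicates reported in this issue are exact copies. Near-duplicates would need a tolerance instead.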

Device information

Judging from the user agents above, this is coming from many different devices and browsers.

CKrawczyk commented 3 years ago

Here is the associated sentry issue from aggregation: https://sentry.io/organizations/zooniverse-27/issues/2227296723/?project=1760084&query=is%3Aunresolved

As Caesar retries each extractor multiple times on failure, each unique case shows up 5 times.

eatyourgreens commented 3 years ago

1719#issuecomment-663275184

srallen commented 3 years ago

The Sentry issue is still getting reports of missing text. We are now preventing multiple marks from being created from a previous mark from Caesar, and we no longer see duplicate marks in the Sentry report, but that did not resolve this issue. I believe the duplication was technically a separate issue.

I believe there may be a bug in the confirmation dialog, which determines whether the sub-task should remain open or the mark should be deleted, depending on the volunteer's choice.

zooniverse / front-end-monorepo

Transcription task sometimes sends classifications with no text task #2100