zooniverse / caesar

Backend automation and orchestration
https://zooniverse.github.io/caesar
Apache License 2.0

tons of basically empty/useless extracts in TESS #972

Open amy-langley opened 5 years ago

amy-langley commented 5 years ago

Classifications from TESS experimental subjects have no training data and so generate no feedback. Unfortunately, their classification metadata still includes a feedback object with an empty array for T1:

    {
      "id": "180067442",
      "annotations": [
        {
          "task": "T1",
          "value": []
        }
      ],
      "created_at": "2019-08-22T19:13:57.806Z",
      "updated_at": "2019-08-22T19:13:58.664Z",
      "metadata": {
        "source": "api",
        "session": "7b58393a928bb00f163ee50e7ff8b12275aeed97fd6014f4262d1eabced51a57",
        "feedback": {
          "T1": []
        },
        "viewport": {
          "width": 1920,
          "height": 938
        },
        "started_at": "2019-08-22T19:12:00.312Z",
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0",
        "utc_offset": "10800",
        "finished_at": "2019-08-22T19:13:55.287Z",
        "live_project": true,
        "user_language": "en",
        "user_group_ids": [],
        "subject_flagged": false,
        "classifier_version": "2.0",
        "subject_dimensions": [
          {
            "clientWidth": 1280,
            "clientHeight": 402,
            "naturalWidth": 0,
            "naturalHeight": 0
          }
        ],
        "subject_selection_state": {
          "retired": false,
          "selected_at": "2019-08-22T19:13:27.936Z",
          "already_seen": false,
          "selection_state": "normal",
          "finished_workflow": false,
          "user_has_finished_workflow": false
        },
        "workflow_version": "15.5"
      },
      "href": "/classifications/180067442",
      "links": {
        "project": "7929",
        "user": "XXXX",
        "user_group": null,
        "workflow": "11235",
        "subjects": [
          "36019993"
        ]
      }
    }

The if_missing: 'reject' setting can prevent extraneous extracts from being created if the JSONPath expression fails to match, but this:

        "feedback": {
          "T1": []
        },

will always generate the useless match [ [] ], which defeats if_missing: 'reject' because the expression technically did match. We need to decide whether Caesar should consider this a non-match (possibly bad, since an empty array could be a valid value) or whether the front-end should stop sending classifications in this format.
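To make the failure mode concrete, here is a rough Python sketch (not Caesar's actual code, which is Ruby). It assumes a simplified dotted-path lookup standing in for a real JSONPath engine, and the names find_matches, should_extract, and the treat_empty_as_missing flag are all hypothetical, invented for illustration. It shows that a present-but-empty T1 yields one match whose value is [], so a reject-on-missing check never fires.

```python
def find_matches(document, path):
    """Return the values found at a dotted path like '$.metadata.feedback.T1'.

    Simplified stand-in for a JSONPath engine (assumption): dotted keys only.
    A missing key yields no matches; a present key yields exactly one match.
    """
    keys = path.lstrip("$").strip(".").split(".")
    node = document
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    return [node]


def should_extract(document, path, if_missing="reject", treat_empty_as_missing=False):
    """Decide whether an extract would be created, and with which matches.

    treat_empty_as_missing is a hypothetical option, not an existing setting:
    it models the "consider this a non-match" choice discussed above.
    """
    matches = find_matches(document, path)
    if treat_empty_as_missing:
        # Drop matches that are empty containers or null; keep 0, False, "".
        matches = [m for m in matches if m not in ([], {}, None)]
    if not matches and if_missing == "reject":
        return False, matches
    return True, matches


if __name__ == "__main__":
    path = "$.metadata.feedback.T1"
    empty_feedback = {"metadata": {"feedback": {"T1": []}}}
    no_feedback = {"metadata": {}}

    print(should_extract(no_feedback, path))     # rejected: nothing matched
    print(should_extract(empty_feedback, path))  # accepted: the match is [[]]
    print(should_extract(empty_feedback, path, treat_empty_as_missing=True))
```

The last call shows the trade-off: filtering empty values removes the junk extracts, but it would also discard an empty array that a workflow intentionally sent as a valid answer.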

amy-langley commented 5 years ago

tagging @zwolf @srallen for future discussion

zwolf commented 5 years ago

See: https://github.com/zooniverse/front-end-monorepo/issues/874

amy-langley commented 5 years ago

Thanks for linking the relevant issue--looks like @mcbouslog spotted it forever ago but we never realized it was causing issues for us, so they rightly prioritized other things.

amy-langley commented 5 years ago

@srallen has a PR resolving this: https://github.com/zooniverse/front-end-monorepo/pull/1095. How to clean up the existing junk rows will be an interesting question, though. First priority is to stop making more :)