kingsdigitallab / crossreads

Palaeographical environment for CROSSREADS project

Change queue job fails on missing content file #70

Open geoffroy-noel-ddh opened 2 months ago

geoffroy-noel-ddh commented 2 months ago

The change-queue job running on GitHub is failing because of a missing file:

jeff@j3470:~/src/prj/crossreads/tools$ node run-change-queue.mjs 
../annotations/http-sicily-classics-ox-ac-uk-inscription-isic001473-isic001473-jpg.json
../annotations/https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001447-isic001447-jpg.json
file:///home/jeff/src/prj/crossreads/tools/run-change-queue.mjs:34
      for (let annotation of content) {

jeff@j3470:~/src/prj/crossreads/tools$ grep -rin 'api-dts' ../annotations/change-queue.json 
20:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001447-isic001447-jpg.json"
33:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001408-isic001408-jpg.json"
37:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001408-isic001408-jpg.json"
41:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic030002-isic001408-jpg.json"
49:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic030002-isic001408-jpg.json"
61:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001445-isic001445-jpg.json"
65:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001464-isic001464-jpg.json"
69:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001445-isic001445-jpg.json"
73:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001471-isic001471-jpg.json"
77:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001471-isic001471-jpg.json"
81:          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001471-isic001471-jpg.json"

11 of the 36 changes in the queue use a different filename format, one that contains -api-dts. Why?

    {
      "annotations": [
        {
          "id": "https://crossreads.web.ox.ac.uk/annotations/ac784b4f-1569-4a9b-8f69-30935a3a1964",
          "file": "http-sicily-classics-ox-ac-uk-inscription-isic001473-isic001473-jpg.json"
        }
      ],
      "tags": [
        "one-mid-bar"
      ],
      "creator": "https://api.github.com/users/simonastoyanova",
      "created": "2024-09-05T07:24:28.844Z"
    },
    {
      "annotations": [
        {
          "id": "https://crossreads.web.ox.ac.uk/annotations/b2aab90c-c28c-4dc2-96e0-b066d6d97028",
          "file": "https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001447-isic001447-jpg.json"
        }
      ],
      "tags": [
        "one-mid-bar"
      ],
      "creator": "https://api.github.com/users/simonastoyanova",
      "created": "2024-09-05T07:25:00.040Z"
    },

That annotation exists in this file instead:

jeff@j3470:~/src/prj/crossreads/tools$ l ../annotations/*1447*
-rw-rw-r-- 1 jeff jeff 20K Apr 10 00:21 ../annotations/http-sicily-classics-ox-ac-uk-inscription-isic001447-isic001447-jpg.json

Annotation in the correct file:

  {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": [
      {
        "type": "TextualBody",
        "purpose": "describing",
        "format": "application/json",
        "value": {
          "script": "greek-1",
          "components": {
            "apex": {
              "features": [
                "rounded"
              ]
            },
            "bottom-bar": {
              "features": [
                "curved",
                "sans-serif",
                "diagonal",
                "touching",
                "below-baseline"
              ]
            },
            "middle-bar": {
              "features": [
                "diagonal",
                "sans-serif",
                "touching",
                "curved"
              ]
            },
            "top-bar": {
              "features": [
                "sans-serif",
                "curved",
                "diagonal",
                "touching"
              ]
            }
          },
          "tags": null,
          "character": "Σ"
        }
      }
    ],
    "target": [
      {
        "source": "https://apheleia.classics.ox.ac.uk/iipsrv/iipsrv.fcgi?IIIF=/inscription_images/ISic001447/ISic001447_tiled.tif",
        "selector": {
          "type": "FragmentSelector",
          "conformsTo": "http://www.w3.org/TR/media-frags/",
          "value": "xywh=pixel:212.23928833007812,111.04105377197266,292.1890869140625,164.00289154052734"
        }
      },
      {
        "source": "https://crossreads.web.ox.ac.uk/api/dts/documents?id=ISic001447",
        "selector": {
          "type": "XPathSelector",
          "value": "//*[@xml:id='BsΤAe']",
          "refinedBy": {
            "type": "TextPositionSelector",
            "start": 0,
            "end": 1
          }
        }
      }
    ],
    "id": "https://crossreads.web.ox.ac.uk/annotations/b2aab90c-c28c-4dc2-96e0-b066d6d97028",
    "generator": "https://github.com/kingsdigitallab/crossreads#2023-09-01-00",
    "creator": "https://api.github.com/users/simonastoyanova",
    "created": "2023-11-22T17:11:44.697Z",
    "modifiedBy": "https://api.github.com/users/simonastoyanova",
    "modified": "2023-11-22T17:13:03.988Z"
  },

It looks like the reference to the file in the change queue was incorrectly constructed by the search page from the textual source, rather than the image source. That's most likely due to the wrong assumption that the second target is always the image?
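One way to confirm which queue entries point at files that do not exist is a check like the one below. It is only a sketch: findMissingFileReferences is a hypothetical helper, not part of the repository, and it assumes the queue is available as an array of entries shaped like the excerpt above. The sample data is copied from the two entries and the file names quoted in this comment.

```javascript
// Hypothetical helper: list queue references whose "file" is not among
// the annotation files that actually exist.
// Assumption: `changes` is an array of entries shaped like the excerpt above.
function findMissingFileReferences(changes, existingFiles) {
  const existing = new Set(existingFiles);
  const missing = [];
  for (const change of changes) {
    for (const annotation of change.annotations ?? []) {
      if (!existing.has(annotation.file)) {
        missing.push({ id: annotation.id, file: annotation.file });
      }
    }
  }
  return missing;
}

// Sample data taken from the two queue entries quoted above.
const sampleChanges = [
  {
    annotations: [
      {
        id: 'https://crossreads.web.ox.ac.uk/annotations/ac784b4f-1569-4a9b-8f69-30935a3a1964',
        file: 'http-sicily-classics-ox-ac-uk-inscription-isic001473-isic001473-jpg.json',
      },
    ],
    tags: ['one-mid-bar'],
  },
  {
    annotations: [
      {
        id: 'https://crossreads.web.ox.ac.uk/annotations/b2aab90c-c28c-4dc2-96e0-b066d6d97028',
        file: 'https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001447-isic001447-jpg.json',
      },
    ],
    tags: ['one-mid-bar'],
  },
];

// File names that exist in ../annotations according to the listings above.
const sampleFiles = [
  'http-sicily-classics-ox-ac-uk-inscription-isic001473-isic001473-jpg.json',
  'http-sicily-classics-ox-ac-uk-inscription-isic001447-isic001447-jpg.json',
];

const missingRefs = findMissingFileReferences(sampleChanges, sampleFiles);
console.log(missingRefs.map((ref) => ref.file));
// Only the -api-dts reference for isic001447 is reported as missing.
```

In a real run the two inputs would come from change-queue.json and a directory listing of the annotations folder; they are kept in memory here so that the sketch has no file-system dependency.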

geoffroy-noel-ddh commented 2 months ago

The search page reconstructs the filename using getAnnotationFileNameFromItem(item), which builds it from the doc and img paths in the annotation.

That method is wrong. It should be the same as the one used by the annotator, which builds the filename from the DTS member id and the image file name in the TEI.

Answer 2

At some point we switched the reference to the text from a link to the sicily domain to the DTS request path for that document. The old reference pattern and the DTS object ID share the same substring, which is why the wrong method still works with older annotations.

$ grep -rin 'source":' annotations/ | grep -v 'IIIF'

annotations/http-sicily-classics-ox-ac-uk-inscription-isic000176-isic000176-jpg.json:51:        "source": "http://sicily.classics.ox.ac.uk/inscription/ISic000176.xml",
annotations/http-sicily-classics-ox-ac-uk-inscription-isic000186-isic000186-jpg.json:61:        "source": "http://sicily.classics.ox.ac.uk/inscription/ISic000186.xml",
annotations/http-sicily-classics-ox-ac-uk-inscription-isic020300-isic020300-jpg.json:36:        "source": "https://crossreads.web.ox.ac.uk/api/dts/documents?id=ISic020300",
annotations/http-sicily-classics-ox-ac-uk-inscription-isic020300-isic020300-jpg.json:108:        "source": "https://crossreads.web.ox.ac.uk/api/dts/documents?id=ISic020300",
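To illustrate why the two reference styles lead to different file names, here is a minimal sketch. The slug rule (lowercase, then every run of non-alphanumeric characters becomes a hyphen) is inferred from the file names quoted in this thread, not taken from the repository. The functions slugify and annotationFileName are hypothetical, and so are the member id and image name passed to them.

```javascript
// Hypothetical slug rule inferred from the file names in this thread:
// lowercase, collapse each run of non-alphanumeric characters into "-",
// and trim any leading or trailing "-".
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Hypothetical: build an annotation file name from a document reference
// and an image file name.
function annotationFileName(docRef, imageName) {
  return `${slugify(docRef)}-${slugify(imageName)}.json`;
}

// Assumed inputs for inscription ISic001447.
const imageName = 'ISic001447.jpg';
const memberId = 'http://sicily.classics.ox.ac.uk/inscription/ISic001447';
const dtsRequestPath = 'https://crossreads.web.ox.ac.uk/api/dts/documents?id=ISic001447';

const fromMemberId = annotationFileName(memberId, imageName);
const fromDtsPath = annotationFileName(dtsRequestPath, imageName);

console.log(fromMemberId);
// http-sicily-classics-ox-ac-uk-inscription-isic001447-isic001447-jpg.json
console.log(fromDtsPath);
// https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001447-isic001447-jpg.json
```

The first result matches the file that exists on disk and the second matches the invalid reference found in the change queue, which is consistent with the explanation above.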

Answer 3

See the fix below: I've made the change queue script tolerant of invalid references, so there is no need to update the change queue itself.
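The actual fix lives in the repository and is not reproduced here. Purely as an illustration of what tolerating an invalid reference could look like, the sketch below falls back to the single existing file that shares the same trailing inscription and image part. The function resolveAnnotationFile and its "last three hyphen-separated parts" rule are assumptions made for this example.

```javascript
// Hypothetical fallback: if the referenced file does not exist, look for the
// one existing file that ends with the same "<inscription>-<image>-jpg.json".
function resolveAnnotationFile(fileName, existingFiles) {
  if (existingFiles.includes(fileName)) {
    return fileName;
  }
  // Assumption: the last three "-"-separated parts identify inscription and image.
  const tail = '-' + fileName.split('-').slice(-3).join('-');
  const candidates = existingFiles.filter((name) => name.endsWith(tail));
  // Accept only an unambiguous match; otherwise report the reference as unresolved.
  return candidates.length === 1 ? candidates[0] : null;
}

// File names that exist according to the listings earlier in this thread.
const knownFiles = [
  'http-sicily-classics-ox-ac-uk-inscription-isic001473-isic001473-jpg.json',
  'http-sicily-classics-ox-ac-uk-inscription-isic001447-isic001447-jpg.json',
];

const resolved = resolveAnnotationFile(
  'https-crossreads-web-ox-ac-uk-api-dts-documents-id-isic001447-isic001447-jpg.json',
  knownFiles,
);
console.log(resolved);
// http-sicily-classics-ox-ac-uk-inscription-isic001447-isic001447-jpg.json
```

A caller could skip any change for which the function returns null and log a warning, rather than letting the whole job fail.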

Answer 1

Resolution for that is still pending.

simonastoyanova commented 2 months ago

@geoffroy-noel-ddh Interesting, the two files 1447 and 1473 were annotated a while back. Are they the only files that throw up this issue? The majority of the archaic texts were done at the same time. Let me know if you need me to test anything.

geoffroy-noel-ddh commented 2 months ago

Hi Simona, the change queue (i.e. your bulk tag edits from the search page) has now been processed successfully. The error shouldn't occur any more, as I have made the automated script tolerant of invalid references. I'll soon fix the search page so that it no longer produces those invalid references. (Although invalid, they still uniquely identify the right inscription and annotation; only the format of the file name is wrong, so there is no risk of tags going to the wrong place.)

If you want to help, just check at least once a day that the change queue message says 'no change(s) pending' (see below) and that the tags you've applied are indeed reflected in the annotator (just check a few). If you notice anything wrong, please let me know. Thank you.

image