jhu-digital-manuscripts / AnIOp

To track the activities of the Mellon-funded Annotation Interoperability project.

Figure out how to correctly map Georeference annotations to IIIF resources #77

Closed: markpatton closed this issue 4 years ago

markpatton commented 4 years ago

This is complicated because the annotations target an English translation of Homer, while the IIIF resources are the Greek manuscript. Right now the mapping is done heuristically.

jacobwegner commented 4 years ago

@markpatton: I'd be happy to work alongside you on this if you'd like

markpatton commented 4 years ago

@jacobwegner That would be great. I've been looking at the Homer Multitext CEX files, because from those I can get to the IIIF canvas (the image IDs are close).

The question is how to go from urn:cts:greekLit:tlg0012.tlg001.perseus-eng4:2.585 to a line in the Greek manuscript and then find the image. Is the corresponding line urn:cts:greekLit:tlg0012.tlg001.msA:2.585?

I see this in the CEX files.

urn:cts:greekLit:tlg0012.tlg001.msA:2.585#οἵ τε , Λάαν εἶχον , ἠδ' Οίτυλον ἀμφενέμοντο .
urn:cite2:hmt:va_dse.v1:il1200#DSE record for Iliad 2.585#urn:cts:greekLit:tlg0012.tlg001.msA:2.585#urn:cite2:hmt:vaimg.2017a:VA035VN_0537@0.491,0.5688,0.398,0.0248#urn:cite2:hmt:msA.v1:35v

So it looks like I can get the image, and from that the IIIF canvas, if I can figure out the CTS URN of the Greek line. These records also contain more information that could be encoded in the annotations if that would be useful.
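A minimal Python sketch of that lookup, assuming the DSE lines are '#'-delimited in the column order shown above (record, label, Greek passage, image@ROI, folio) and that the ROI is x, y, width, height as fractions of the image. The canvas width and height passed to `roi_to_xywh` are placeholders you would read from the manifest, and `parse_dse_line`, `index_by_passage` and `roi_to_xywh` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

ROI = Tuple[float, float, float, float]


@dataclass
class DseRecord:
    """One DSE row linking a Greek passage to an image region and a folio."""
    record_urn: str
    label: str
    passage_urn: str        # e.g. urn:cts:greekLit:tlg0012.tlg001.msA:2.585
    image_urn: str          # image URN with the @ROI extension removed
    roi: Optional[ROI]      # assumed: x, y, width, height as fractions of the image
    surface_urn: str        # folio, e.g. urn:cite2:hmt:msA.v1:35v


def parse_dse_line(line: str) -> DseRecord:
    """Split one '#'-delimited DSE line (column order as in the example above)."""
    record_urn, label, passage_urn, image_with_roi, surface_urn = line.strip().split("#")
    image_urn, _, roi_text = image_with_roi.partition("@")
    roi = None
    if roi_text:
        x, y, w, h = (float(v) for v in roi_text.split(","))
        roi = (x, y, w, h)
    return DseRecord(record_urn, label, passage_urn, image_urn, roi, surface_urn)


def index_by_passage(lines: Iterable[str]) -> Dict[str, DseRecord]:
    """Map each Greek passage URN to its DSE record, skipping blank lines."""
    records = (parse_dse_line(line) for line in lines if line.strip())
    return {record.passage_urn: record for record in records}


def roi_to_xywh(roi: ROI, width: int, height: int) -> str:
    """Turn a fractional ROI into a pixel 'xywh=' fragment for a canvas of the given size."""
    x, y, w, h = roi
    return f"xywh={round(x * width)},{round(y * height)},{round(w * width)},{round(h * height)}"
```

Once the Greek passage URN is known, its record's `image_urn` and `surface_urn` (35v here) are the values to match against the IIIF canvas IDs.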

jacobwegner commented 4 years ago

@markpatton sorry for the delay; been working on a couple of things that I think can help support this.

Hope to have an update by mid-day.

jacobwegner commented 4 years ago

@markpatton: I've added a "Text Parts Map" section to the AnIOp ATLAS documentation.

It shows how to query ATLAS for a particular passage in the Barber translation and receive annotated tokens with their CTS subreferences calculated.

This query is very slow, but it returns all of the tokens, annotated with CTS subreferences, for the whole Catalog of Ships passage:

{
  passageTextParts(reference:"urn:cts:greekLit:tlg0012.tlg001.perseus-eng4:2.560") {
    edges {
      node {
        ref
        textContent
        tokens {
          edges {
            node {
              position
              value
              subrefValue
            }
          }
        }
      }
    }
  }
}

That would at least allow you to map Chiara's annotations to Barber URNs with subreferences.
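A minimal Python sketch of consuming that response, assuming the usual GraphQL `{"data": ...}` envelope, that `ref` is either a bare passage reference (e.g. `2.520`) or a full URN, and that `subrefValue` is already in the `word[n]` form used in the Scaife highlight links. `tokens_with_subrefs` and `subrefs_for` are hypothetical names.

```python
from typing import List, Tuple

BARBER = "urn:cts:greekLit:tlg0012.tlg001.perseus-eng4"


def tokens_with_subrefs(response: dict, version_urn: str = BARBER) -> List[Tuple[str, str]]:
    """Flatten a passageTextParts response into (token value, subreference URN) pairs."""
    pairs: List[Tuple[str, str]] = []
    for part_edge in response["data"]["passageTextParts"]["edges"]:
        part = part_edge["node"]
        ref = part["ref"]
        # Assumption: `ref` may already be a full URN, or just the passage ("2.520").
        part_urn = ref if ref.startswith("urn:") else f"{version_urn}:{ref}"
        tokens = sorted((edge["node"] for edge in part["tokens"]["edges"]),
                        key=lambda token: token["position"])
        for token in tokens:
            pairs.append((token["value"], f"{part_urn}@{token['subrefValue']}"))
    return pairs


def subrefs_for(pairs: List[Tuple[str, str]], word: str) -> List[str]:
    """Every subreference URN whose token matches `word` exactly (e.g. a Recogito place name)."""
    return [urn for value, urn in pairs if value == word]
```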

This query uses text part references with the folios prepended:

{
  passageTextParts(reference: "urn:cts:greekLit:tlg0012.tlg001.msA-folios:33v.2.484-41v.2.877") {
    edges {
      node {
        ref
      }
    }
  }
}

and could be used to help map the larger references from the Barber text parts back to recto/verso folios in the Venetus A.

(For example, 2.560 in the Barber, which maps to ~2.560-2.599 in the Venetus A, spans folios 35r, 35v and 36r.)
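A minimal Python sketch of that folio lookup, assuming each `ref` returned by the `msA-folios` query has the form `folio.book.line` (e.g. `35v.2.585`), optionally behind a URN prefix, and that the refs come back in manuscript order. `parse_folio_ref` and `folios_for_range` are hypothetical names.

```python
from typing import Iterable, List, Tuple


def parse_folio_ref(ref: str) -> Tuple[str, int, int]:
    """Split a folio-prefixed ref such as '35v.2.585' into ('35v', 2, 585)."""
    passage = ref.rsplit(":", 1)[-1]  # drop a URN prefix if one is present
    folio, book, line = passage.split(".")
    return folio, int(book), int(line)


def folios_for_range(refs: Iterable[str], book: int, first: int, last: int) -> List[str]:
    """Folios, in the order the refs are given, holding any line of `book` in [first, last]."""
    folios: List[str] = []
    for ref in refs:
        folio, ref_book, line = parse_folio_ref(ref)
        if ref_book == book and first <= line <= last and folio not in folios:
            folios.append(folio)
    return folios
```

For the Barber's 2.560, calling `folios_for_range(refs, 2, 560, 599)` on the query results should then give 35r, 35v and 36r.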

jacobwegner commented 4 years ago

@markpatton: @jtauber has also reached out to Chiara to see if she'd be willing to build a georeference annotation on the Greek text. That would be much easier to map back to folios, and would likely give us at least "line-level" fragment selectors, as we have with the translation alignment annotation.

markpatton commented 4 years ago

@jacobwegner OK, interesting. I'll also keep pursuing my current approach and see whether it produces anything correct. I'll need one of you to review it, since I can't quite tell.

jacobwegner commented 4 years ago

@markpatton is https://rosetest.library.jhu.edu/rosademo/wa/homer/VA/VA035RN-0036/canvas/annotation/0 the latest annotation? I saw you mention in Slack that you were hoping to correct "the mapping to IIIF canvases."

Was that the CTS URNs with subrefs, or something else?

jacobwegner commented 4 years ago

@markpatton Just to follow up here: I think you've made another pass at mapping the Recogito data onto the correct canvases using the HMT data.

I don't know whether you've done anything yet to update the CTS subreferences, but I wanted to walk through one of the annotations below:

{
    "@context": [
        "http://www.w3.org/ns/anno.jsonld",
        {
            "prezi": "http://iiif.io/api/presentation/2#",
            "Canvas": "prezi:Canvas",
            "Manifest": "prezi:Manifest"
        }
    ],
    "id": "https://rosetest.library.jhu.edu/rosademo/wa/homer/VA/VA035RN-0036/canvas/annotation/0",
    "type": "Annotation",
    "label": "Georeference data for Venetus A VA035RN-0036 text \"Menestheus\"",
    "creator": "https://recogito.pelagios.org/chiara-p",
    "body": [],
    "target": [
        {
            "type": "SpecificResource",
            "partOf": [
                {
                    "id": "https://rosetest.library.jhu.edu/rosademo/iiif/homer/VA/manifest",
                    "type": "Manifest"
                }
            ],
            "source": {
                "type": "Canvas",
                "id": "https://rosetest.library.jhu.edu/rosademo/iiif/homer/VA/VA035RN-0036/canvas"
            }
        },
        "urn:cts:greekLit:tlg0012.tlg001.perseus-eng4:2.550@Menestheus"
    ]
}

2.550 isn't a valid text part ref in the Barber translation (perseus-eng4), which was the translation used for the Recogito data. The text part ref would be 2.520 (https://scaife.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-eng4:2.520/?highlight=%40Menestheus%5B1%5D):

urn:cts:greekLit:tlg0012.tlg001.perseus-eng4:2.520@Menestheus
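If the Barber text parts are keyed by the Greek line on which each part starts (as the 2.560 → ~2.560-2.599 example earlier suggests), a line number such as 2.550 could be snapped to its containing text part with a binary search. A minimal Python sketch under that assumption; the sorted list of part start lines would come from ATLAS, and `containing_text_part` is a hypothetical name.

```python
from bisect import bisect_right
from typing import Sequence


def containing_text_part(part_starts: Sequence[int], line: int) -> int:
    """Start line of the text part containing `line`.

    Assumes `part_starts` is sorted and that each part runs from its start line
    up to, but not including, the next part's start line.
    """
    i = bisect_right(part_starts, line)
    if i == 0:
        raise ValueError(f"line {line} comes before the first text part")
    return part_starts[i - 1]
```

With illustrative book 2 starts `[520, 560]`, `containing_text_part([520, 560], 550)` returns `520`, matching the ref above.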

To further complicate matters, the Murray translation (perseus-eng3) (used in the translation alignments) uses different text part references altogether:

https://scaife.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-eng3:2.546/?highlight=%40Menestheus%5B1%5D

urn:cts:greekLit:tlg0012.tlg001.perseus-eng3:2.546@Menestheus

I think the only way to map to the CTS subreferences would be to use the lookups I mentioned above. And again, since the text part refs differ between the translations, you couldn't just assume that Barber URNs will map onto the Murray.
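A minimal Python sketch of what that implies in code: parse the URN into its parts, but translate between versions only through an explicit lookup table (for example, one built from the ATLAS queries above), never by swapping the version ID. `parse_cts_urn` and `translate_urn` are hypothetical names; in the usage the table entry pairs the two Menestheus URNs from this comment.

```python
from typing import Dict, NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str         # e.g. "greekLit"
    textgroup: str         # e.g. "tlg0012" (Homer)
    work: str              # e.g. "tlg001" (Iliad)
    version: str           # e.g. "perseus-eng4" (Barber) or "perseus-eng3" (Murray)
    passage: str           # e.g. "2.520"
    subref: Optional[str]  # e.g. "Menestheus", or None if absent


def parse_cts_urn(urn: str) -> CtsUrn:
    """Parse urn:cts:<namespace>:<textgroup>.<work>.<version>:<passage>[@<subref>]."""
    scheme, protocol, namespace, work_id, passage = urn.split(":", 4)
    if (scheme, protocol) != ("urn", "cts"):
        raise ValueError(f"not a CTS URN: {urn}")
    textgroup, work, version = work_id.split(".", 2)
    passage, _, subref = passage.partition("@")
    return CtsUrn(namespace, textgroup, work, version, passage, subref or None)


def translate_urn(urn: str, table: Dict[str, str]) -> str:
    """Look up the equivalent URN in another version; fail loudly rather than guess."""
    if urn not in table:
        raise KeyError(f"no cross-version mapping for {urn}")
    return table[urn]
```

Raising on a missing entry keeps a mismatched ref like 2.550 from silently turning into a plausible-looking but wrong Murray URN.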

@jabrah mentioned in "Annotations that target annotations: named entities, georeference, etc." (#90) that he'd prefer having annotations target the translation alignment via TextQuoteSelectors.

I can think of a couple of approaches we might take here:

1) Target the translation alignment using TextQuoteSelector:

Again, due to the differences between the English translations, there may be some text quote selectors that can't be resolved exactly.

For example, the Murray has "Aias" (https://scaife.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-eng3:2.546/?highlight=%40Aias%5B1%5D) where the Barber has "Ajax" (https://scaife.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-eng4:2.520/?highlight=%40Ajax%5B3%5D).

2) Target the Barber translation using TextQuoteSelector:

The downside of #2 is that we're not mapping back to a particular line of Greek, but it would at least allow the English translation and annotations to be shown alongside the Greek text on the folio.
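For option 2, a minimal Python sketch of the target, following the W3C Web Annotation model: a SpecificResource whose source is the Barber passage URN and whose selector is a TextQuoteSelector (`exact` plus optional `prefix`/`suffix`), placed alongside the existing canvas target. `text_quote_target` and `georef_targets` are hypothetical names.

```python
import json
from typing import Dict, List


def text_quote_target(passage_urn: str, exact: str, prefix: str = "", suffix: str = "") -> Dict[str, object]:
    """SpecificResource targeting `exact` within a passage via a TextQuoteSelector."""
    selector: Dict[str, str] = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    return {"type": "SpecificResource", "source": passage_urn, "selector": selector}


def georef_targets(canvas_id: str, manifest_id: str, passage_urn: str, exact: str,
                   prefix: str = "", suffix: str = "") -> List[Dict[str, object]]:
    """Target list shaped like the annotation above: the canvas plus the quoted Barber text."""
    canvas = {
        "type": "SpecificResource",
        "partOf": [{"id": manifest_id, "type": "Manifest"}],
        "source": {"type": "Canvas", "id": canvas_id},
    }
    return [canvas, text_quote_target(passage_urn, exact, prefix, suffix)]


if __name__ == "__main__":
    targets = georef_targets(
        "https://rosetest.library.jhu.edu/rosademo/iiif/homer/VA/VA035RN-0036/canvas",
        "https://rosetest.library.jhu.edu/rosademo/iiif/homer/VA/manifest",
        "urn:cts:greekLit:tlg0012.tlg001.perseus-eng4:2.520",
        "Menestheus",
    )
    print(json.dumps(targets, indent=2))
```

Only non-empty `prefix`/`suffix` are emitted, since both are optional in the selector.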

jacobwegner commented 4 years ago

(As for how we'd make use of the georeference annotations internally: the CTS subreferences are easier for us to resolve, but we can certainly map a TextQuoteSelector on the Barber translation over to those subreferences.)
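A minimal Python sketch of that mapping, assuming CTS-style subreferences where `word[n]` means the nth occurrence of the string `word` in the text part, and that the plain text of the Barber text part is available. A selector that can't be matched exactly (e.g. `Aias` against text that reads `Ajax`) yields `None` rather than a guess. The function names are hypothetical, and ATLAS's token-based `subrefValue` could count occurrences differently (for instance if a name also appears inside a longer word).

```python
from typing import Iterator, Optional


def occurrences(text: str, needle: str) -> Iterator[int]:
    """Yield the start index of every occurrence of `needle` in `text`."""
    start = 0
    while True:
        i = text.find(needle, start)
        if i < 0:
            return
        yield i
        start = i + 1


def resolve_text_quote(text: str, exact: str, prefix: str = "", suffix: str = "") -> Optional[int]:
    """Index of the first occurrence of `exact` whose surroundings match prefix/suffix."""
    for i in occurrences(text, exact):
        if text[:i].endswith(prefix) and text[i + len(exact):].startswith(suffix):
            return i
    return None


def text_quote_to_subref(passage_urn: str, text: str, exact: str,
                         prefix: str = "", suffix: str = "") -> Optional[str]:
    """Convert a TextQuoteSelector on one text part into a CTS URN with subreference."""
    i = resolve_text_quote(text, exact, prefix, suffix)
    if i is None:
        return None
    # Occurrence index is 1-based: count earlier matches of the same string.
    n = sum(1 for j in occurrences(text, exact) if j < i) + 1
    return f"{passage_urn}@{exact}[{n}]"
```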