kingsdigitallab / crossreads

Palaeographical environment for CROSSREADS project

Connect annotator to DTS #19

Open geoffroy-noel-ddh opened 1 year ago

geoffroy-noel-ddh commented 1 year ago

Once a DTS server is up and running on the Oxford infrastructure, the Annotator can pull the collection, and perhaps more metadata/content, from it.

The Annotator is currently using a static copy of the DTS collection generated in Nov 2022 from the DTS PoC and the GitHub repository.

geoffroy-noel-ddh commented 8 months ago

The DTS collection is now saved automatically into a JSON file on GitHub by an action on the ISicily repository.

Web redirects have been set up for the collection endpoint and the documents:

The sub-collection of objects now visible in the annotator is built from that DTS collection.

geoffroy-noel-ddh commented 8 months ago

@JonPrag Jonathan, I think there are a few minor issues to address before completing the integration of the DTS redirects with the annotator:

  1. the collection redirect points to the wrong address (see first bullet point in previous message); it should instead lead to https://raw.githubusercontent.com/ISicily/ISicily/master/dts/collection.json .
  2. the redirects don't yet support HTTPS. That's not a blocking issue for the collection because the request works from the command line. But it is more urgent for the document redirect: the annotator and all its dependencies are on HTTPS, and some browsers may block requests from a secure page to non-secure resources.
  3. The collection id at the top of the collection file looks malformed: "@id":"http://http://sicily.classics.ox.ac.uk"
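
Points 2 and 3 lend themselves to a quick automated check. The following is only a minimal Python sketch: the function names are hypothetical, and the annotator page URL in the usage lines is an assumed placeholder rather than the real address.

```python
from urllib.parse import urlsplit


def has_duplicated_scheme(uri: str) -> bool:
    """Return True if the URI contains more than one '://' separator."""
    return uri.count("://") > 1


def is_mixed_content(page_url: str, resource_url: str) -> bool:
    """Return True if a page served over HTTPS requests a plain HTTP resource."""
    return (
        urlsplit(page_url).scheme == "https"
        and urlsplit(resource_url).scheme == "http"
    )


# The "@id" value quoted in point 3.
print(has_duplicated_scheme("http://http://sicily.classics.ox.ac.uk"))

# Hypothetical annotator page (assumption) requesting the HTTP document redirect.
print(is_mixed_content(
    "https://annotator.example.org/",
    "http://sicily.classics.ox.ac.uk/inscription/ISic000001.xml",
))
```

A check like this could run in the same GitHub action that generates the collection file, so a malformed id is caught before it is published.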

The annotator fetches the documents using the "download" value in the collection member, which is a raw git address. That's functional. I could use the new address of the document instead (e.g. http://sicily.classics.ox.ac.uk/inscription/ISic000001.xml), but that would be built using logic external to the DTS collection; the collection itself doesn't provide any clue about that address. The DTS way to obtain the document is either with the download attribute (although this bypasses the document endpoint) or by passing the member ID to the document endpoint, e.g. https://sicily.classics.ox.ac.uk/dts/document?id=https://sicily.classics.ox.ac.uk/inscription/ISic000001
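
To make the two retrieval routes concrete, here is a minimal Python sketch of how a client might choose between them. It is an illustration under assumptions, not the annotator's actual code: the endpoint address is taken from the example URL in this comment, the member key is assumed to be spelled `dts:download`, and the raw address in the sample member is a placeholder.

```python
from urllib.parse import urlencode

# Assumptions: the endpoint follows the example URL quoted in the comment,
# and the member key holding the raw git address is spelled "dts:download".
DOCUMENT_ENDPOINT = "https://sicily.classics.ox.ac.uk/dts/document"
DOWNLOAD_KEY = "dts:download"


def document_url(member: dict, via_endpoint: bool = False) -> str:
    """Return the URL to fetch a collection member's document from.

    By default the download address stored in the member is used.
    With via_endpoint=True, or when the member has no download value,
    the member ID is passed to the document endpoint instead.
    """
    if not via_endpoint and DOWNLOAD_KEY in member:
        return member[DOWNLOAD_KEY]
    return f"{DOCUMENT_ENDPOINT}?{urlencode({'id': member['@id']})}"


# Sample member; the download address is a placeholder, not the real one.
member = {
    "@id": "https://sicily.classics.ox.ac.uk/inscription/ISic000001",
    "dts:download": "https://raw.example.org/ISic000001.xml",
}
print(document_url(member))
print(document_url(member, via_endpoint=True))
```

Only the second form relies on nothing but the collection and the endpoint, which is what a generic DTS client would expect.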

Anyway, I think there are two options available for the references to the documents:

a. we leave it as it is and the annotator keeps using the download link in the collection, in which case the document redirect is never used;

b. the redirect is changed to map a proper DTS document query (see URL just above), with its query string, to GitHub.

a. is simple and already works. But we need to understand that the project's claims about DTS compliance should remain quite modest. There's no way someone would be able to use a DTS client alone to interact with the texts. They'd need to inject some project-specific logic, which, as I explained before, defeats the point of DTS.

b. would improve that a bit, but may not be worth it if we are happy with something functional as it is.
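
For illustration, the mapping that option b asks of the redirect could look roughly like the Python sketch below. This is an assumption-laden outline, not a description of how the Oxford redirects are configured: the `member` and `dts:download` keys are assumed spellings, the function name is hypothetical, and the raw address in the sample collection is a placeholder.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

DOWNLOAD_KEY = "dts:download"  # assumed spelling of the member's download key


def redirect_target(query_url: str, collection: dict) -> Optional[str]:
    """Resolve a DTS document query to the matching member's download address.

    Returns None when the query has no id parameter or no member matches.
    """
    ids = parse_qs(urlsplit(query_url).query).get("id")
    if not ids:
        return None
    for member in collection.get("member", []):
        if member.get("@id") == ids[0]:
            return member.get(DOWNLOAD_KEY)
    return None


# Sample collection; the download address is a placeholder.
collection = {
    "member": [
        {
            "@id": "https://sicily.classics.ox.ac.uk/inscription/ISic000001",
            "dts:download": "https://raw.example.org/ISic000001.xml",
        }
    ]
}
query = (
    "https://sicily.classics.ox.ac.uk/dts/document"
    "?id=https://sicily.classics.ox.ac.uk/inscription/ISic000001"
)
print(redirect_target(query, collection))
```

Looking the target up in the collection keeps the project-specific path logic on the server side, so a client needs only the member ID.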

JonPrag commented 8 months ago

@geoffroy-noel-ddh thanks for this.

  1. Re point 1: Everyone up to now seems to have been specifying the collection redirect to be for https://github.com/ISicily/ISicily/dts/collection.json so I'm not sure what was missed. I can ask Richard in ITSS to alter this to point to https://raw.githubusercontent.com/ISicily/ISicily/master/dts/collection.json

  2. Re point 2: I'm not sure https has been explicitly specified anywhere. Deeply conscious that it's wanted, but the old site was built on http and we can't upgrade it sufficiently on Ubuntu to make it https compliant (the switch from ubuntu 14 to 16 meets at least one critical roadblock according to ITSS), so that waits on the full rebuild of the site (Crossreads B).

  3. re point 3: yes, I see. I'll ask James.

  4. Your broader point - yes, I think I see what you mean: the retention of the download github URL seems to miss the point of having a stable URI with a redirect that is built from the member ID. Presumably that should have been changed in the collection file once the redirects were established (although is there then actually any difference between the download attribute and the member ID attribute?). But could you clarify for me: when you say (b) the redirect should be changed to map a proper DTS document query, does this mean that there are two changes required - i. changing the download attribute in the collection file (to http://sicily.classics.ox.ac.uk/inscription/ISic000002 or equivalent) and ii. changing the redirect to respond to the URL of a DTS document query as you specify above, such that https://sicily.classics.ox.ac.uk/dts/document?id=https://sicily.classics.ox.ac.uk/inscription/ISic000001 will point to http://sicily.classics.ox.ac.uk/inscription/ISic000001.xml ?

Will await your answer to (4) in particular before I pursue this with James (who I hope will be able to make these minor tweaks, but is otherwise now unavailable).

geoffroy-noel-ddh commented 8 months ago

Hi @JonPrag ,

Apologies for the delay.

Re. your question under point 4.

I hope that clarifies things a bit?


Now... one thing I discovered is that the DTS spec has been recently updated. With breaking changes! And the doc of the previous version (which the Annotator and your collection follow) wasn't easily accessible online last time I checked. One of the breaking changes is the parameter name for obtaining a doc by its ID. So instead of ?id=DOCUMENTID it is now ?resource=DOCUMENTID.

https://distributed-text-services.github.io/specifications/versions/1-alpha/#document-endpoint

If we want to match the latest version of DTS, we'd both have to upgrade the code at each end. And there might be other backward-incompatible changes. I haven't assessed the extent of the changes needed to update compliance. Alternatively we could simply agree to match the previous version (1-draft2 rather than the newly released 1-alpha). If there's enough time, desire and resource for it we could upgrade to the latest version by the end of the project. But for now I'd recommend just staying aware that the official spec no longer supports our implementation.
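
One way to keep both ends flexible in the meantime is to make the parameter name depend on the targeted spec version. The Python sketch below is a hedged illustration only: it covers just the id/resource rename described above, the function and table names are hypothetical, and the endpoint in the usage lines is a placeholder. Any other differences between the two versions are not handled.

```python
from urllib.parse import urlencode

# Parameter that carries the document ID, per spec version.
# Only the rename mentioned in this comment is covered.
ID_PARAM_BY_VERSION = {
    "1-draft2": "id",
    "1-alpha": "resource",
}


def document_query(endpoint: str, document_id: str, version: str = "1-draft2") -> str:
    """Build a document endpoint URL for the given DTS spec version."""
    param = ID_PARAM_BY_VERSION[version]  # raises KeyError for unknown versions
    return f"{endpoint}?{urlencode({param: document_id})}"


# Placeholder endpoint (assumption) with both parameter styles.
print(document_query("https://example.org/dts/document", "ISic000001"))
print(document_query("https://example.org/dts/document", "ISic000001", "1-alpha"))
```

Defaulting to 1-draft2 matches what the Annotator and the collection follow today; switching the default later would be a one-line change on the client side.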