kitodo / kitodo-presentation

Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
https://kitodo.org
GNU General Public License v3.0

Kitodo.Presentation dynamic identifier resolving #291

Open claussni opened 6 years ago

claussni commented 6 years ago

Problem

Kitodo.Presentation can resolve only two types of identifiers to obtain METS/MODS documents: internal TYPO3 record PIDs and URLs. URLs have to be properly encoded. This is not obvious to everybody, because browsers automatically take care of URL encoding to produce valid URLs without bothering users. However, the given URL must still be a valid URL after TYPO3 has decoded the query parameters. This essentially requires double URL encoding in most cases, which sometimes fails in the context of RealURL and naive URL handling.
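A minimal sketch of the double-encoding requirement described above, using only Python's standard library. The METS location `https://example.org/files/my document.xml` is a made-up example, and `unquote` merely stands in for the single decoding step that TYPO3 applies to query parameters; none of this is Kitodo.Presentation code.

```python
from urllib.parse import quote, unquote

# Hypothetical METS location containing a character that must be encoded.
raw_location = "https://example.org/files/my document.xml"

# Step 1: turn the location into a valid URL (the space becomes %20).
valid_location = quote(raw_location, safe=":/")

# Step 2: encode that valid URL again so it can travel as a parameter value.
double_encoded = quote(valid_location, safe="")

# For comparison: encoding the raw location only once.
single_encoded = quote(raw_location, safe="")

# The receiving side decodes the parameter value exactly once.
print(unquote(double_encoded))  # still a valid URL, %20 is preserved
print(unquote(single_encoded))  # contains a literal space again
```

With single encoding, the value left over after decoding contains a raw space and is no longer a valid URL, which is the failure mode described above.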

Example

Proposed Solution

Kitodo.Presentation should support pluggable identifier resolving. There could be a chain of configured resolvers for a variety of identifiers, for example to resolve internal PIDs, URLs, URNs, DOIs or custom URIs for local systems. One could also think of resolvers that allow dynamic rewriting of URLs using pattern matching. The configuration would include a priority for every registered resolver. Resolvers could be installed using TYPO3 extensions.
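To make the proposal more concrete, here is a rough sketch of such a prioritized resolver chain. It is written in Python for brevity, not as TYPO3 extension code, and every name in it (`ResolverChain`, `resolve_url`, `resolve_urn`, `make_rewriting_resolver`, the `example.org` hosts, the `local:` identifier scheme) is hypothetical. A resolver returns a location for identifiers it understands and `None` otherwise; the chain asks resolvers in order of descending priority.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A resolver maps an identifier to a METS location, or returns None.
Resolver = Callable[[str], Optional[str]]


@dataclass(order=True)
class Registration:
    priority: int
    name: str = field(compare=False)
    resolve: Resolver = field(compare=False)


class ResolverChain:
    def __init__(self) -> None:
        self._registrations: List[Registration] = []

    def register(self, name: str, priority: int, resolve: Resolver) -> None:
        self._registrations.append(Registration(priority, name, resolve))
        # Keep the resolver with the highest priority first.
        self._registrations.sort(reverse=True)

    def resolve(self, identifier: str) -> Optional[str]:
        for registration in self._registrations:
            location = registration.resolve(identifier)
            if location is not None:
                return location
        return None


def resolve_url(identifier: str) -> Optional[str]:
    # URLs are already locations and pass through unchanged.
    if identifier.startswith(("http://", "https://", "file://")):
        return identifier
    return None


def resolve_urn(identifier: str) -> Optional[str]:
    # Assumption: URNs are handed to a (made-up) external resolver service.
    if identifier.lower().startswith("urn:nbn:"):
        return "https://resolver.example.org/" + identifier
    return None


def make_rewriting_resolver(pattern: str, template: str) -> Resolver:
    # Dynamic rewriting of identifiers using pattern matching.
    compiled = re.compile(pattern)

    def resolve(identifier: str) -> Optional[str]:
        match = compiled.fullmatch(identifier)
        return match.expand(template) if match else None

    return resolve


chain = ResolverChain()
chain.register("url", 30, resolve_url)
chain.register("urn", 20, resolve_urn)
chain.register(
    "local",
    10,
    make_rewriting_resolver(
        r"local:(\d+)", r"https://repository.example.org/mets/\1.xml"
    ),
)

print(chain.resolve("local:42"))
```

In this sketch an identifier that no resolver recognizes yields `None`, so the caller decides how to report the failure.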

sebastian-meyer commented 6 years ago

I don't understand why having different resolvers would solve the "problem" of having to properly encode URL parameters. Even if you use DOIs, URNs or other kinds of identifiers you would still have to encode the tx_dlf[id] URL parameter properly.

Anyways, I can get behind the idea of supporting pluggable identifier resolving. We already have this for metadata and fulltext formats and could quite easily adapt it for identifier resolving.

claussni commented 6 years ago

You are right. At first glance this issue is something of a mixed bag: problem no. 1 is double encoding and problem no. 2 is identifier resolving. I think I meant that a URI resolver could have a feature that makes double encoding unnecessary.

claussni commented 5 years ago

Another problem with URLs as identifiers popped up: the given URL has to be resolvable by the TYPO3 system running the extension. In the case of Docker containers with port mapping or other complex networking setups this is not always possible.

sebastian-meyer commented 5 years ago

I think there is a misconception about the usage of URLs and identifiers in Kitodo.Presentation. For every document, Kitodo has a location field, which holds a URI for the physical location of the METS file, and a record identifier field, which holds a unique identifier for the document itself. Only the first one has to be resolvable (for obvious reasons), while the latter doesn't have to be resolvable (in fact, it could just be any numeral or string). You can address a document both ways: by providing the (properly encoded) location URI or by providing the record identifier. However, the latter requires the document to be indexed first in order to make the record identifier known to Kitodo. (This is not the case for Qucosa documents, since they are not indexed, but only addressed by their location.)

claussni commented 5 years ago

So there is a location URL parameter? The problem is that this URL needs to be resolvable from within the TYPO3 runtime. This is not necessarily the case in Docker environments.

sebastian-meyer commented 5 years ago

location isn't a separate URL parameter. The parameter tx_dlf[id] can be set either to the location URL or the document's record identifier.
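The two ways of addressing a document can be sketched roughly as follows. This is an illustration in Python, not the actual Kitodo.Presentation lookup: the function names, the scheme check used to tell a location from a record identifier, and the in-memory `index` dictionary standing in for the indexed documents are all assumptions.

```python
from typing import Dict
from urllib.parse import urlsplit


def is_location(value: str) -> bool:
    # Assumption: a location is recognized by its http, https or file scheme.
    return urlsplit(value).scheme in {"http", "https", "file"}


def find_location(id_value: str, index: Dict[str, str]) -> str:
    """Return the METS location for an (already decoded) tx_dlf[id] value."""
    if is_location(id_value):
        # A location URI can be used directly, no indexing required.
        return id_value
    try:
        # A record identifier is only known once the document is indexed.
        return index[id_value]
    except KeyError:
        raise LookupError(f"record identifier not indexed: {id_value}") from None


# Hypothetical index mapping record identifiers to METS locations.
index = {"oai:example.org:123": "file://localhost/data/mets/123.xml"}

print(find_location("https://example.org/mets/456.xml", index))
print(find_location("oai:example.org:123", index))
```

A record identifier that has not been indexed raises `LookupError` here, mirroring the point that only indexed documents can be addressed that way.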

Kitodo.Presentation needs to access the METS file in order to process the information it needs to properly present the document. METS files are addressed with a URI that has to be resolvable (but can be a local file://localhost/path URI). Thus running Kitodo.Presentation within a Docker environment requires either 'mounting' the METS files into the container or making the http URIs resolvable from within the container.