libris / whelk-core


Specify provenance data for Document #2

Open niklasl opened 7 years ago

niklasl commented 7 years ago

We need to nail down the intended usage (i.e. "meaning") of collection, changedIn and changedBy in Document & LDDB. Do they represent original datasource and/or an LDP container, or something entirely internal? (Also, can the changed* and dependent logic be removed, if the Voyager two-way sync mechanism is to be removed?)

If they are not internal "scratch" data, they need to be modeled and put into the record description, and become links to proper (collection) resources. See e.g. id, created and updated in the Document class for how record description data is used by the system itself.

niklasl commented 7 years ago

Proposal: separate collection into datasource and marcCategorization (which is a function of the data). Questions:

niklasl commented 7 years ago

Ignore the notion of LDP containers until a real (client/import) need arises?

niklasl commented 7 years ago

Requirements:

The requirements vary in how much stability they demand, so the differences need to be considered and each need has to depend on the right notion. A logically fixed one, based on description content, is commonly needed for e.g. MARC exports, whereas varying maintenance mechanisms (batch imports and deletions) require more dynamic lookup and changes, based on description management.
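To illustrate the "logically fixed, based on description content" notion, here is a minimal Python sketch of a marcCategorization computed purely from the data. The type names, the category strings and the `marc_categorization` helper are all assumptions made for illustration; nothing here is taken from the actual whelk-core code.

```python
from typing import Optional

# Hypothetical mapping from the type of the described entity to a MARC
# category. Both the type names and the category strings are assumptions.
CATEGORY_BY_TYPE = {
    "Instance": "bib",
    "Item": "hold",
    "Agent": "auth",
}


def marc_categorization(main_entity: dict) -> Optional[str]:
    """Derive a category from the entity's @type alone.

    No stored 'collection' string is consulted, so the result is stable
    for as long as the description content is unchanged.
    """
    types = main_entity.get("@type", [])
    if isinstance(types, str):
        types = [types]
    for type_name in types:
        if type_name in CATEGORY_BY_TYPE:
            return CATEGORY_BY_TYPE[type_name]
    return None  # unknown type: no category can be derived
```

Since the result depends only on the description, a MARC export could recompute it at any time, while batch imports and deletions would look up the datasource instead.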

Also, a paged collection can be dynamically defined based on simple searches, as is done for e.g. items of an instance held by a specific library (find?itemOf.@id=<instance>&heldBy.@id=<library>), though, alas, this is done differently in the web view layer and the OAI-PMH view layer. This can include the above notions as well, if they are part of the actual data about the record (and not just internal strings in the lower data layer).
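As a rough sketch of such a search-defined collection, the query above could be built like this in Python. The `items_held_by_query` helper and the example URIs are hypothetical; only the `find` path and the `itemOf.@id` / `heldBy.@id` parameter names come from the query quoted above.

```python
from urllib.parse import urlencode


def items_held_by_query(instance_id: str, library_id: str) -> str:
    """Build the relative search URL that defines the collection
    'items of this instance held by this library'."""
    params = {
        "itemOf.@id": instance_id,
        "heldBy.@id": library_id,
    }
    # Keep '@' readable in the parameter names; the URI values are
    # still percent-encoded.
    return "find?" + urlencode(params, safe="@")


if __name__ == "__main__":
    # Example URIs are placeholders, not real resources.
    print(items_held_by_query(
        "https://example.org/instance/1",
        "https://example.org/library/S",
    ))
```

If datasource were part of the record description, the same pattern would presumably work with one more parameter, but that parameter name is not defined anywhere yet.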

We need to avoid letting our internal mechanisms proliferate when a few core notions can satisfy a wide range of related needs. The collection and changedIn properties can be formalized and managed as RDF data about a record (a "gbox"), and thus utilized for the current and conceivable scenarios ahead. By defining these as proper resources, we can also state provenance data such as date, responsibility and documentation of e.g. purpose.
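A minimal sketch of what that could look like, written as JSON-LD-like dicts in Python. The `datasource` property is the name proposed earlier in this thread; the `Datasource` type, `responsibleAgent`, `comment` and both helper functions are assumptions for illustration only, not existing vocabulary or code.

```python
import json


def datasource_resource(uri, created, responsible_uri, purpose):
    """Describe a datasource as a resource in its own right, so that
    provenance (date, responsibility, purpose) can be stated about it."""
    return {
        "@id": uri,
        "@type": "Datasource",                          # hypothetical type
        "created": created,                             # date
        "responsibleAgent": {"@id": responsible_uri},   # hypothetical property
        "comment": purpose,                             # documentation of purpose
    }


def record_description(record_uri, datasource_uri, changed_in_uri):
    """Record description that links to resources instead of carrying
    internal strings."""
    return {
        "@id": record_uri,
        "@type": "Record",
        "datasource": {"@id": datasource_uri},
        "changedIn": {"@id": changed_in_uri},
    }


if __name__ == "__main__":
    # All URIs below are placeholders.
    source = datasource_resource(
        "https://example.org/datasource/batch-1",
        "2017-01-01",
        "https://example.org/agent/importer",
        "Example batch import",
    )
    record = record_description(
        "https://example.org/record/1",
        source["@id"],
        "https://example.org/system/xl",
    )
    print(json.dumps({"@graph": [record, source]}, indent=2))
```

The point is only that collection and changedIn become links, so the linked resources can carry their own provenance rather than being opaque strings in the lower data layer.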

mxtthias commented 7 years ago

Just a note to remind us that whatever solution we implement for collection, the concept occurs in a whole bunch of places, like the importers app in librisxl and Document in whelk-core.