Open niklasl opened 7 years ago
Proposal: separate collection into datasource and marcCategorization (which is a function of the data).
Questions:
changedIn now same as datasource?
datasource from one version to another (and if so, how)?
Ignore the notion of LDP containers until a real (client/import) need arises?
Requirements:
changedIn/datasource can change over time (when data sources change from import processes to editing through interfaces, and sometimes "back" if resources are to be batch-processed).
Examples of sources: the current Voyager MARC subsets (auth, bib and hold), the 7 or so sets of resources in the definitions repository, import batches or other systems (bibdb, smdb?).
There can only be one datasource at a time for a record.
collection or similar is to be logical and generally fixed, based on the (base) type and/or categorization of the described resources within.
There is a common correlation between that and "source", and sometimes their "nature" is bound to their source (e.g. "LCSH", "SAO" and "SAB"). It can still be computed (as in derived from e.g. type or origin), and may be more of a tag mechanism for systems to simplify processing of richer data. (Cf. LDP indirect containers)
It is conceivable that a record belongs to more than one collection.
Because these notions differ in how stable they must be, each need has to depend on the right one. A logically fixed notion, based on description content, is commonly needed for e.g. MARC exports, whereas varying maintenance mechanisms (batch imports and deletions) require more dynamic lookup and changes, based on description management.
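The requirements above can be sketched as a small data model. This is only an illustration under stated assumptions: the class `Record`, the function `derive_collections`, the type-to-collection mapping and the datasource labels are all hypothetical and are not taken from whelk-core or LDDB.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a resource's base type to its logical collections.
# A type may map to more than one collection.
COLLECTIONS_BY_TYPE = {
    "Instance": {"bib"},
    "Item": {"hold"},
    "Agent": {"auth"},
}


def derive_collections(base_type: str) -> frozenset:
    """Compute collections from description content (here: the base type only)."""
    return frozenset(COLLECTIONS_BY_TYPE.get(base_type, set()))


@dataclass
class Record:
    record_id: str
    base_type: str
    datasource: str  # exactly one datasource at a time
    datasource_history: list = field(default_factory=list)

    @property
    def collections(self) -> frozenset:
        # Derived, so it stays fixed for as long as the base type does.
        return derive_collections(self.base_type)

    def change_datasource(self, new_source: str) -> None:
        """Replace the single current datasource and remember the previous one."""
        if new_source != self.datasource:
            self.datasource_history.append(self.datasource)
            self.datasource = new_source


# Placeholder values, for illustration only.
record = Record("rec1", "Instance", datasource="voyager-bib")
record.change_datasource("editing-interface")
record.change_datasource("batch-import")
```

In this sketch the datasource moves freely between import and editing, while the collection set never changes unless the description itself does.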
(Also, a paged collection can be dynamically defined based on simple searches, as is done for e.g. items of an instance held by a specific library (find?itemOf.@id=<instance>&heldBy.@id=<library>). (Though, alas, this is done differently in the web view layer and the OAI-PMH view layer.) This can include the above notions as well, if they are part of the actual data about the record (and not just internal strings in the lower data layer).)
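As a rough illustration of such a search-defined collection, the query string quoted above could be built like this. The helper name `items_query` and the example.org identifiers are assumptions made for the sketch; only the `find` path and the two parameter names come from the text.

```python
from urllib.parse import parse_qs, urlencode


def items_query(instance_id: str, library_id: str) -> str:
    """Build the relative search URL for items of an instance held by a library."""
    params = {"itemOf.@id": instance_id, "heldBy.@id": library_id}
    # Keep "@" readable in the parameter names; the URI values are percent-encoded.
    return "find?" + urlencode(params, safe="@")


# Placeholder identifiers, for illustration only.
url = items_query("https://example.org/instance1", "https://example.org/library/X")

# Decoding the query string gives back the original parameters.
decoded = parse_qs(url.split("?", 1)[1])
```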
We need to avoid letting our internal mechanisms proliferate when a few core notions can satisfy a wide range of related needs. The collection and changedIn properties can be formalized and managed as RDF data about a record (a "gbox"), and thus utilized for the current and conceivable scenarios ahead. By defining these as proper resources, we can also state provenance data such as date, responsibility and documentation of e.g. purpose.
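One possible shape for this, written as JSON-LD-style data in Python, is shown below. Everything in it is a hypothetical placeholder: the example.org identifiers, the type names `Collection` and `DataSource`, and the provenance keys (`date`, `responsibility`, `purpose`) are assumptions for illustration, not an agreed vocabulary.

```python
import json

# A collection described as a resource of its own (hypothetical identifiers).
collection = {
    "@id": "https://example.org/collection/bib",
    "@type": "Collection",
    "label": "Bibliographic descriptions",
}

# A datasource resource carrying provenance data (all values are placeholders).
datasource = {
    "@id": "https://example.org/datasource/batch-1",
    "@type": "DataSource",
    "date": "2017-01-01",
    "responsibility": "Import team",
    "purpose": "Initial load",
}

# The record links to both resources instead of holding internal strings.
record = {
    "@id": "https://example.org/record1",
    "@type": "Record",
    "collection": [{"@id": collection["@id"]}],  # a list: more than one is conceivable
    "changedIn": {"@id": datasource["@id"]},     # a single link: one source at a time
}

serialized = json.dumps(record, indent=2)
```

Because the record only holds links, provenance details can be changed on the datasource resource without touching every record that points to it.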
Just a note to remind us that whatever solution we implement for collection, the concept occurs in a whole bunch of places, like the importers app in librisxl and Document in whelk-core.
We need to nail down the intended usage (i.e. "meaning") of collection, changedIn and changedBy in Document & LDDB. Do they represent original datasource and/or an LDP container, or something entirely internal? (Also, can the changed* and dependent logic be removed, if the Voyager two-way sync mechanism is to be removed?)
If they are not internal "scratch" data, they need to be modeled and put into the record description. (And become links to proper (collection) resources. See e.g. id, created and updated in the Document class for how record description data is used by the system itself.)