ajnelson-nist opened this issue 1 year ago.
The design intent for UCO was not for a single ObservableObject to conflate File and URL together.
The intended way to describe a "downloadable file" is to convey a File object, a separate URL object and a Relationship object with source=
Separating the File and the URL and associating them with a relationship avoids the complexities explicitly or implicitly identified in the writeup above. It also yields a much more effective graph where the same file can be downloadable from multiple URLs, the download URL may change over time, etc.
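A minimal Python sketch of the separated pattern, emitting JSON-LD as plain dictionaries. The relationship direction (File as source, URL as target), the kindOfRelationship string "Downloadable_From", the kb: namespace, and the function names are assumptions for illustration, not confirmed UCO conventions.

```python
import json
import uuid

# Prefix-to-namespace map for the compact IRIs used below.
# "kb:" is an arbitrary example namespace, not a UCO one.
CONTEXT = {
    "kb": "http://example.org/kb/",
    "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
    "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
}

# Hypothetical label; not checked against any UCO vocabulary.
KIND_OF_RELATIONSHIP = "Downloadable_From"


def url_node(url):
    """A standalone URL object that carries its string in a URLFacet."""
    return {
        "@id": f"kb:url-{uuid.uuid4()}",
        "@type": "uco-observable:URL",
        "uco-core:hasFacet": [{
            "@id": f"kb:url-facet-{uuid.uuid4()}",
            "@type": "uco-observable:URLFacet",
            "uco-observable:fullValue": url,
        }],
    }


def downloadable_file_graph(file_name, urls):
    """One File, one URL object per download location, and one Relationship
    per URL. The direction (File as source, URL as target) is an assumption."""
    file_node = {
        "@id": f"kb:file-{uuid.uuid4()}",
        "@type": "uco-observable:File",
        "uco-core:hasFacet": [{
            "@id": f"kb:file-facet-{uuid.uuid4()}",
            "@type": "uco-observable:FileFacet",
            "uco-observable:fileName": file_name,
        }],
    }
    graph = [file_node]
    for url in urls:
        target = url_node(url)
        graph.append(target)
        graph.append({
            "@id": f"kb:relationship-{uuid.uuid4()}",
            "@type": "uco-core:Relationship",
            "uco-core:source": {"@id": file_node["@id"]},
            "uco-core:target": {"@id": target["@id"]},
            "uco-core:kindOfRelationship": KIND_OF_RELATIONSHIP,
            "uco-core:isDirectional": True,
        })
    return {"@context": CONTEXT, "@graph": graph}


doc = downloadable_file_graph(
    "example-1.0.tar.gz",
    [
        "https://mirror-a.example.org/example-1.0.tar.gz",
        "https://mirror-b.example.org/example-1.0.tar.gz",
    ],
)
print(json.dumps(doc, indent=2))
```

Adding a third mirror, or recording that the download location moved, only adds URL and Relationship nodes; the File node itself is untouched.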
If a formal disjointness axiom between the File and URL classes is believed necessary to keep anyone from confusing them and conflating them onto the same object for this use case, then I would support such an action.
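If such an axiom were adopted, it would presumably be an owl:disjointWith statement in the ontology. Until then, a data producer could run a lightweight check like the hypothetical one below, which flags any node whose @type asserts both classes. Unlike an OWL reasoner or SHACL validation, it only looks at explicitly asserted types and ignores subclass inference.

```python
# Class pairs treated as disjoint by this check (hypothetical configuration).
DISJOINT_PAIRS = [("uco-observable:File", "uco-observable:URL")]


def as_list(value):
    """JSON-LD allows @type to be a string or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def disjointness_violations(graph, pairs=DISJOINT_PAIRS):
    """Return (node @id, class_a, class_b) for every node that explicitly
    asserts both classes of a pair meant to be disjoint."""
    violations = []
    for node in graph:
        types = set(as_list(node.get("@type")))
        for a, b in pairs:
            if a in types and b in types:
                violations.append((node.get("@id"), a, b))
    return violations


graph = [
    {"@id": "kb:file-1", "@type": "uco-observable:File"},
    {"@id": "kb:url-1", "@type": "uco-observable:URL"},
    {"@id": "https://example.org/dataset.zip",
     "@type": ["uco-observable:File", "uco-observable:URL"]},
]
print(disjointness_violations(graph))
# → [('https://example.org/dataset.zip', 'uco-observable:File', 'uco-observable:URL')]
```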
It is pointed out above that observable:dataPayloadReferenceURL is missing a definition. This looks like an unfortunate oversight. The intended purpose of this property is to provide a link to where the actual content of a ContentDataFacet (on a ContentData, File, Memory, etc. object) could be stored. This is for cases where the content should be available but is too large to share encoded in the observable:dataPayload property, or is not desired to be expressed directly in the UCO object.
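A minimal sketch of that intended usage, assuming the JSON-LD conventions seen in CASE examples: the ContentDataFacet omits observable:dataPayload and instead references a separate URL object by IRI, since the property's range is observable:ObservableObject. The URL, the size, and the function name are placeholders.

```python
import json
import uuid

CONTEXT = {
    "kb": "http://example.org/kb/",
    "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
    "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
}


def content_stored_elsewhere(payload_url, size_in_bytes):
    """ContentData whose bytes are not inlined in dataPayload; its
    ContentDataFacet points, by IRI reference, at a URL object instead."""
    url_id = f"kb:url-{uuid.uuid4()}"
    url_object = {
        "@id": url_id,
        "@type": "uco-observable:URL",
        "uco-core:hasFacet": [{
            "@id": f"kb:url-facet-{uuid.uuid4()}",
            "@type": "uco-observable:URLFacet",
            "uco-observable:fullValue": payload_url,
        }],
    }
    content_object = {
        "@id": f"kb:content-data-{uuid.uuid4()}",
        "@type": "uco-observable:ContentData",
        "uco-core:hasFacet": [{
            "@id": f"kb:content-data-facet-{uuid.uuid4()}",
            "@type": "uco-observable:ContentDataFacet",
            "uco-observable:sizeInBytes": size_in_bytes,
            # Range is observable:ObservableObject, so this is a node
            # reference rather than a literal URL string.
            "uco-observable:dataPayloadReferenceURL": {"@id": url_id},
        }],
    }
    return {"@context": CONTEXT, "@graph": [content_object, url_object]}


doc = content_stored_elsewhere(
    "https://storage.example.org/attachments/report.pdf", 1048576
)
print(json.dumps(doc, indent=2))
```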
Disclaimer
Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.
Question
In UCO, what rdf:types should be assigned to an ObservableObject that is a downloadable file?

This need arises from at least two directions:
Dataset distribution: When posting reference data, download sites will often host images for delivery over HTTP(S). At least one RDF-based model, DCAT, encourages storing a reference to the download URL as an rdfs:Resource IRI, and treating that IRI as a file, just a file that hasn't been downloaded yet. See property dcat:downloadURL.

Software supply chain: In software supply chain representation, metadata about software packages will frequently include a download URL and hashes corresponding to the download URL. See for example this metadata manifest for case-utils' recent release, retrieved and manually trimmed from this API endpoint:

It seems this is something UCO is designed to be able to represent, but the classes and properties that look like the best candidates for doing so have not been significantly exercised. Some of them are not documented, and some have fairly lax constraints left over from the prototyping days that predate the ObservableObject subclass hierarchy.

First, because of the representation suggested by DCAT, which is not distinct to DCAT, I am specifically interested in how to represent the url resource in that JSON dictionary as an IRI, without being wholly reliant on duck-typing:

(As an aside, this also disregards the UCO guidance that IRIs end with UUIDs. In at least the DCAT application, this is just going to have to be the case. We could consider this an exercise of UCO as an enricher of existing knowledge bases. The implementation for UCO Issue 430, where the UUID requirement was introduced, specifically allowed for this use case.)
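A rough sketch of the DCAT-style representation, in Python emitting JSON-LD dictionaries: the download URL is used directly as the node's @id, and a small helper checks the UUID-suffix guidance mentioned in the aside. The manifest entry, its "url" key, and the helper names are hypothetical, and the regular expression only matches the canonical 8-4-4-4-12 hexadecimal UUID form.

```python
import re
import uuid
from urllib.parse import urlsplit

# Matches a canonical 8-4-4-4-12 hexadecimal UUID at the very end of a string.
UUID_SUFFIX = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def ends_with_uuid(iri):
    """True if the IRI ends with a UUID, per the UCO IRI guidance."""
    return UUID_SUFFIX.search(iri) is not None


def node_for_download_url(download_url):
    """DCAT-style node: the download URL itself is the node's IRI.

    Only ObservableObject is asserted; which further rdf:types belong
    here is the open question.
    """
    parts = urlsplit(download_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute HTTP(S) URL: {download_url!r}")
    return {"@id": download_url, "@type": "uco-observable:ObservableObject"}


# Hypothetical manifest entry with a "url" key.
entry = {"url": "https://files.example.org/packages/example-0.1.0.tar.gz"}
node = node_for_download_url(entry["url"])
print(node["@id"], ends_with_uuid(node["@id"]))
print(ends_with_uuid(f"http://example.org/kb/file-{uuid.uuid4()}"))
```

Run as-is, the first print shows the URL-derived IRI failing the UUID check, while the kb: IRI passes; that is the trade-off the aside accepts.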
From CASE/UCO duck-typing, I know I want this to have a URLFacet, FileFacet, and ContentDataFacet. So, here is how I would add Facets to say that that UcoObject behaves like an observable:File, observable:URL, and observable:ContentData. (This next block, and code blocks further in this post, should be read as additive on top of what is written in preceding blocks.)

Prior practice within UCO suggests that the rdf:type of the UcoObject should be at least uco-observable:ObservableObject, though nothing in the classes and properties used above is encoded to require this:

Now, it's not clear whether these types would also be correct to assert. (As noted in prior Issues, UCO's usage of duck typing is not inferentially bidirectional. Being a File implies having a FileFacet. Behaving like a File, i.e. having a FileFacet, does not imply that the UcoObject is a File. An Issue coming soon will propose encoding this.)

By UCO's current encoding, those three OWL Classes would be fine to use concurrently on the same node, because UCO does not define any of them as disjoint from the others. (UCO's only current disjointness axiom separates uco-core:UcoObject from uco-core:UcoInherentCharacterizationThing, the superclass of uco-core:Facet.) But is this a practice UCO should encourage users to adopt, or discourage users from adopting?

There are at least the following non-trivialities with using the three classes concurrently.
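Before turning to those, a small sketch of the duck-typing gap described above. It maps each of the three facets to the class it suggests and reports the classes a node behaves like without asserting them. The mapping table and function names are hypothetical, and nothing here infers types; it only reports the difference.

```python
import uuid

# Facet class -> class it suggests under the duck-typing convention.
# Limited to the three facets discussed here.
FACET_TO_CLASS = {
    "uco-observable:URLFacet": "uco-observable:URL",
    "uco-observable:FileFacet": "uco-observable:File",
    "uco-observable:ContentDataFacet": "uco-observable:ContentData",
}


def as_list(value):
    """JSON-LD allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def behaves_like(node):
    """Classes suggested by the node's facets (duck typing only)."""
    return {
        FACET_TO_CLASS[t]
        for facet in as_list(node.get("uco-core:hasFacet"))
        for t in as_list(facet.get("@type"))
        if t in FACET_TO_CLASS
    }


def unasserted_duck_types(node):
    """Classes the node behaves like but does not assert in @type.
    Having a FileFacet does not entail being a File, so this only reports."""
    return behaves_like(node) - set(as_list(node.get("@type")))


url = "https://files.example.org/packages/example-0.1.0.tar.gz"
node = {
    "@id": url,
    "@type": "uco-observable:ObservableObject",
    "uco-core:hasFacet": [
        {"@id": f"kb:url-facet-{uuid.uuid4()}",
         "@type": "uco-observable:URLFacet",
         "uco-observable:fullValue": url},
        {"@id": f"kb:file-facet-{uuid.uuid4()}",
         "@type": "uco-observable:FileFacet",
         "uco-observable:fileName": "example-0.1.0.tar.gz"},
        # Hashes and size would go here in a fuller example.
        {"@id": f"kb:content-data-facet-{uuid.uuid4()}",
         "@type": "uco-observable:ContentDataFacet"},
    ],
}
print(sorted(unasserted_duck_types(node)))
# → ['uco-observable:ContentData', 'uco-observable:File', 'uco-observable:URL']
```

Asserting all three classes in @type would empty that set; whether UCO should encourage doing so is the open question.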
Confusion with rdfs:Resource
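One point that makes handling this non-obvious is that UCO's observable:URL class, in this case, becomes somewhat confused with the RDF foundational class rdfs:Resource: when a node's IRI is itself the download URL, typing that node as observable:URL blurs the line between the URL as an identifier and the resource it identifies.

Potential decision on inherence of files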
Another point is that UCO could possibly go the route of defining File as inherent to FileSystem. (This would take a healthy amount of discussion, as there would be significant pros and cons. A proposal on this isn't coming today.) If this decision were adopted, could a URL also be a File? Or do we need a Relationship defined to represent that a URL is (or was at some time), say, a projection of, or access channel to, a file on a file system, or an object in an S3 bucket (as done by Digital Corpora)?

Difference in occurrence of ContentData and URL
Another point is that observable:ContentData doesn't have any relationship (that is, subclass-based or predicate-based) with other ObservableObjects encoded in the ontology. Does this snippet of the JSON above ...

... describe the URL? Or does it describe some more abstract content-signature pattern? If the latter, how does this relate to the URL? Would a File relate the same way?

(The UCO Pattern namespace has not, to date, been demonstrated in any public CASE or UCO examples.)
Payload reference URL
Separately but relatedly, there is a property observable:dataPayloadReferenceURL. It lacks a definition (rdfs:comment) in the ontology, and only constrains its range to observable:ObservableObject. It has not been demonstrated on the CASE website. It has been demonstrated a few times in CASE-Examples:

Oresteia.json, showing where to download an attachment (which is only typed as observable:ContentData) from an external website.
message.json, showing where to download attachments (each of which is only typed as observable:ContentData) from an external website.
network_connection.json, showing where a file is stored in a locally-mounted file system. That example portion, though, includes an inline design question.

CASE-Corpora currently does not demonstrate observable:dataPayloadReferenceURL, but I'm considering adding a shape (scoped only to CASE-Corpora for now) that tailors its usage for "downloadable files," a class where each member reflexively treats its own IRI as its URL fullValue and its content-data dataPayloadReferenceURL. Within CASE-Corpora, DCAT influences this decision. This Issue is filed in part to affirm or dissuade that class design.

Summary
UCO seems like it has all the pieces available to express that "I know of large file X, durably archived at this URL, and it has these hashes." But demonstration is needed to test UCO's class design, stepping past the relaxed model permitted by duck typing.
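As a plain-Python approximation of the reflexive "downloadable file" pattern described earlier (not a SHACL shape, and not CASE-Corpora's actual design, which is only under consideration), the check below tests whether a node's own IRI is both its URLFacet fullValue and its ContentDataFacet dataPayloadReferenceURL, with hashes alongside. The Hash structure follows the pattern commonly seen in CASE examples; the URL, the all-zero digest, and the function names are placeholders, and facet IRIs are omitted for brevity.

```python
def as_list(value):
    """JSON-LD allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ref_id(value):
    """IRI of a JSON-LD node reference such as {"@id": ...}, else None."""
    return value.get("@id") if isinstance(value, dict) else None


def facets_of_type(node, facet_type):
    return [
        facet for facet in as_list(node.get("uco-core:hasFacet"))
        if facet_type in as_list(facet.get("@type"))
    ]


def reflexive_download_problems(node):
    """List the ways a node departs from the reflexive pattern; empty if it conforms."""
    iri = node.get("@id")
    problems = []
    if not any(f.get("uco-observable:fullValue") == iri
               for f in facets_of_type(node, "uco-observable:URLFacet")):
        problems.append("no URLFacet whose fullValue is the node's IRI")
    if not any(ref_id(f.get("uco-observable:dataPayloadReferenceURL")) == iri
               for f in facets_of_type(node, "uco-observable:ContentDataFacet")):
        problems.append("no ContentDataFacet whose dataPayloadReferenceURL is the node's IRI")
    return problems


url = "https://archive.example.org/datasets/large-file-x.img"
node = {
    "@id": url,
    "@type": "uco-observable:ObservableObject",
    "uco-core:hasFacet": [
        {"@type": "uco-observable:URLFacet", "uco-observable:fullValue": url},
        {
            "@type": "uco-observable:ContentDataFacet",
            "uco-observable:dataPayloadReferenceURL": {"@id": url},
            "uco-observable:hash": [{
                "@type": "uco-types:Hash",
                "uco-types:hashMethod": {"@type": "uco-vocabulary:HashNameVocab", "@value": "SHA256"},
                # Placeholder digest, not a real hash of anything.
                "uco-types:hashValue": {"@type": "xsd:hexBinary", "@value": "00" * 32},
            }],
        },
    ],
}
print(reflexive_download_problems(node))  # → []
```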
The points above lay out questions that I believe will be essential to UCO's efforts toward software supply chain analysis and certain steps in cross-organization-boundary data sharing. I look forward to the opportunity to discuss these approaches and clarify these class and property interactions.