solid / data-interoperability-panel

Repository for the Solid Data Interoperability Panel
MIT License

Metadata ontologies for containers? #89

Open bblfish opened 3 years ago

bblfish commented 3 years ago

What ontologies should be used when returning a GET on an LDP Container? A number of ontologies are currently in use, but they are not very satisfactory. Here is an example served by NSS:

curl -H 'Accept: text/turtle' https://csarven.ca/archives/linked-research-decentralised-web/inbox/
@prefix : <#>.
@prefix inbox: <>.
@prefix ldp: <http://www.w3.org/ns/ldp#>.
@prefix terms: <http://purl.org/dc/terms/>.
@prefix XML: <http://www.w3.org/2001/XMLSchema#>.
@prefix st: <http://www.w3.org/ns/posix/stat#>.
@prefix tur: <http://www.w3.org/ns/iana/media-types/text/turtle#>.

inbox:
    a ldp:BasicContainer, ldp:Container;
    terms:modified "2020-11-09T13:36:42Z"^^XML:dateTime;
    ldp:contains inbox:87bc9a28-9f94-4b1b-a4b9-503899795f6e;
    st:mtime 1604929002.475;
    st:size 4096.
inbox:87bc9a28-9f94-4b1b-a4b9-503899795f6e
    a tur:Resource, ldp:Resource;
    terms:modified "2019-07-29T14:09:06Z"^^XML:dateTime;
    st:mtime 1564409346.323;
    st:size 1353.

The stat ontology

This makes a lot of sense when serving files from the file system. But over half the attributes there are useless for the web: stat:dev, stat:gid, stat:ino, stat:mode, stat:nlink and stat:rdev are all too specific to an OS instance and have little value outside it. The only property that specifies its unit is the size, in bytes, which is good to have. It is not clear what ctime is ("time of last status change"?). In Java we can access modified and created times, and those would be useful to publish. But in what units? The mtime above has a decimal place. Why? Could the ontology be made more explicit, so that it is easier to know what the standard should be?

Dublin Core

The terms:modified value is an xsd:dateTime in the above curl result, but Dublin Core does not specify that the value of terms:modified should be in that format. It allows a very flexible string format with a lot of variation. It has a section on date, but that does not mention XSD (and I could not find documentation on that). It has a created time and a modified time, which would be useful. It has a "sizeOrDuration" attribute, but again the format is not clear to me as an implementor.

It would be useful to have small specs that address concerns like this, written after consultation with developers and the community.

Note

Just to get my milestone done, I am producing this for the moment in Scala 3:

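import java.nio.file.Path
import java.nio.file.attribute.BasicFileAttributes

// Emits one ldp:contains triple plus stat metadata for a directory entry.
// Times are epoch milliseconds; note that creationTime is not POSIX ctime.
def containsAsTurtle(path: Path, att: BasicFileAttributes): String =
  val filename = path.getFileName.toString + (if att.isDirectory then "/" else "")
  s"""<> ldp:contains <$filename> .
     |    <$filename> stat:size ${att.size};
     |        stat:mtime ${att.lastModifiedTime().toMillis};
     |        stat:ctime ${att.creationTime().toMillis} .
     |""".stripMargin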

But I don't know if the units are correct, or how much traction this has with other servers.
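
For comparison, here is a minimal sketch of one way to make the units explicit, mirroring the shape of the NSS response above: an xsd:dateTime for terms:modified and epoch seconds with a millisecond fraction for stat:mtime. The helper names `epochSeconds` and `modifiedAsTurtle` are hypothetical, the `terms:`, `xsd:` and `stat:` prefixes are assumed to be declared elsewhere in the output, and nothing here is an agreed convention.

```scala
import java.nio.file.attribute.FileTime
import java.time.temporal.ChronoUnit

// Hypothetical helper: seconds since the Unix epoch with a three-digit
// millisecond fraction, e.g. 1604929002.475 (the shape NSS emits above).
def epochSeconds(t: FileTime): String =
  java.math.BigDecimal.valueOf(t.toMillis, 3).toPlainString

// Hypothetical helper: the same instant published twice, once as an
// xsd:dateTime truncated to whole seconds and once as a POSIX-style number.
def modifiedAsTurtle(name: String, mtime: FileTime): String =
  val xsdDateTime = mtime.toInstant.truncatedTo(ChronoUnit.SECONDS).toString
  s"""<$name> terms:modified "$xsdDateTime"^^xsd:dateTime;
     |    stat:mtime ${epochSeconds(mtime)} .
     |""".stripMargin
```

Calling `modifiedAsTurtle("a.ttl", FileTime.fromMillis(1604929002475L))` yields a terms:modified of "2020-11-09T13:36:42Z" and a stat:mtime of 1604929002.475, the same pair of values as the container in the curl result.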

csarven commented 3 years ago

Putting aside security considerations and discussion here: there is already a discussion/PR in solid/specification on this.

Also reminder: https://github.com/solid/vocab


Note LDP BP:

<http://purl.org/dc/terms/date> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2000/01/rdf-schema#Literal> .
<http://purl.org/dc/terms/date> <http://www.w3.org/2000/01/rdf-schema#subPropertyOf> <http://purl.org/dc/elements/1.1/date> .

<http://purl.org/dc/elements/1.1/date> <http://purl.org/dc/terms/description> "Date may be used to express temporal information at any level of granularity. Recommended practice is to express the date, date/time, or period of time according to ISO 8601-1 [ISO 8601-1] or a published profile of the ISO standard, such as the W3C Note on Date and Time Formats [W3CDTF] or the Extended Date/Time Format Specification [EDTF]. If the full date is unknown, month and year (YYYY-MM) or just year (YYYY) may be used. Date ranges may be specified using ISO 8601 period of time specification in which start and end dates are separated by a '/' (slash) character. Either the start or end date may be missing."@en .


> The mtime above has a decimal place. Why?

Milliseconds. That particular resource is served by NSS, which happens to use millisecond precision for POSIX mtime ( https://github.com/solid/node-solid-server/blob/bea9491007e293a9c62ac3d8ba1faaaf95fd7b17/lib/ldp-container.js#L113 -> https://github.com/solid/node-solid-server/blob/bea9491007e293a9c62ac3d8ba1faaaf95fd7b17/lib/ldp.js#L81 -> https://nodejs.org/api/fs.html#fs_fs_stat_path_options_callback )

From what I can tell from https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap04.html#tag_21_04_11 :

> POSIX.1-2017 does not impose any requirement on the accuracy of the execution time; it instead specifies that the measurement mechanism and its precision are implementation-defined.
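
The two values in the NSS response can be cross-checked against each other. The sketch below assumes the st:mtime literal is a decimal count of seconds since the Unix epoch; `fromEpochSeconds` is a hypothetical helper name, not part of NSS or of any ontology.

```scala
import java.math.RoundingMode
import java.time.Instant

// Hypothetical helper: read a decimal literal such as "1604929002.475"
// as seconds since the Unix epoch, keeping up to nanosecond precision.
def fromEpochSeconds(lexical: String): Instant =
  val value = new java.math.BigDecimal(lexical)
  val wholeSeconds = value.setScale(0, RoundingMode.FLOOR)
  val nanos = value.subtract(wholeSeconds).movePointRight(9).setScale(0, RoundingMode.DOWN)
  Instant.ofEpochSecond(wholeSeconds.longValueExact, nanos.longValueExact)

// The container's st:mtime from the curl result above.
val containerMtime = fromEpochSeconds("1604929002.475")
// containerMtime.toString is "2020-11-09T13:36:42.475Z"
```

Under that assumption the result agrees with the container's terms:modified of 2020-11-09T13:36:42Z, so the decimal part carries the sub-second precision and the integer part is in seconds.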

bblfish commented 3 years ago

A spec would need to distinguish units and precision. The POSIX dates, I guess, are expressed in milliseconds since the Unix Epoch, and it looks like they can be either xsd integers or xsd floats for more precision (such as nanoseconds). That has the advantage of being simple to implement on both the server and the client. Still, there are two xsd types in play. The POSIX ontology is hosted on W3C servers, so the information could be updated there for clarity, and a document could explain when those properties are useful for Solid servers to publish.

The DC terms date is expressed as a coproduct of a number of units (year, year-month, year-month-day, ...), each unit coming with its own precision, all in the format of a date since the Christian Epoch. Those are useful for describing documents that are much older, and so presumably refer to the creation time of the document as a non-LDP resource. That requires a lot more parsing and tooling on the client, and the ontology of what is described is quite different. The documents on W3C seem to predate XSD, so it is an open question whether xsd time formats should be used at all and, if so, which ones exactly, as that could help with RDF tooling.

Then we would need feedback from app and server developers, with studies of which formats are actually used by various systems, to see whether we can agree on what to use for different use cases, so that we can have convergence.
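
To illustrate the extra client-side parsing this implies, here is a minimal sketch that models four of the W3CDTF granularities as a Scala 3 enum and tries them from most to least precise. `DcDate` and its case names are hypothetical, and date ranges and EDTF forms are deliberately left out.

```scala
import java.time.{LocalDate, OffsetDateTime, Year, YearMonth}
import scala.util.Try

// Hypothetical coproduct of date granularities a client may have to accept.
enum DcDate:
  case OfYear(value: Year)
  case OfMonth(value: YearMonth)
  case OfDay(value: LocalDate)
  case OfInstant(value: OffsetDateTime)

object DcDate:
  // Try the most precise format first and fall back to coarser ones.
  // Returns None when the string matches none of the four forms.
  def parse(s: String): Option[DcDate] =
    Try(OfInstant(OffsetDateTime.parse(s)))
      .orElse(Try(OfDay(LocalDate.parse(s))))
      .orElse(Try(OfMonth(YearMonth.parse(s))))
      .orElse(Try(OfYear(Year.parse(s))))
      .toOption
```

With this sketch, `DcDate.parse("2020-11")` gives an `OfMonth` and `DcDate.parse("2020-11-09T13:36:42Z")` gives an `OfInstant`, whereas a property constrained to xsd:dateTime would need only the last case.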

TallTed commented 3 years ago

> What ontologies should be used when returning a GET on a LDP Container?

Whichever the implementer/administrator/user determines are relevant in providing a description of that Container.

The LDP WG did not restrict the ontologies that should or could be used, because there was no way for us to know all of the possible usage to which LDP would be put, nor all of the available ontologies which might be relevant.

Relations like skos:narrower, skos:broader, owl:sameAs, owl:equivalentClass, owl:equivalentProperty should be put to use as/when appropriate in the wild, such that convergence is not inflexibly forced (and all but guaranteed to be incorrect in some if not many cases) but is flexibly allowed (and so all but guaranteed to be correct in each deployment, as the deployer/user can ignore/omit any such relation which is not true for them).

bblfish commented 3 years ago

> What ontologies should be used when returning a GET on a LDP Container?

> Whichever the implementer/administrator/user determines are relevant in providing a description of that Container.

That still leaves a lot of room where documentation can help. See, for example, my points in the response above. Also, as an implementor I am not just publishing data for myself. I am publishing it so that it is usable by apps. So we would like to have convergence. We would like to publish so that apps can consistently use the data that is published.

> The LDP WG did not restrict the ontologies that should or could be used, because there was no way for us to know all of the possible usage to which LDP would be put, nor all of the available ontologies which might be relevant.

That is true. It was also 7 years ago, and the LDP group did not have a conception of hyper-apps. Without access control, apps could really only be written for one server, which meant that those building the vocabularies were also those writing the consumers of them. On the Solid web that is no longer the case. So coordination would be helpful.

> Relations like skos:narrower, skos:broader, owl:sameAs, owl:equivalentClass, owl:equivalentProperty should be put to use as/when appropriate in the wild, such that convergence is not inflexibly forced (and all but guaranteed to be incorrect in some if not many cases) but is flexibly allowed (and so all but guaranteed to be correct in each deployment, as the deployer/user can ignore/omit any such relation which is not true for them).

Agreed. One should not require things, but one could have better-defined ontologies, or groups could come to a consensus on ontologies well adapted to particular use cases.

In this case I am looking for convergence on basic metadata for LDPCs.

This could be done empirically, by checking what different implementations are doing, looking at basic application needs, and seeing how we can get some agreement. This could lead to improved ontology definitions, and also perhaps to the community discovering better adapted ones.