Closed: coret closed this issue 1 year ago.
ref: https://twitter.com/markuitheiloo/status/1554848093593640962
Mark suggested adding dcat:compressFormat. Schema.org mentions the following approach: "For the case of a single file published after Zip compression, the convention of appending '+zip' to the [[encodingFormat]] can be used."
Suggest adopting the Schema.org approach and adding the following to the specification part of schema:encodingFormat:
When the distribution is compressed, the compression format (e.g. zip, gzip, rar) should be appended to the schema:encodingFormat (e.g. text/turtle+gzip).
Note: also include this in the example.
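A minimal sketch of how a consumer could undo the proposed convention, assuming the compression token is always the last '+'-separated part of schema:encodingFormat and that the recognised tokens are the ones listed in the proposal. The helper name split_compression is hypothetical:

```python
from typing import Optional

# Compression tokens listed in the proposal above. +zip and +gzip are
# structured syntax suffixes; rar is included only because the proposal lists it.
COMPRESSION_SUFFIXES = {"zip", "gzip", "rar"}


def split_compression(encoding_format: str) -> tuple[str, Optional[str]]:
    """Split e.g. 'text/turtle+gzip' into ('text/turtle', 'gzip')."""
    base, plus, last = encoding_format.rpartition("+")
    # Strip the last '+token' only if it names a compression format
    if plus and "/" in base and last.lower() in COMPRESSION_SUFFIXES:
        return base, last.lower()
    # Otherwise the value describes uncompressed content
    return encoding_format, None


print(split_compression("text/turtle+gzip"))           # ('text/turtle', 'gzip')
print(split_compression("application/ld+json+gzip"))  # ('application/ld+json', 'gzip')
print(split_compression("application/ld+json"))        # ('application/ld+json', None)
```

The check against a fixed token list matters: without it, application/ld+json would be wrongly split into application/ld and json.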
Doesn't that go against the 'rules' for media types? I believe you should be able to add specificity by inserting xxx+ directly after the /, as in application/ld+json, to indicate that something isn't just JSON. Adding +gzip to the end contradicts this. I don't see this suggested at https://schema.org/encodingFormat either.
I would suggest following the DCAT 2 spec and using a separate property to indicate the compression format alongside the file format.
The +zip suffix (though not +gzip) is defined in RFC 6839, Additional Media Type Structured Syntax Suffixes.
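Under RFC 6838, a structured syntax suffix is the part of the subtype after the last '+': in application/ld+json the suffix is +json, and ld is the more specific name in front of it. The sketch below (a hypothetical helper that only strips parameters such as ; charset=utf-8 and does no further validation) shows that application/n-triples+gzip has the same shape, with +gzip as the suffix:

```python
from typing import NamedTuple, Optional


class MediaType(NamedTuple):
    type: str              # top-level type, e.g. 'application'
    subtype: str           # subtype without the suffix, e.g. 'ld' in 'ld+json'
    suffix: Optional[str]  # structured syntax suffix, e.g. 'json', or None


def parse_media_type(value: str) -> MediaType:
    # Drop parameters such as '; charset=utf-8' and normalise case
    essence = value.split(";", 1)[0].strip().lower()
    type_, _, full_subtype = essence.partition("/")
    if not type_ or not full_subtype:
        raise ValueError(f"not a media type: {value!r}")
    # The suffix is whatever follows the LAST '+' in the subtype
    name, plus, suffix = full_subtype.rpartition("+")
    if not plus:
        return MediaType(type_, full_subtype, None)
    return MediaType(type_, name, suffix)


print(parse_media_type("application/ld+json"))
# MediaType(type='application', subtype='ld', suffix='json')
print(parse_media_type("application/n-triples+gzip"))
# MediaType(type='application', subtype='n-triples', suffix='gzip')
```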
Some examples showing possible solutions:
@prefix schema: <https://schema.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

# format is correct (this is a gzip file), but gzip is just the envelope; we're interested in the N-Triples inside
[] a schema:DataDownload ;
  schema:contentUrl "https://www.openarch.nl/exports/nt/files/gld-20220726.nt.gz" ;
  schema:encodingFormat "application/gzip" .

# is this valid? maybe hard for a machine to understand...
[] a schema:DataDownload ;
  schema:contentUrl "https://www.openarch.nl/exports/nt/files/gld-20220726.nt.gz" ;
  schema:encodingFormat "application/gzip", "application/n-triples" .

# example based on the +gzip addition
[] a schema:DataDownload ;
  schema:contentUrl "https://www.openarch.nl/exports/nt/files/gld-20220726.nt.gz" ;
  schema:encodingFormat "application/n-triples+gzip" .

# example using a DCAT property (I'm not a fan of mixing schema.org/Dataset and DCAT)
[] a schema:DataDownload ;
  schema:contentUrl "https://www.openarch.nl/exports/nt/files/gld-20220726.nt.gz" ;
  schema:encodingFormat "application/n-triples" ;
  dcat:compressFormat "application/gzip" .
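To make the trade-off between these four options concrete, here is a hypothetical sketch of what a consumer would need to recover (content type, compression) from each of them, in order. Plain Python values stand in for the parsed RDF, and the lookup tables are assumptions. The first option leaves the content type unrecoverable, and the option listing both application/gzip and application/n-triples only works because the consumer already knows which value is the envelope:

```python
from typing import Optional, Union

# Hypothetical lookup tables a consumer would need (assumptions, not from a spec)
COMPRESSION_TYPES = {"application/gzip": "gzip", "application/zip": "zip"}
COMPRESSION_SUFFIXES = {"gzip", "zip"}


def content_and_compression(
    encoding_format: Union[str, list],
    compress_format: Optional[str] = None,
) -> tuple:
    """Return (content type, compression) for one DataDownload description."""
    formats = [encoding_format] if isinstance(encoding_format, str) else list(encoding_format)
    content = None
    # Option 4: compression given by a separate dcat:compressFormat value
    compression = COMPRESSION_TYPES.get(compress_format) if compress_format else None
    for fmt in formats:
        if fmt in COMPRESSION_TYPES:
            # Options 1 and 2: the envelope is listed as a media type of its own
            compression = COMPRESSION_TYPES[fmt]
            continue
        base, plus, last = fmt.rpartition("+")
        if plus and last in COMPRESSION_SUFFIXES:
            # Option 3: '+gzip' appended to the real content type
            content, compression = base, last
        else:
            content = fmt
    return content, compression


print(content_and_compression("application/gzip"))  # (None, 'gzip')
print(content_and_compression(["application/gzip", "application/n-triples"]))  # ('application/n-triples', 'gzip')
print(content_and_compression("application/n-triples+gzip"))  # ('application/n-triples', 'gzip')
print(content_and_compression("application/n-triples", "application/gzip"))  # ('application/n-triples', 'gzip')
```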
I stand corrected! The RFC indeed specifies such suffixes, and the associated Structured Syntax Suffixes registry lists +gzip too.
The third example would be the clearest (now that I know about the standards). The only question I have left is how to encode (g)zipped JSON-LD files: application/ld+json+gzip?
ref: https://twitter.com/markuitheiloo/status/1554838166174965761
When a distribution is gzipped, the schema:encodingFormat one could opt for is application/gzip. But this obscures the real content type. HTTP responses can be gzipped when client and server both support it (signalled with the Content-Encoding: gzip header), without the need for the application/gzip response type. We advise using a content type that describes the contents of the (compressed) file, such as text/turtle, application/rdf+xml, etc.
Todo: add this advice to the requirements.
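Python's standard mimetypes module happens to model the same split: guess_type() returns a (type, encoding) pair and reports gzip as the encoding, not as the type. A sketch, assuming a fresh MimeTypes database with the N-Triples extension registered explicitly (.nt may be missing from the default table). The header dict only illustrates the HTTP analogue (Content-Type for the payload, Content-Encoding for the compression):

```python
import mimetypes

# Start from a fresh MimeTypes database
db = mimetypes.MimeTypes()
# .nt may be missing from the default table, so register it explicitly
db.add_type("application/n-triples", ".nt")

url = "https://www.openarch.nl/exports/nt/files/gld-20220726.nt.gz"
# '.gz' is mapped to the 'gzip' encoding; the remaining '.nt' gives the type
content_type, encoding = db.guess_type(url)
print(content_type, encoding)  # application/n-triples gzip

# The same split expressed as HTTP response headers
headers = {"Content-Type": content_type, "Content-Encoding": encoding}
```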