Open satra opened 2 years ago
I don't recall how we got there but we've used application/vnd+zarr in the past.
Hi all - Unidata and the netCDF community are working on registering the application/netcdf media type with IANA (see netCDF GH Issue 42). Here are a few notes on the registration process in case they are useful.
The process for registering a media type with IANA (defined in RFC 6838) includes an unregistered namespace that "may be used for [media] types intended exclusively for use in private, local environments". Subtypes in the unregistered namespace/tree are prefixed with "x.", which replaces the older "x-" prefix.
The vendor tree/namespace (prefixed with "vnd.") is used for "media types associated with publicly available products". A suffix starting with "+" has a special meaning in IANA media type names. So, application/vnd.zarr would fit the IANA model better than application/vnd+zarr. Vendor tree media types need to be registered, but registration and review are lightweight compared to the standards tree.
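The standards tree (no prefix) is intended for "[media] types of general interest to the Internet community". Per RFC 6838, media types registered in the standards tree must either be:
- associated with an IETF specification and approved directly by the IESG, or
- registered by a recognized standards-related organization.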
Registration on the full standards tree registry can take some time and effort. However, there is a provisional registration process available to facilitate prototyping and testing. The main hurdle for provisional registration is getting recognized as a “standards-related organization”. There are a number of standards and steering committees that are recognized as such. So, if Zarr decides to register on the standards tree, the Zarr Steering Committee might be the entity to get recognized.
This is as far as we've gotten for netCDF (application/netcdf is listed on the provisional standard media type registry). So I don't yet know the details of the review part of the full registration process.
@satra, for which files are you thinking of adding a mimetype? The fact that there are multiple makes this an interesting problem. e.g. if someone downloads a chunk and learns that it's "application/zarr" or whatever, what can they do with that without the rest of the fileset?
@jhamman, you use this for each .zgroup, .zarray and .zattrs file? Conceivably these could also have a prominent "json" in the mimetype.
So, we're using application/vnd+zarr as the asset media type in the STAC context where an asset is represented as a path that points to a directory that contains a .zgroup. We are not using the media types to represent the types of metadata or data objects within a zarr dataset.
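A minimal sketch of that convention, in case it helps: the media type is attached only to the top-level directory, and only if that directory contains a .zgroup. The function name `zarr_asset` and the bare `href`/`type` dictionary are illustrative assumptions rather than anything taken from an existing catalog, and application/vnd+zarr is simply the value quoted above, not a registered type.

```python
from pathlib import Path

# Value quoted in this thread; it is not a registered IANA media type.
ZARR_MEDIA_TYPE = "application/vnd+zarr"


def zarr_asset(store: Path) -> dict:
    """Return a STAC-style asset entry for a directory holding a Zarr group.

    Hypothetical helper: only the top-level directory gets a media type;
    the .zgroup, .zarray, .zattrs and chunk files inside it are not typed.
    """
    if not (store / ".zgroup").is_file():
        raise ValueError(f"{store} does not look like a Zarr group (no .zgroup)")
    return {"href": store.as_posix(), "type": ZARR_MEDIA_TYPE}
```

Swapping in a different string later (application/x-zarr, application/vnd.zarr, ...) would only touch the constant, which seems useful while no consensus exists.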
for which files are you thinking of adding a mimetype?
@joshmoore - same as @jhamman. in our archive we are using nesteddirectorystore hosted on s3 as an asset. only the top level path (e.g., /path/to/somename.ngff) in our database returns this mime-type within the metadata record, not the individual files underneath. we left our implementation for now with application/x-zarr with the possibility of converging on whatever consensus emerges.
I just saw on a webinar from @bilts that NASA Harmony is using the mime type application/x-zarr for Zarr assets.
Quote: A media type consists of a type and a subtype, which is further structured into a tree. A media type can optionally define a suffix and parameters:
type "/" [tree "."] subtype ["+" suffix]* [";" parameter]
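To make the grammar concrete, here is a rough Python sketch that splits a media type string into those parts. It is an approximation, not a validator: the regular expression, the function name `parse_media_type`, and the decision to recognise only the vnd, prs and x tree prefixes from RFC 6838 are assumptions of this sketch.

```python
import re

# Approximates: type "/" [tree "."] subtype ["+" suffix]* [";" parameter]
MEDIA_TYPE = re.compile(
    r"^(?P<type>[^/;\s]+)/"
    r"(?:(?P<tree>vnd|prs|x)\.)?"       # optional registration tree prefix
    r"(?P<subtype>[^+;\s]+)"
    r"(?P<suffixes>(?:\+[^+;\s]+)*)"    # zero or more "+suffix" parts
    r"(?P<params>(?:\s*;\s*[^;]+)*)$"   # zero or more ";key=value" parts
)


def parse_media_type(value: str) -> dict:
    """Split a media type string into type, tree, subtype, suffixes, params."""
    match = MEDIA_TYPE.match(value.strip())
    if match is None:
        raise ValueError(f"not a media type: {value!r}")
    params = {}
    for item in match.group("params").split(";"):
        if item.strip():
            key, _, val = item.partition("=")
            params[key.strip()] = val.strip()
    return {
        "type": match.group("type"),
        "tree": match.group("tree"),  # None when no tree prefix is present
        "subtype": match.group("subtype"),
        "suffixes": [s for s in match.group("suffixes").split("+") if s],
        "params": params,
    }
```

With this reading, application/vnd.zarr comes out with tree "vnd" and subtype "zarr", whereas application/vnd+zarr comes out with no tree, subtype "vnd" and suffix "zarr", which is the mismatch pointed out earlier in the thread.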
Excerpts from a partial read of https://www.rfc-editor.org/rfc/rfc6838.html:
Based on these, my general thoughts are:
- x[.-] if possible.
- vnd. since "industry consortia as well as non-commercial entities that do not qualify as recognized standards-related organizations can quite appropriately register media types in the vendor tree." but I think we could go for one of the other trees.
- application/zarr certainly seems to be a natural fit especially since it's unlikely that too much can be done with the entity without the proper application, but
- I could also see getting behind use of +zarr so that the main intent of the entity could be expressed with another mimetype, image+zarr or application/zip+zarr. The document for that is Structured Syntax Suffixes. Another current example is +sqlite, which is defined to match application/vnd.sqlite3.
ping @yarikoptic
Is there any precedent for using mime types to refer to directory trees as opposed to individual files?
there have been several efforts: https://www.w3.org/2002/12/cal/rfc2425.html and various vendor specific things including directories on android: vnd.android.cursor.dir but nothing looking at the type of directory based stores that we are considering here.
FWIW I thought to check what http://github.com/file/file (libmagic) thinks -- looking at source and running (on linux) I think all directories are just inode/directory and I don't even see that one among iana.../...media-types.xhtml.
- I could also see getting behind use of +zarr so that the main intent of the entity could be expressed with another mimetype, image+zarr or application/zip+zarr. The document for that is Structured Syntax Suffixes. Another current example is +sqlite, which is defined to match application/vnd.sqlite3.
I wonder if it shouldn't be the other way around, i.e. have /zarr and then possibly the +suffix (e.g., +zip assuming that +directory is like a default.) rfc6838 ref on suffixes
:+1: I could see that. Though I think the +zarr as with +sqlite3 or +zip could still be useful even if we want to target application/[vnd.]zarr for most cases. Though perhaps the fact that only one suffix is intended could come back to bite us.
@satra It appears that both of those examples, https://www.w3.org/2002/12/cal/rfc2425.html proposing a text/directory mime type, and vnd.android.cursor.dir, logically represent some sort of collection of items, but are in fact still represented as a single file or byte stream.
Note: application/zip+zarr would correspond to a single file (the zip file) so there is no issue there.
I can see the benefit of using a mime type if you have an existing database where things are identified by mime types. But my understanding is that so far mime types have been limited to identifying the format of a single file / byte stream. We may want to be careful in using mime types outside of their normal scope --- and perhaps at least see if this is something that has been done before.
we are trying to include some type information in our jsonld descriptors of a zarr asset. i could not find a search response to a mime type for zarr. would application/x-zarr be appropriate?