jkunze / bagitspec


Missing media type (MIME type) for BagIt #22

Open paulmillar opened 1 year ago

paulmillar commented 1 year ago

A media type (also known as a MIME type) is a widely used system for labelling the format of data. There is a central database of media types maintained by the Internet Assigned Numbers Authority (IANA). More information about media types is available at the Media Type Wikipedia entry.

Currently, there is no media type for BagIt.

This lack of a media type can cause problems, particularly in situations where a file might (or might not) be a BagIt file. As a concrete example, DataCite is updating their metadata schema so that it supports accessing the files in a dataset. One possibility is to provide the data directly (e.g., as a zip file); another is to describe how to fetch the data using an empty BagIt file (one with an empty /data directory and details on how to fetch the data via the fetch.txt file). The DataCite metadata schema supports recording the media type of the file; however, in both cases, the file would have the media type application/zip. A client may wish to download the data if it is a BagIt file (for example, to obtain metadata), but is currently unable to determine from the metadata alone whether the linked zip file is a BagIt file.
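To illustrate the problem, here is a rough sketch of what a client has to do today: download the whole zip file and look inside it. The check assumes a serialized bag keeps its bagit.txt declaration either at the archive root or inside a single top-level directory. The function names (`looks_like_bagit`, `make_zip`) and the sample file names are made up for this example.

```python
import io
import zipfile


def looks_like_bagit(zip_bytes: bytes) -> bool:
    """Heuristic: does the archive hold a bagit.txt at the root or one level down?"""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Accept "bagit.txt" or "<bag-dir>/bagit.txt", nothing deeper.
            if parts[-1] == "bagit.txt" and len(parts) <= 2:
                return True
    return False


def make_zip(members: dict) -> bytes:
    """Build an in-memory zip from {member name: text content} (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in members.items():
            zf.writestr(name, content)
    return buf.getvalue()


# Both archives would be labelled application/zip in the metadata.
empty_bag = make_zip({
    "mybag/bagit.txt": "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
    "mybag/fetch.txt": "https://example.org/data/file1.csv - data/file1.csv\n",
})
plain_zip = make_zip({"file1.csv": "a,b\n1,2\n"})

is_bag = looks_like_bagit(empty_bag)
is_plain_bag = looks_like_bagit(plain_zip)
```

The point is that the answer is only available after the download; a dedicated media type would let the client decide beforehand.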

Media type labels are somewhat sophisticated and include a few features that may prove useful for BagIt.

One feature of media types is the availability of suffixes. This allows a media type to describe both the file format and the underlying format; e.g., application/bagit+zip could describe a BagIt file that is based on the zip archive format. This allows clients that do not support BagIt but that do support zip (application/zip) archives to process the file; for example, to check the integrity of the files in the archive or to scan the file for viruses.
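As a sketch of how a client might use the suffix, the snippet below picks a handler for a media type it does not know by falling back to the underlying format. Note that application/bagit+zip is only the proposed (unregistered) label, and the `SUFFIX_FALLBACKS` table and function names are assumptions for illustration, not part of any library.

```python
# Hypothetical mapping from structured-syntax suffix to the underlying media type.
SUFFIX_FALLBACKS = {
    "zip": "application/zip",
    "json": "application/json",
    "xml": "application/xml",
}


def split_suffix(media_type: str):
    """Return (type/subtype without parameters, suffix or None)."""
    essence = media_type.split(";", 1)[0].strip().lower()
    _, _, subtype = essence.partition("/")
    _, plus, suffix = subtype.rpartition("+")
    return essence, (suffix if plus else None)


def choose_handler(media_type: str, supported: set):
    """Prefer an exact match; otherwise fall back to the suffix's base format."""
    essence, suffix = split_suffix(media_type)
    if essence in supported:
        return essence
    fallback = SUFFIX_FALLBACKS.get(suffix)
    if fallback in supported:
        return fallback
    return None


# A virus scanner that only understands plain zip archives:
scanner_choice = choose_handler("application/bagit+zip", {"application/zip"})
# A BagIt-aware client that also understands zip:
bagit_choice = choose_handler(
    "application/bagit+zip", {"application/bagit+zip", "application/zip"}
)
```

With these inputs, the zip-only scanner ends up treating the file as application/zip, while the BagIt-aware client selects the more specific type.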

Another feature of media types is parameters. Parameters allow a media type to include metadata about the file. One common parameter is profile. This provides a flexible way to be more specific about the nature of the file without creating many new media types. There is already a profile language for BagIt: BagIt-profiles.

Putting these together, here is an example of my suggested BagIt media type:

application/bagit+zip;profile=https://example.org/bagit/my-profile
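For what it is worth, a label of this shape can already be taken apart with standard tooling. The sketch below uses Python's standard-library `email.message.Message` to separate the type from the profile parameter; `parse_media_type` is a name invented here. The parameter value is quoted in this sketch, on the assumption that characters such as ":" and "/" need quoting in a parameter value under the usual MIME syntax rules.

```python
from email.message import Message


def parse_media_type(value: str):
    """Split a media type string into (type/subtype, profile parameter or None)."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_param("profile")


# The proposed (not yet registered) BagIt label with a profile URL:
media_type, profile = parse_media_type(
    'application/bagit+zip; profile="https://example.org/bagit/my-profile"'
)

# A plain zip label carries no profile:
plain_type, plain_profile = parse_media_type("application/zip")
```

A client could then compare `profile` against the BagIt-profiles it knows how to handle before deciding to download anything.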

I would advocate a discussion on what the BagIt media type should look like. Once a consensus is established, the corresponding media type should then be registered with IANA, so that it may be used to describe BagIt files.