mime-types / ruby-mime-types

Ruby MIME type registry library

Feature request: support tree, suffix, and parameters (RFC 6838/6839 and others) #67

Open maxlinc opened 10 years ago

maxlinc commented 10 years ago

mime-types was "built to conform to the MIME types of RFCs 2045 and 2231". RFC 2045 is itself composed of many other RFCs, some of which have been obsoleted or updated. For example, it refers to RFC 2048 - Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures (which defined the vendor tree). That RFC was obsoleted by RFC 4288 and 4289. RFC 4288 in turn was obsoleted by RFC 6838.

In short, too many RFCs to keep track of, but Wikipedia's summary is pretty good.

These newer RFCs have introduced or standardized three important concepts - tree, suffix, and parameters. The structure of a mime-type name is: top-level type name / [ tree. ] subtype name [ +suffix ] [ ; parameters ]
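
To make that grammar concrete, here is a minimal sketch in plain Ruby. It does not use the mime-types API: `MediaTypeParts`, `KNOWN_TREES`, and `split_media_type` are hypothetical names invented for illustration. It assumes the tree is one of the faceted prefixes `vnd`, `prs`, or `x`, that the suffix is whatever follows the last `+`, and that parameter values contain no `;` characters.

```ruby
# Hypothetical helper, not part of mime-types: splits a media type string
# into type, tree, subtype, suffix, and parameters.
MediaTypeParts = Struct.new(:type, :tree, :subtype, :suffix, :parameters)

# Faceted tree prefixes assumed here (vendor, personal, unregistered).
KNOWN_TREES = %w[vnd prs x].freeze

def split_media_type(string)
  # Naive split: a quoted parameter value containing ";" is not handled.
  essence, *raw_params = string.split(";").map(&:strip)
  type, rest = essence.to_s.downcase.split("/", 2)
  if type.to_s.empty? || rest.to_s.empty?
    raise ArgumentError, "not a media type: #{string.inspect}"
  end

  # Everything after the last "+" is treated as the suffix.
  name, plus, suffix = rest.rpartition("+")
  if plus.empty?
    name = rest
    suffix = nil
  end

  # The first facet is treated as the tree only if it is a known prefix.
  facet, dot, remainder = name.partition(".")
  if KNOWN_TREES.include?(facet) && !dot.empty?
    tree = facet
    subtype = remainder
  else
    tree = nil
    subtype = name
  end

  # Parameter names are lowercased; surrounding double quotes are dropped.
  parameters = raw_params.reject(&:empty?).to_h do |pair|
    key, value = pair.split("=", 2)
    [key.strip.downcase, value.to_s.strip.delete_prefix('"').delete_suffix('"')]
  end

  MediaTypeParts.new(type, tree, subtype, suffix, parameters)
end

parts = split_media_type('application/vnd.github.v3+json')
puts parts.to_a.inspect
```

Whether the tree should be stored separately from the subtype name, or simply derived from it on demand, is one of the open API questions.
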

These concepts are commonly used by modern applications. Parameters are often used to define charsets or codecs for videos:

text/plain; charset=utf-8
video/mp4; codecs="avc1.640028"

Suffix is used to indicate an underlying structure or container format. The following suffixes are registered: +xml, +json, +ber, +der, +fastinfoset, +wbxml, +zip, and +cbor. Some examples include SVG images or Atom feeds, which are their own registered format but use XML as the underlying structure:

image/svg+xml
application/atom+xml

The vendor tree is commonly used by RESTful services, especially in combination with a +suffix. GitHub APIs, for example, return the following mime-types:

application/json
application/vnd.github+json
application/vnd.github.v3+json
application/vnd.github.v3.raw+json
application/vnd.github.v3.text+json
application/vnd.github.v3.html+json
application/vnd.github.v3.full+json
application/vnd.github.v3.diff
application/vnd.github.v3.patch

It would be nice if mime-types supported these concepts. I'm not sending a PR yet because exactly what "support" looks like probably requires some discussion. I think the tree is simple and would just be an (optional) part, like sub_type or media_type. The suffix concept may also be similar, though I think it'd be useful if it were used for inheritable default values (e.g. if MIME::Types['application/vnd.github+json'] returned an unregistered type based on application/json, rather than returning nothing). The parameters concept is probably the one that needs the most thought, because right now they're ignored during lookup but not while creating types:

MIME::Type.new('text/plain; charset=utf-8') == MIME::Type.new('text/plain; charset=ascii')
# => false
MIME::Types['text/plain; charset=utf-8'] == MIME::Types['text/plain; charset=ascii']
# => true
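
One way the suffix fallback could behave, as a rough sketch only. It uses a plain Hash in place of the real registry, so `REGISTRY` and `lookup_with_suffix_fallback` are hypothetical names and not part of mime-types. It also assumes that a `+suffix` maps to `application/<suffix>`, which holds for `+json` and `+xml` but not necessarily for every registered suffix. Parameters are dropped before lookup, mirroring the current lookup behaviour shown above.

```ruby
# Hypothetical stand-in for the registry; values are just labels.
REGISTRY = {
  "application/json" => "JSON",
  "application/xml"  => "XML",
  "image/svg+xml"    => "SVG image",
}.freeze

def lookup_with_suffix_fallback(content_type, registry = REGISTRY)
  # Ignore parameters such as "; charset=utf-8" during lookup.
  essence = content_type.split(";").first.to_s.strip.downcase
  return registry[essence] if registry.key?(essence)

  # Not registered: fall back to the type implied by the suffix, if any.
  _name, plus, suffix = essence.rpartition("+")
  return nil if plus.empty?

  registry["application/#{suffix}"]
end

puts lookup_with_suffix_fallback("application/vnd.github+json").inspect
puts lookup_with_suffix_fallback("application/vnd.github.v3.diff").inspect
```

An exact registration wins over the fallback, so `image/svg+xml` still resolves to its own entry rather than to XML.
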
halostatue commented 10 years ago

I'll need to think about this some, and I agree with the concepts.

maxlinc commented 10 years ago

Perfect. I just wanted to get someone thinking about it.

I don't have many specific use-cases in mind, mostly because I'm not sure how mime-types is used by most projects. I do have one use-case in mind, though: selecting a parser, serializer or formatter for a MIME::Type.

Grape, for example, uses the mime type to select a serializer or a parser. A similar selection mechanism could be used in middleware for things like formatting (e.g. pretty-printing JSON) or linting (e.g. JSONLint).

(Note: I started thinking about this while working on a code generator from http://swagger.io/ to Grape, not on Grape itself)

Typically it isn't necessary to distinguish between "application/vnd.github.v3.text+json" and "application/vnd.github.v3.html+json" for this use-case. The service itself may need to know the difference (to return a different object or use a different query), but in most cases it's only necessary to know the "underlying structure or container format" - json - so you can use appropriate methods like to_json or JSON.parse. In that case it's enough to know that either the sub_type or the suffix is "json".
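
A minimal sketch of that selection, using only the Ruby standard library. `PARSERS`, `structure_of`, and `parse_body` are hypothetical names for illustration, not Grape or mime-types APIs. The assumption is that the suffix, when present, names the underlying structure, and otherwise the subtype does.

```ruby
require "json"

# Hypothetical table mapping a structure name to a parser.
PARSERS = {
  "json" => ->(body) { JSON.parse(body) },
}.freeze

# Returns the suffix if the subtype has one, otherwise the subtype itself.
def structure_of(content_type)
  essence = content_type.split(";").first.to_s.strip.downcase
  _type, subtype = essence.split("/", 2)
  return nil if subtype.nil?

  _name, plus, suffix = subtype.rpartition("+")
  plus.empty? ? subtype : suffix
end

def parse_body(content_type, body)
  parser = PARSERS[structure_of(content_type)]
  raise ArgumentError, "no parser for #{content_type}" unless parser

  parser.call(body)
end

puts parse_body("application/vnd.github.v3.text+json", '{"ok": true}').inspect
puts parse_body("application/json", "[1, 2, 3]").inspect
```

Both GitHub variants and plain application/json reach the same parser, while a type such as application/vnd.github.v3.diff raises because nothing is registered for it.
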

halostatue commented 8 years ago

Deferring this to post-3.0; I think we have the features we need to be able to support this, but I don’t know what the API is going to look like.

bf4 commented 8 years ago

@halostatue (hi!) Came across this issue looking for suffix support for examples related to https://github.com/json-api/json-api/issues/1020 :)

halostatue commented 8 years ago

I use HAL personally, but I still want to support this in the future.

ioquatix commented 8 years ago

If you are interested, there is a media type parser here: https://github.com/ioquatix/http-accept/blob/master/lib/http/accept/media_types.rb which conforms to RFC 7231.

I don't know if this is really appropriate for this library. Let's face it, there are hundreds of ways to compare content types and media ranges. It's not something that can be easily standardised in a way that works for everyone. It might be best just to provide a library (or use the one above, for example) to do the parsing and implement application-specific logic where it makes sense.

Nakilon commented 2 years ago

there is a media type parser here

That's exactly what I was looking for. I want to figure out the input html charset to then parse it again but with proper encoding.

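require "http/accept"
require "oga"

# Read the content attribute of the <meta http-equiv="Content-Type"> element.
content_type = Oga.parse_html(input.encode("utf-8", undef: :replace)).
    at_css("[http-equiv='Content-Type']")["content"]
encoding = HTTP::Accept::MediaTypes.parse(content_type)[0].parameters.fetch("charset")
# => "windows-1251"

# Re-tag the raw bytes with the declared encoding, then parse again as UTF-8.
input.force_encoding(Encoding.find(encoding))
html = Oga.parse_html(input.encode("utf-8"))
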
ioquatix commented 2 years ago

Awesome! That code was written a long time ago, I'm glad it's still useful.