Closed mqudsi closed 5 years ago
So we'd double every mime type by having a z equivalent. That doesn't sound useful.
So we'd double every mime type by having a z equivalent. That doesn't sound useful.
Thanks for your dismissive reply. I just explained precisely why it is not only useful but needed. SVG is unique in that as a plain text format a compressed version of the same content was adopted (standardized?) as a first-tier file format, directly generated by various editors and encoders, without any effort to mask the compression type.
Take for example PNG, which is also losslessly compressed (with DEFLATE) but that compression is internal to the file format, e.g. the magic bits for PNG files say "PNG" and not "DEFLATE".
So, yes, we'd double every mime type by having a z equivalent but only for MIME types that are shared by two entirely different file types that shouldn't have shared a single MIME in the first place; I think that's what you meant to say.
I don't think we'd implement this in Firefox.
@mqudsi As you said, there should be two headers:
The former is the MIME type and the latter would be the GZIP compression. It is unfortunate that Apache (and maybe others?) relies only on the extension-to-MIME-type mapping, but this seems to be an implementation detail.
In general, as you noted as well, the compression happens dynamically nowadays and there is no need to even generate .svgz files anymore. That wasn't the case in the past.
IMO .svgz should be seen as legacy and not exercised anymore. It would take years to get clients and backends to implement a new type, and the confusion might be even bigger. It doesn't seem worth the cost to potentially avoid the double compression.
The SVG Working Group just discussed "A separate MIME type for svgz files is needed".
In the early days of the Web, the distinction between the type and the encoding was not clear. So for example a MIME type was registered for zip archives, and another one for gzipped files (application/gzip). This was a bad design.
The decision (from memory, around 1997-8) to use a single Internet Media Type (MIME type) for SVG and to use Content-Encoding to indicate the presence of compression (whether on-the-fly compression or static compression generated by some authoring tool) was thus the result of experience in the IETF and W3C, and the specific registration benefited from feedback from the IETF, and remains a sound architecture to this day.
In the early days, most servers did not do on-the-fly compression. There was a need for authoring tools to be able to emit the compressed form, and for content creators who did not have control over server configuration to get the correct result. Which is why .svgz was standardized.
Nowadays, on-the-fly encoding is common and indeed there are several types (such as Brotli encoding, which does a better job on SVG than gzip). So nowadays on many servers the performance gain can be realized by just dropping a .svg file onto the server.
But using .svgz files is a long-established practice, and works well.
Your suggested change would simply have the effect that there would be a new Internet Media type with no support, so people would not use it as the images would not be displayed.
As @longsonr said, a complete duplication of Internet Media types is a very poor solution.
@mqudsi said:
More practically speaking, a browser served a svgz file as generated by the myriad of SVG editors/compressors/optimizers/etc without a gzip Content-Encoding will not be able to render the image due to broken encoding
Correct. And that would indicate an error in the filetype mapping on that server. .svgz means both Content-Type and Content-Encoding should be set. And I know that Apache can be set to do that (indeed, I thought it was the default mapping out of the box). So this is not a problem that seems to occur much in practice, and your suggested solution would not solve it. Instead, just ensure the server is configured correctly.
The issue isn't that Apache can't set the content type and content encoding; the issue is that some web servers (I know for a fact that nginx is one such server) use the content type to determine whether a file should be dynamically compressed. The file is then served with the detected MIME type plus the correct content encoding (and this part works correctly), e.g.
gzip_types text/html application/javascript application/json application/xml+rss image/bmp image/svg+xml text/css text/javascript text/plain text/xml;
This directive says "when a resource with a MIME type from this list is requested, apply a gzip transform to it, and serve the compressed content with the Content-Encoding: gzip header".
A plain Jane .svg file has a content type image/svg+xml, which appears in the example gzip_types list above, so it is correctly compressed (significantly bringing down its size, given the byte-level duplication in text -- and particularly in xml -- files).
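Those savings are easy to demonstrate. A quick sketch using Python's standard gzip module (the repetitive SVG document here is a made-up example) shows how well redundant XML compresses:

```python
import gzip

# A made-up, highly repetitive SVG document, as XML tends to be.
svg = (b'<svg xmlns="http://www.w3.org/2000/svg">'
       + b'<rect x="0" y="0" width="10" height="10"/>' * 1000
       + b'</svg>')

compressed = gzip.compress(svg)
ratio = len(compressed) / len(svg)
print(f"{len(svg)} bytes -> {len(compressed)} bytes ({ratio:.1%})")
```

Real-world SVGs are less repetitive than this toy, but ratios well under 50% are still typical.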
The problem is that this directive cannot distinguish between a request for example.com/file.svg and example.com/file.svgz, because both have the same content type, so both will be gzipped on the fly. That would be OK if there were a separate content type the latter could be served with, but as it stands the dynamically compressed .svg has the same Content-Type header as the dynamically (re)compressed .svgz file, and the same Content-Encoding: gzip. The end result is that the client in both cases receives a response with
...
Content-Encoding: gzip
Content-Type: image/svg+xml
...
and so has no way of knowing that the resulting file still needs to be decompressed again to actually be a valid SVG (and not SVGZ) file.
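The double-encoding failure mode described here can be reproduced in a few lines. In this sketch, Python's gzip module stands in for both the server's on-the-fly filter and the client's Content-Encoding handling; one decompression pass leaves gzip bytes, not XML:

```python
import gzip

svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'

# The .svgz file is already gzip-compressed on disk.
svgz = gzip.compress(svg)

# Seeing Content-Type: image/svg+xml, the server gzips it again on the fly.
wire = gzip.compress(svgz)

# The client honors Content-Encoding: gzip and decompresses exactly once...
decoded_once = gzip.decompress(wire)

# ...but is left holding gzip data (magic bytes 0x1F 0x8B), not the
# XML document the Content-Type header promised.
print(decoded_once[:2] == b"\x1f\x8b")       # True: still compressed
print(gzip.decompress(decoded_once) == svg)  # True: needs a second pass
```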
The server should either serve a .svgz file as-is with Content-Encoding: gzip and Content-Type: image/svg+xml, or it may (pointlessly) recompress it on-the-fly and serve it with Content-Encoding: gzip, but then it needs to indicate that the response is not a text document but rather still gzip-encoded.
It wouldn't matter if applications with SVG support could dynamically distinguish between an svgz and an svg file without having the correct extension, either via a shared header that indicates the actual encoding (but that would mandate changes to the file format, which is obviously never going to happen), or by simply falling back to gzip-inflating the data and attempting to once again decode it as svg+xml if/when the initial decode-as-plain-svg step fails. But, for example (@longsonr), Firefox won't decode an at-rest svgz file as SVG, as it doesn't attempt to decode it as gzip.
Ultimately, the problem is that .svgz files do not have magic header bits identifying them to whatever client application is opening them as gzip-compressed svg files (as opposed to any other gzipped content), meaning that without an outwardly-visible indicator that is correctly preserved across transformations, clients have no idea that they should decode the file first. On Windows, where there is no internal concept of MIME types, the extension is used to make that distinction. In the web world, extensions have zero significance and the Content-Type header alone is used to make that decision, and unfortunately it fails in this case.
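The fallback described above is straightforward to sketch. The gzip magic number 0x1F 0x8B can never begin a well-formed XML document, so sniffing it is unambiguous (`load_svg_bytes` is a hypothetical helper for illustration, not something any browser actually implements):

```python
import gzip

def load_svg_bytes(data: bytes) -> bytes:
    """Hypothetical fallback: gunzip first if the payload carries the
    gzip magic number, then treat the result as plain SVG text."""
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    return data

plain = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
print(load_svg_bytes(plain) == plain)                 # True
print(load_svg_bytes(gzip.compress(plain)) == plain)  # True
```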
Note that I can configure nginx (via the mime.types file) to map requests for .svgz files to a different MIME type than image/svg+xml, which would stop it from dynamically compressing .svgz files but still have true .svg files compressed on-the-fly; but then the response will have Content-Type: foo instead of Content-Type: image/svg+xml, because the same content type that is used to determine dynamic compression is also served to the client.
Personally, I don't really care, as I'm fully in control of what types we serve. But please understand that this isn't a situation shared by any other file type, and so comparisons with gzipped versions of other media types are not appropriate. .svgz isn't me (or whoever else falls prey to this) deciding on their own to gzip a regular svg file and then give it a .svgz extension rather than a .svg.gz extension; it's a regular person using a regular application option to save an SVG document into a format + extension that's been around for a long time, with no indication that this would cause problems in certain deployment scenarios.
It's also important to note that there are almost no drawbacks to adding a MIME type here. Applications that ignore the MIME type and use only the extension to determine how a file is opened will continue to do so. Applications that rely exclusively on the MIME type will continue to fail to open the file in this particular case (as it has never been possible to decode an svgz file based purely off the MIME type without the content-encoding as well).
I'm skeptical that this confusion is solvable at this point. Any change introduces compatibility issues.
But I agree that .svgz files on the web are often more pain than they are worth. Even if you serve them correctly, none of the browsers I've tested will compress the file again on saving, creating a mismatch with the file extension when you try to open the file.
I would be happy to add a warning to the spec, that:

- Use of .svgz over HTTP requires correct configuration, and some web servers do not support the necessary configuration options.
- The recommended alternative is to serve plain .svg files with the best server-enabled compression supported by the client, including the use of more recent compression methods (e.g., Brotli).
- If static pre-compression is still wanted (e.g., because the .svg file is very large), it may be possible to configure the server to recognize a .svg.gz file extension as representing a pre-compressed SVG file. (E.g., for nginx this is supported with the gzip_static directive.) The website author would need to rename the .svgz file to .svg.gz before uploading.

I'd also hope that requests from web developers might convince servers to add support, but I'm not sure if that will happen. I found a wontfix nginx feature request to add support for .svgz to the gzip_static directive.
I can certainly live with that.
I found a wontfix nginx feature request to add support for .svgz to the gzip_static directive.
Which sounds bad, but in that bug they explain how to set the server up correctly:
location ~ \.svgz$ { add_header Content-Encoding gzip; }
so wontfix here genuinely does mean that nginx is not actually broken (although their default MIME types setup could certainly include this by default).
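Combining that location block with the gzip_types approach from earlier in the thread, a minimal nginx sketch (untested; adapt to your own server blocks and type lists) that serves both variants correctly might look like:

```nginx
# Dynamically compress plain .svg responses for clients that accept gzip.
gzip on;
gzip_types image/svg+xml;

# Serve pre-compressed .svgz files as-is: suppress the on-the-fly gzip
# filter so they are not double-encoded, and declare the compression
# that is already baked into the file.
location ~ \.svgz$ {
    gzip off;
    add_header Content-Encoding gzip;
}
```

The key point is `gzip off;` inside the location block: without it, the shared image/svg+xml type would pull .svgz responses into the dynamic gzip filter and double-encode them.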
In terms of the SVG specification and Internet Media Types, though, there is nothing to fix here.
@mqudsi Can we agree that a specification note like the one suggested by @AmeliaBR in https://github.com/w3c/svgwg/issues/701#issuecomment-505658002 would be a fair enough compromise? @AmeliaBR could you create a PR with the proposed change please?
@dirkschulze yup, that's fine :)
Y'all should just deprecate .svgz. Seems like the best way to fix this that introduces no compatibility issues. The feature of "this is just a gzip containing this type of file" is fine, but it doesn't need to be done on a per-file-spec basis.
If something like text/plain;compression=gzip got added into MIME, it would certainly capture the whole of that feature and might actually be useful at times; svgz would be something like image/svg+xml;compression=gzip. But as for having a different file type with the same MIME type, the best thing seems to be to stop doing that.
@tatarize I agree with you for a web context. But .svgz is very useful for local file system use, especially on systems that don't support file associations based on stacked file extensions like .svg.gz
Currently, it's impossible to use svgz in data URIs, right?
Currently, it's impossible to use svgz in data URIs, right?
Well, a data URI has two text formats, URI encoding and base64 encoding. Gzipping creates binary data, so there's that problem you'd have to address. If you did that, you could signal the content type where you currently put base64.
This isn't what we're discussing here though.
@longsonr Yeah, I know that.
The problem is that, because of this issue, browsers can only recognize the binary data in a data:image/svg+xml;base64,... URI as a malformed svg, not svgz (since there is no MIME type for svgz or any parameter like ;compression=gzip). So browsers won't parse the data; it's impossible.
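The dead end is easy to see by constructing such a URI. This sketch (Python standard library only) builds a base64 data URI from gzipped SVG; base64 carries the binary bytes fine, but nothing in the URI syntax can express Content-Encoding: gzip:

```python
import base64
import gzip

svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
svgz = gzip.compress(svg)

# base64 happily encodes the binary gzip payload...
uri = "data:image/svg+xml;base64," + base64.b64encode(svgz).decode("ascii")

# ...but a consumer decoding the URI gets gzip bytes where the declared
# MIME type promises XML, with no way to signal the extra encoding step.
payload = base64.b64decode(uri.split(",", 1)[1])
print(payload[:2] == b"\x1f\x8b")  # True: gzip magic, not XML
```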
Chrom(ium):
This page contains the following errors:
error on line 1 at column 1: Encoding error
Below is a rendering of the page up to the first error.
Firefox:
XML Parsing Error: not well-formed
Location: data:image/svg+xml;base64,...
Line Number 1, Column 1:
Of course, you could argue that's a limitation of the data URI spec. However, this change would easily solve this.
Another possible solution would be if browsers were required to detect the gzip magic number 0x1F8B (control char + non-ASCII byte, so not valid XML?) and automatically interpret the file as svgz even when served with a content-type for svg.
I'm not saying it's a good solution, but it's possible, since there is no conflict, apparently.
Another possible solution would be if browsers were required to detect the gzip magic number 0x1F8B (control char + non-ASCII byte, so not valid XML?) and automatically interpret the file as svgz even when served with a content-type for svg. I'm not saying it's a good solution, but it's possible, since there is no conflict, apparently.
We won't be doing that.
On Linux (GNOME/KDE, etc.):
$ xdg-mime query filetype test.svgz
image/svg+xml-compressed
That -compressed suffix means more and more problems uploading files. svgz, svgrar, svg7zip, svgnowzip, svgformat, svgcompact, svgmin, etc. -- bad design.
So I agree that .svg and .svgz should have separate media types; the file formats themselves are completely different. One is gzipped (binary data), the other is XML (text).
Consider a system which allows a user to upload files via HTTP POST or PUT, perhaps that system wants to do something special with XML and/or text documents.
When the file is uploaded, the Content-Type in the HTTP request is set to image/svg+xml, but the server now has no idea if it is receiving an svg or svgz file.
The server would have to do extra file-type determination just for SVG, whereas for the majority of other formats, using the Media Type is sufficient. This seems very silly!
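In other words, an upload endpoint that wants to tell the two apart is forced into payload sniffing. A sketch of what that extra determination looks like (`classify_upload` is a hypothetical helper, not part of any real framework):

```python
import gzip

def classify_upload(content_type: str, body: bytes) -> str:
    """Hypothetical server-side check: with a shared MIME type, the
    Content-Type header alone cannot distinguish svg from svgz, so the
    server must peek at the payload for the gzip magic number."""
    if content_type == "image/svg+xml" and body[:2] == b"\x1f\x8b":
        return "svgz"
    return "svg"

svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
print(classify_upload("image/svg+xml", svg))                 # svg
print(classify_upload("image/svg+xml", gzip.compress(svg)))  # svgz
```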
On Linux (GNOME/KDE, etc.):
$ xdg-mime query filetype test.svgz
image/svg+xml-compressed
That -compressed suffix means more and more problems uploading files. svgz, svgrar, svg7zip, svgnowzip, svgformat, svgcompact, svgmin, etc. -- bad design.
Suppose plainsvg.svg is a plain SVG and gzippedsvg.svgz is a gzipped SVG. Then if you do:
$ mv gzippedsvg.svgz gzippedsvg.svg
$ xdg-mime query filetype gzippedsvg.svg
you get: image/svg+xml
and if you do:
$ mv plainsvg.svg plainsvg.svgz
$ xdg-mime query filetype plainsvg.svgz
you get: image/svg+xml-compressed
It seems that command (also?) uses the file extension, not (only?) the magic number! (Is the file command better for determining the file type?)
So ... we're just gonna ignore that data URIs work for every widely used file format except svgz? Shouldn't W3C/IETF come up with a way to solve that? Even if it means updating data URI spec? Doing a quick search, apparently it was expected/verified to work before (unless I understood it wrong?) in 2010 (in fact, without any changes, just by auto-detecting it): https://mailarchive.ietf.org/arch/msg/pkix/7XbZ6Ylg8-n-ACnu7TPz3PPFss0/
I don't know about current Opera, but there's no way to make it work, at least in Firefox. Unless HTML <img> has some attribute I'm unaware of that would simulate a Content-Encoding: gzip for data URIs, or something like that (though it still wouldn't work when opening the data URL in a new tab). I don't think it has.
Btw, I'm not advocating for any specific solution or spec change necessarily, it's just so awkward that something that should work for every file type with a MIME doesn't work in this case because of some technicality-deadlock-thing.
if browsers were required to detect the gzip magic number 0x1F8B
Perhaps the word "require" is too strong here, but what about "recommend"? Something along the lines of:
In contexts where a gzip encoding cannot be specified, it's recommended that user agents interpret files with MIME type image/svg+xml where the binary data starts with the gzip magic number 0x1F8B as an SVGZ file, as if it had been served with the HTTP header Content-Encoding: gzip.
Or maybe even "allowed":
[...] user agents are allowed to interpret [...] as SVGZ [...]
IMO it would be the best compromise so that we could at least get this to eventually work in some way. What would be the impediment or downsides? Or maybe there's some other better way to do it? It would be unfortunate if this becomes a "wontfix" kinda thing.
SVGZ files are currently being used inside X.509 certificates for the logotype extension (OID: 1.3.6.1.5.5.7.1.12) for the BIMI standard. The currently used MIME type in those certificates is image/svg+xml. This is rather misleading considering it's actually an svgz in the case of BIMI. There may also be other uses (even with the same OID, if not others) where that might not be the case. Not being able to instantly tell which is which does cause confusion and can cause mistakes.
While the web indeed has Content-Encoding: gzip and nobody should do svgz + gzip instead of svg + gzip, this is not really an assumption that can be made for other use-cases.
(In the end, there are many formats that are compressed or use a compressed container and have their own MIME type, for a very good reason: they're different formats rather than just .zip; they carry different semantics and need different handling.)
Presently, both .svg and .svgz files share a mime type image/svg+xml. This is a problem because it means that no transport protocol/layer/application can correctly serve its equivalent of HTTP's Content-Encoding based off of the MIME type alone.
For example, nginx (and Apache?) specify the type of files to be dynamically compressed for serving (the client's Accept-Encoding permitting) based off the MIME type, typically mapped from file extension in a separate (systemwide or application-specific) configuration file (in nginx's case, mime.types).
If mime.types maps svgz to image/svg+xml (and it does by default), it will be compressed the same way .svg files (obviously mapped to image/svg+xml) are. Any sysadmin worth her salt will be applying at least gzip encoding transforms to image/svg+xml, as the savings are enormous (as we all know, text -- and especially verbose formats like XML -- compresses very nicely). But that means that any statically extant .svgz files will be double-encoded, then decoded (read: decompressed) only once by the client, before attempting to render them (incorrectly) as image/svg+xml not in need of any additional transforms.
I propose a separate image/svgz+xml or similar (image/svg+xml+gzip?) that will allow transport applications/layers to distinguish between svg and svgz files via their MIME type alone, so that sysadmins do not need to choose between being able to serve svgz files and serving uncompressed plain svg files.
(There is plenty of precedent here; e.g. .docx and .xlsx files are actually zip files but have their own MIME type to prevent exactly this sort of confusion.)
More practically speaking, a browser served a svgz file as generated by the myriad of SVG editors/compressors/optimizers/etc without a gzip Content-Encoding will not be able to render the image due to broken encoding (tested in Firefox, Internet Explorer, and Chrome); i.e. there is no client-side heuristic already in place to address this.