Open ghost opened 8 years ago
We implemented some code to make this work. I'll try to hack at something and report back. This may be a fairly simple fix.
So, the mime type detection used by go-ipfs gateway is from the go standard library here: https://golang.org/src/net/http/sniff.go
http.DetectContentType is already used when http.ServeContent is invoked.
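To make the sniffing behaviour concrete, here is a minimal sketch that calls http.DetectContentType directly on a few byte samples. The sample contents and the helper name sniff are made up for illustration; only the standard library call itself comes from the comment above.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff returns the content type that Go's standard library guesses from the
// leading bytes of data. Only the first 512 bytes are considered.
func sniff(data []byte) string {
	return http.DetectContentType(data)
}

func main() {
	samples := []struct {
		label string
		data  []byte
	}{
		// A stylesheet has no magic bytes, so it is treated as generic text.
		{"css", []byte("body { color: red; }")},
		// The PNG signature is recognised by the sniffer.
		{"png", []byte("\x89PNG\r\n\x1a\n")},
		// An XML prolog is recognised as XML, not as XHTML.
		{"xhtml", []byte(`<?xml version="1.0"?><html xmlns="http://www.w3.org/1999/xhtml"/>`)},
	}
	for _, s := range samples {
		fmt.Printf("%-6s -> %s\n", s.label, sniff(s.data))
	}
}
```

The stylesheet sample shows why sniffing alone is not enough for formats without a signature: the sniffer can only report generic text.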
If we save the MIME type in the unixfs object (file or metadata), we could set it explicitly instead of just relying on the mime sniffer.
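A rough sketch of what "set it explicitly" could look like on the HTTP side, assuming the stored MIME type has already been read from the object and is passed in as a plain string. The function name servedType and the request path are hypothetical, and this is not the go-ipfs gateway code. It relies on the documented behaviour that http.ServeContent only sniffs when the Content-Type header has not been set.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// servedType runs a tiny handler against an in-memory request and returns
// the Content-Type header that a client would see. If explicit is non-empty
// it is set before http.ServeContent runs, which disables sniffing.
func servedType(explicit string, body []byte) string {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if explicit != "" {
			w.Header().Set("Content-Type", explicit)
		}
		// Empty name: no file extension is available, so without an
		// explicit type ServeContent falls back to DetectContentType.
		http.ServeContent(w, r, "", time.Time{}, bytes.NewReader(body))
	})

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/ipfs/example", nil) // hypothetical path
	handler.ServeHTTP(rec, req)
	return rec.Result().Header.Get("Content-Type")
}

func main() {
	doc := []byte(`<?xml version="1.0"?><html xmlns="http://www.w3.org/1999/xhtml"/>`)
	fmt.Println("sniffed: ", servedType("", doc))
	fmt.Println("explicit:", servedType("application/xhtml+xml", doc))
}
```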
MIME detection/sniffing is still required, but it moves to an earlier part of the chain. By "explicit", where in the object should it be put? Does this require IPLD?
The File API does have a type attribute. Is this equivalent?
unixfs in IPLD may change. See https://github.com/ipfs/ipld-examples/blob/master/unixfs; the MIME type can go in the file there.
In the current formats the MIME type should maybe go in a Metadata object, but that may be more annoying right now than useful.
On Thu, Jan 21, 2016 at 11:24 PM rht notifications@github.com wrote:
The mime detect/sniffing is still required, but moved earlier to the chain. By explicit, where should it be put in the object? Does this require ipld?
The file API https://www.w3.org/TR/FileAPI/ does have a type attribute. Is this equivalent?
Hello. This is my first post on the project, but I need to jump in. IPFS is attractive because the multihash of an important file (like a measurements file supporting a scientific paper) depends only on the data, not on arbitrary decisions by the publisher such as the filename or original location. Two different publishers holding the same data file should compute the same multihash. If you insert the MIME type in the object itself, that is also an arbitrary decision by the publisher. For example, is the measurements file text/plain, text/csv or application/ms-excel? I urge you to keep metadata like the MIME type external to the data itself, so that a multihash link is just about the data.
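The concern can be illustrated with a small sketch. It uses plain SHA-256 over bytes as a stand-in for a real multihash or CID, and the way the MIME type is mixed into the digest is an assumption made only for this example; actual UnixFS encoding differs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digestOfData hashes only the file bytes, so every publisher holding the
// same bytes gets the same result.
func digestOfData(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// digestWithMIME hashes the publisher-chosen MIME type together with the
// data (hypothetical encoding: type, a zero byte, then the data).
func digestWithMIME(mimeType string, data []byte) string {
	h := sha256.New()
	h.Write([]byte(mimeType))
	h.Write([]byte{0})
	h.Write(data)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	measurements := []byte("t,value\n0,1.5\n1,1.7\n")

	// Same bytes, same digest, regardless of who publishes them.
	fmt.Println(digestOfData(measurements) == digestOfData(measurements))

	// Same bytes, different labels: the digests diverge.
	a := digestWithMIME("text/plain", measurements)
	b := digestWithMIME("text/csv", measurements)
	fmt.Println(a == b)
}
```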
I agree that having the hash of the same data (regardless of mime type or file name) be the same is important, but also having an associated MIME type available using a different hash is important so that /ipfs/* links can be directly used by MIME-aware applications (including browsers).
@jefft0,
Two different publishers holding the same data file should compute the same multihash.
I'd like to (kindly) challenge that point. The scenario you're talking about here is:
Can you illustrate that situation with a real-life example?
Generally, I don't think it's a big deal that a file has two different representations. It might slightly affect the overall efficiency of the network, but then we should also be worried about empty newlines at the end of text files, etc.
At any rate, if content integrity does matter but metadata doesn't, then the users can always throw away the metadata and compute a hash of the content itself, it doesn't have to be done through multihashes. (Following the newline analogy, if two users want to make sure two text files are identical and trailing newlines don't matter, they can trim() and then compute a hash of the result.)
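The trim-then-hash idea from the parenthesis could look roughly like this. The helper name normalizedDigest is hypothetical, and SHA-256 is used here only as a convenient plain hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// normalizedDigest strips trailing newline characters (LF and CR) before
// hashing, so files that differ only in trailing newlines compare equal.
func normalizedDigest(text string) string {
	trimmed := strings.TrimRight(text, "\r\n")
	sum := sha256.Sum256([]byte(trimmed))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := "a,b\n1,2\n"
	b := "a,b\n1,2\n\n\n"
	fmt.Println(normalizedDigest(a) == normalizedDigest(b))
}
```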
I agree with @davux: if the two publishers both have the same file, they would be likely to use the same MIME type when publishing, especially if the MIME type were produced automatically.
If they didn't have the same MIME type for some reason, then having two different hashes would seem like a reasonable thing to do. No?
The other side of the problem is that people who are trying to receive the published document use different MIME type sniffing logic and one gets it wrong. Who's more likely to get it wrong, or who should have the onus of getting the MIME type right, the publisher or the reader?
It seems we're headed towards raw leaves as the default, which seems good. So there are two places the mime could go: in the directory (where filename already is) or in an intermediate IPLD object that has metadata like mime and a pointer to the leaf
That sounds good. If there is a compromise where the MIME type is stored somewhere rather than inferred, while the data in the leaves can still be raw, then everyone should be happy.
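The two placements mentioned above can be sketched as data shapes. All type and field names here are hypothetical and the CID is a placeholder string; this is not the UnixFS or IPLD schema, just an illustration that the raw leaf stays the same in both layouts.

```go
package main

import "fmt"

// DirEntry is a hypothetical directory link that carries the MIME type next
// to the file name (first option).
type DirEntry struct {
	Name string
	MIME string
	Leaf string // CID of the raw leaf (placeholder in this sketch)
}

// FileMeta is a hypothetical intermediate object that holds metadata and a
// pointer to the raw leaf (second option).
type FileMeta struct {
	MIME string
	Leaf string // CID of the raw leaf (placeholder in this sketch)
}

// sameLeaf reports whether both layouts point at the same raw data.
func sameLeaf(d DirEntry, m FileMeta) bool {
	return d.Leaf == m.Leaf
}

func main() {
	const leaf = "<raw-leaf-cid>" // placeholder, not a real CID

	viaDir := DirEntry{Name: "measurements.csv", MIME: "text/csv", Leaf: leaf}
	viaMeta := FileMeta{MIME: "text/plain", Leaf: leaf}

	// The publishers disagree on the MIME type, but the leaf is shared.
	fmt.Println(sameLeaf(viaDir, viaMeta))
}
```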
I tried to use XHTML as index.html, but the gateway returns the wrong MIME type for it. UnixFS has a Metadata type, and it has a MimeType field.
I hand-built a test identity link with metadata, but I get content-type: text/plain; charset=utf-8 instead of the content-type: application/xhtml+xml that is in the metadata block.
I also tried to use a raw block as the link in the Metadata block, but got the error: expected protobuf dag node
I don't think Metadata from unixfsv1 is used in this case. AFAIK Gateway exposed by go-ipfs does mime-sniffing via net/http/sniff.go
Hardcoding explicit content-type will be possible with unixfsv2 (more details in https://github.com/ipfs/unixfs-v2/issues/11).
@lidel The gateway does not read the MimeType field from the metadata block. The gateway must use it if it is set.
Correct, we need (and is missing right now):
ipfs add
Summarized potential solutions in https://github.com/ipfs/in-web-browsers/issues/152
One is to embed the content-type in DAG metadata; another is specific to the HTTP Gateway and proposes content-type overrides via drop-in config files similar to .gitattributes.
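As a rough illustration of the drop-in config idea, the sketch below parses a made-up two-column format (glob pattern, then content type) and looks up a file name. The format, the rule contents and the function names are all assumptions for this example; the linked issue does not define them.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// rule maps a glob pattern to a content type.
type rule struct {
	pattern     string
	contentType string
}

// parseRules reads lines of the form "<pattern> <content-type>". Blank
// lines, lines starting with '#', and lines without exactly two fields are
// skipped, so content types containing spaces are not supported here.
func parseRules(config string) []rule {
	var rules []rule
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		rules = append(rules, rule{pattern: fields[0], contentType: fields[1]})
	}
	return rules
}

// lookup returns the content type of the first rule whose pattern matches
// name, or false if no rule applies.
func lookup(rules []rule, name string) (string, bool) {
	for _, r := range rules {
		if ok, err := path.Match(r.pattern, name); err == nil && ok {
			return r.contentType, true
		}
	}
	return "", false
}

func main() {
	rules := parseRules("# hypothetical override file\nindex.html application/xhtml+xml\n*.svg image/svg+xml\n")
	fmt.Println(lookup(rules, "index.html"))
	fmt.Println(lookup(rules, "logo.svg"))
	fmt.Println(lookup(rules, "notes.txt"))
}
```

A gateway using something like this would fall back to sniffing whenever lookup reports no match.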
One cheap solution is allowing extensions to be used in the gateway. This way we don't have to rely on file sniffing, but only on an extension -> mime mapping (like Apache and nginx have been doing for years).
An example would be https://ipfs.io/ipfs/bafkreiajjehupljsltknxzdcenhrmcarjuagybwx7ht2bcyyon6s3ayn2m.css being served as text/css. This isn't as nice as the mime type traveling with the data, but it is very quick to implement and doesn't change the network, just the gateway.
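A minimal sketch of the extension -> mime idea, assuming a small hand-written table and a fallback to the existing sniffer. The table contents and the function name typeForPath are illustrative, not the gateway's actual behaviour.

```go
package main

import (
	"fmt"
	"net/http"
	"path"
	"strings"
)

// extToMIME is a small, hand-written extension table for this sketch.
var extToMIME = map[string]string{
	".css":   "text/css",
	".svg":   "image/svg+xml",
	".xhtml": "application/xhtml+xml",
}

// typeForPath picks the content type from the extension at the end of the
// request path and falls back to sniffing when the extension is unknown.
func typeForPath(requestPath string, data []byte) string {
	ext := strings.ToLower(path.Ext(requestPath))
	if t, ok := extToMIME[ext]; ok {
		return t
	}
	return http.DetectContentType(data)
}

func main() {
	css := []byte("body { color: red; }")
	const cid = "bafkreiajjehupljsltknxzdcenhrmcarjuagybwx7ht2bcyyon6s3ayn2m"

	fmt.Println(typeForPath("/ipfs/"+cid+".css", css)) // extension wins
	fmt.Println(typeForPath("/ipfs/"+cid, css))        // falls back to sniffing
}
```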
For CSS that works fairly well but for things like SVG it annoyingly sets the Content-Disposition header.
Hm. Yeah, it's designed for specifying the download name. Is it not possible to wrap your file in a directory?
@lidel First the Gateway code needs to use the MimeType from the metadata block, if present. Then we need the ability to set the content type in the metadata.
It's possible to wrap it in a directory but I would prefer not to do it because:
For now, you can use inline CIDs for that (not ideal, but a workaround):
> mkdir toadd
> mv myStylesheet.css toadd/f.css
> ipfs add -q -r --raw-leaves --cid-version=1 --inline --inline-limit=64 toadd
bafyaanysgefciakvciqassipi6wtexgu3psgei2pcyebctianqdnp6phucfrq435fwbq3uysavtc4y3tommnnzacbibaqai
This will be shorter:
ipfs add --cid-base base64url -Q -w --raw-leaves --cid-version 1 --inline --inline-limit 64 --stdin-name .css < myStylesheet.css
IPFS might well be suitable for publishing complete HTML pages, but there is a problem: MHTML files cannot be published this way, because IPFS is not able to serve this format with the MIME type message/rfc822.
For example: http://gateway.ipfs.io/ipfs/QmfHtsEyXGdJm6Yo4frKLdKyDKT5G6ubECfVLnvkVUkscM
Is there any solution to this problem?
Downloading the file and then viewing it is not a suitable option, since easy operation requires opening documents directly in the browser.