ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Need more support mime-types. #2164

Open ghost opened 8 years ago

ghost commented 8 years ago

IPFS might well be suitable for publishing whole HTML pages. But there is a problem: MHTML files cannot be published this way, because IPFS is not able to serve this format with the MIME type message/rfc822.

For example: http://gateway.ipfs.io/ipfs/QmfHtsEyXGdJm6Yo4frKLdKyDKT5G6ubECfVLnvkVUkscM

Is there any solution to this problem?

Downloading the file and then viewing it is not a suitable option, since ease of use requires opening documents directly in the browser.

whyrusleeping commented 8 years ago

we implemented some code to make this work, i'll try and hack at something and report back. This may be a fairly simple fix

whyrusleeping commented 8 years ago

So, the mime type detection used by go-ipfs gateway is from the go standard library here: https://golang.org/src/net/http/sniff.go
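To make the problem concrete, here is a minimal sketch of what that standard-library sniffer reports. The byte strings are made-up samples (not the contents of the linked object), and `sniff` is just an illustrative wrapper around `http.DetectContentType`.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff returns the Content-Type that Go's standard library infers from
// the leading bytes of data (at most the first 512 bytes are examined).
func sniff(data []byte) string {
	return http.DetectContentType(data)
}

func main() {
	// Hypothetical samples: an MHTML-style header block and a small HTML page.
	mhtml := []byte("From: <Saved by a browser>\r\nMIME-Version: 1.0\r\nContent-Type: multipart/related\r\n\r\n")
	html := []byte("<!DOCTYPE html><html><body>hello</body></html>")

	// The sniffer has no signature for MHTML, so it falls back to plain text.
	fmt.Println(sniff(mhtml))
	// HTML has a signature and is recognised.
	fmt.Println(sniff(html))
}
```

With these samples the first line printed is `text/plain; charset=utf-8` and the second is `text/html; charset=utf-8`, which matches the behaviour reported above: sniffing alone never produces message/rfc822.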

rht commented 8 years ago

http.DetectContentType is already used when http.ServeContent is invoked.

#2230 does this explicitly, however, and includes charset detection through https://golang.org/x/net/html/charset.

jbenet commented 8 years ago

If we save the MIME type in the unixfs object (file or metadata), we could set it explicitly instead of just relying on the mime sniffer.
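A rough sketch of that idea, assuming a stored MIME type were available to the gateway. The `fileEntry` type and `contentType` function are hypothetical names for illustration only; they are not the real unixfs types or gateway code.

```go
package main

import (
	"fmt"
	"net/http"
)

// fileEntry is a hypothetical stand-in for a file object that may carry a
// publisher-declared MIME type. It is NOT the real unixfs structure.
type fileEntry struct {
	MimeType string // empty when nothing was declared
	Data     []byte
}

// contentType prefers the declared MIME type and falls back to sniffing
// only when none was stored.
func contentType(f fileEntry) string {
	if f.MimeType != "" {
		return f.MimeType
	}
	return http.DetectContentType(f.Data)
}

func main() {
	body := []byte("From: <Saved by a browser>\r\nMIME-Version: 1.0\r\n\r\n")

	// Declared type wins over the sniffer.
	fmt.Println(contentType(fileEntry{MimeType: "message/rfc822", Data: body}))
	// Without a declared type the sniffer decides.
	fmt.Println(contentType(fileEntry{Data: body}))
}
```

The sniffer stays in the chain as a fallback, so objects without a stored type would behave exactly as they do today.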

rht commented 8 years ago

The mime detect/sniffing is still required, but moved to the earlier part of the chain. By explicit, where should it be put in the object? Does this require ipld?

The File API (https://www.w3.org/TR/FileAPI/) does have a type attribute. Is this equivalent?

jbenet commented 8 years ago

unixfs in IPLD may change (see https://github.com/ipfs/ipld-examples/blob/master/unixfs). The MIME type can go in the file there.

In the current formats the MIME type should maybe go in a Metadata object, but that may be more annoying right now than useful.

jefft0 commented 8 years ago

Hello. This is my first posting on the project, but I need to jump in. IPFS is attractive because the multihash of an important file (like a measurements file supporting a scientific paper) only depends on the data, not on arbitrary decisions by the publisher like filename or original location. Two different publishers holding the same data file should compute the same multihash. If you insert the MIME type in the object itself, this is also an arbitrary decision by the publisher. For example, is the measurements file text/plain, text/csv or application/vnd.ms-excel? I urge you to keep metadata like the MIME type external to the data itself so that a multihash link is just about the data.

singpolyma commented 7 years ago

I agree that having the hash of the same data (regardless of mime type or file name) be the same is important, but also having an associated MIME type available using a different hash is important so that /ipfs/* links can be directly used by MIME-aware applications (including browsers)

davux commented 6 years ago

@jefft0,

"Two different publishers holding the same data file should compute the same multihash."

I'd like to (kindly) challenge that point. The scenario you're talking about here is:

Can you illustrate that situation with a real-life example?

Generally, I don't think it's a big deal that a file has two different representations. It might slightly affect the overall efficiency of the network, but then we should also be worried about empty newlines at the end of text files, etc.

At any rate, if content integrity matters but metadata doesn't, users can always throw away the metadata and compute a hash of the content itself; it doesn't have to be done through multihashes. (Following the newline analogy, if two users want to make sure two text files are identical and trailing newlines don't matter, they can trim() and then compute a hash of the result.)
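The trim-then-hash idea could look roughly like this. It is only a sketch: `normalizedDigest` is a hypothetical helper, it trims trailing line breaks only, and it uses a plain SHA-256 hex digest rather than an IPFS multihash.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// normalizedDigest hashes data after removing trailing CR/LF bytes, so two
// files that differ only in trailing newlines yield the same digest.
func normalizedDigest(data []byte) string {
	trimmed := bytes.TrimRight(data, "\r\n")
	sum := sha256.Sum256(trimmed)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Made-up sample data: the same CSV with different trailing newlines.
	a := []byte("x,y\n1,2\n")
	b := []byte("x,y\n1,2\n\n\n")

	// The raw digests differ, the normalized digests agree.
	fmt.Println(sha256.Sum256(a) == sha256.Sum256(b))
	fmt.Println(normalizedDigest(a) == normalizedDigest(b))
}
```

This prints `false` then `true`: the comparison ignores the detail the two users agreed not to care about, without involving the network-level hash at all.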

justinmchase commented 6 years ago

I agree with @davux: if the two publishers both have the same file, they would be likely to use the same MIME type when publishing, especially if the MIME type were produced automatically.

If they didn't have the same MIME type for some reason, then having two different hashes would seem like a reasonable thing to do. No?

The other side of the problem is that people who are trying to receive the published document use different MIME type sniffing logic and one gets it wrong. Who's more likely to get it wrong, or who should have the onus of getting the MIME type right, the publisher or the reader?

singpolyma commented 6 years ago

It seems we're headed towards raw leaves as the default, which seems good. So there are two places the mime could go: in the directory (where filename already is) or in an intermediate IPLD object that has metadata like mime and a pointer to the leaf

justinmchase commented 6 years ago

That sounds good. If there is a compromise where the MIME type is stored somewhere rather than inferred, but the data in the leaves can also be raw, then it seems like everyone is happy.

ivan386 commented 5 years ago

I am trying to use XHTML as index.html, but the gateway returns the wrong MIME type for it. UnixFS has a Metadata type, and it has a MimeType field.

I hand-made a test identity link with metadata, but I get content-type: text/plain; charset=utf-8 instead of the content-type: application/xhtml+xml that is in the metadata block.

I also tried to use a raw block as the link in the Metadata block, but got the error: expected protobuf dag node

lidel commented 5 years ago

I don't think Metadata from unixfsv1 is used in this case. AFAIK Gateway exposed by go-ipfs does mime-sniffing via net/http/sniff.go

Hardcoding explicit content-type will be possible with unixfsv2 (more details in https://github.com/ipfs/unixfs-v2/issues/11).

ivan386 commented 5 years ago

@lidel The Gateway does not read the MimeType field from the metadata block. The Gateway must use it if it is set.

lidel commented 5 years ago

Correct, we need (and are missing right now):

  1. the ability to set the content type in metadata (e.g. during ipfs add)
  2. Gateway code that uses it, if present

lidel commented 5 years ago

Summarized potential solutions in https://github.com/ipfs/in-web-browsers/issues/152

One is to embed content-type in DAG metadata; another is specific to the HTTP Gateway and proposes content-type override via drop-in config files similar to .gitattributes.

kevincox commented 4 years ago

One cheap solution is allowing extensions to be used in the gateway. This way we don't have to rely on file sniffing, but only on an extension -> MIME mapping (like Apache and nginx have been doing for years).

For example, https://ipfs.io/ipfs/bafkreiajjehupljsltknxzdcenhrmcarjuagybwx7ht2bcyyon6s3ayn2m.css would be served as text/css. This isn't as nice as the MIME type traveling with the data, but it is very quick to implement and doesn't change the network, just the gateway.
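A minimal sketch of such an extension lookup, using Go's standard `mime` package. `typeFromExtension` is a hypothetical helper, not existing gateway code, and the sketch does not cover how a gateway would separate the extension from the CID.

```go
package main

import (
	"fmt"
	"mime"
	"path"
)

// typeFromExtension maps the extension of a request path to a bare media
// type (parameters such as charset are dropped). It returns "" when the
// path has no extension or the extension is unknown.
func typeFromExtension(requestPath string) string {
	full := mime.TypeByExtension(path.Ext(requestPath))
	if full == "" {
		return ""
	}
	mediaType, _, err := mime.ParseMediaType(full)
	if err != nil {
		return ""
	}
	return mediaType
}

func main() {
	// With the proposed suffix the type comes from the extension table.
	fmt.Println(typeFromExtension("/ipfs/bafkreiajjehupljsltknxzdcenhrmcarjuagybwx7ht2bcyyon6s3ayn2m.css"))
	// Without a suffix there is nothing to look up, so sniffing would still be needed.
	fmt.Printf("%q\n", typeFromExtension("/ipfs/bafkreiajjehupljsltknxzdcenhrmcarjuagybwx7ht2bcyyon6s3ayn2m"))
}
```

The lookup needs no file contents at all, which is why this approach is cheap; the cost is that the type lives in the URL rather than with the data.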

Stebalien commented 4 years ago

You can currently use https://ipfs.io/ipfs/bafkreiajjehupljsltknxzdcenhrmcarjuagybwx7ht2bcyyon6s3ayn2m?filename=stylesheet.css

kevincox commented 4 years ago

For CSS that works fairly well but for things like SVG it annoyingly sets the Content-Disposition header.

Stebalien commented 4 years ago

Hm. Yeah, it's designed for specifying the download name. Is it not possible to wrap your file in a directory?

ivan386 commented 4 years ago

@lidel First we need to make the Gateway code use the MimeType from the metadata block, if present, and then add the ability to set the content type in metadata.

kevincox commented 4 years ago

It's possible to wrap it in a directory, but I would prefer not to because:

  1. I now need to give it a name, which feels wrong in a content-addressed storage system.
  2. The client now needs to fetch the directory before the file, which adds latency.

Stebalien commented 4 years ago

For now, you can use inline CIDs for that (not ideal, but a workaround):

https://ipfs.io/ipfs/bafyaanysgefciakvciqassipi6wtexgu3psgei2pcyebctianqdnp6phucfrq435fwbq3uysavtc4y3tommnnzacbibaqai/f.css

> mkdir toadd
> mv myStylesheet.css toadd/f.css
> ipfs add -q -r --raw-leaves --cid-version=1 --inline --inline-limit=64 toadd
bafyaanysgefciakvciqassipi6wtexgu3psgei2pcyebctianqdnp6phucfrq435fwbq3uysavtc4y3tommnnzacbibaqai

ivan386 commented 4 years ago

This will be shorter:

ipfs add --cid-base base64url -Q -w --raw-leaves --cid-version 1 --inline --inline-limit 64 --stdin-name .css < myStylesheet.css

https://ipfs.io/ipfs/uAXAANhIwCiQBVRIgCUkPR60yXNTb5GIjTxYIEU0AbAbX-eegixhzfS2DDdMSBC5jc3MY1uQCCgIIAQ/.css