ipfs / notes

IPFS Collaborative Notebook for Research
MIT License

Recognize IPFS gateways on arbitrary websites #92

Open Kubuxu opened 8 years ago

Kubuxu commented 8 years ago

Currently, looking for a dnslink record is the way to check whether a site is hosted on IPFS via a gateway.

I propose that we use another TXT entry specifying that this domain should be served under IPFS/NS.

It could be TXT IPFS-Gateway=true. The simpler nature of this entry means that it can have a much longer TTL and can be cached for much longer.

As a workaround for the lack of TXT record support in browsers, we would need an API endpoint for checking it.

Also, it would be great if we could agree on some sane default TTL for the case where both dnslink and this entry are missing, thus increasing cache hits even more.

EDIT: Also, as dnslink gets support for the _dnslink. prefix, something similar might be done for this entry as well.

lidel commented 8 years ago

We already have a handy DNS query API (provided by go-ipfs) which returns dnslink value:

GET http://localhost:5001/api/v0/dns/ipfs.io

{"Path":"/ipfs/QmSrk8wCZ9oUgSCxJq4TXyJnPBYfoLDpKLGn7hLPNzauVX"}

and for a site without dnslink in its TXT record:

GET http://localhost:5001/api/v0/dns/google.com

{"Message":"could not resolve name.","Code":0}
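
A minimal sketch of how an add-on might consume the two responses above. The endpoint layout is copied from the examples in this comment; `dnsApiUrl` and `parseDnsResponse` are hypothetical helper names, and the actual HTTP request is left to the caller.

```javascript
// Hypothetical helpers; the endpoint layout follows the examples above.
const API_BASE = 'http://localhost:5001/api/v0/dns/';

// Builds the query URL for a hostname.
function dnsApiUrl(hostname) {
  return API_BASE + encodeURIComponent(hostname);
}

// Returns the dnslink path from a response body,
// or null when the API reports that the name could not be resolved.
function parseDnsResponse(body) {
  const data = JSON.parse(body);
  return typeof data.Path === 'string' ? data.Path : null;
}
```

A `null` result only means "no dnslink found"; it says nothing about whether the host is a gateway.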

Quick idea/brain dump:

Perhaps if TXT record contains a httpmirror keyword the response could additionally include something like {"Path": "(..)", "HTTPMirror": true}:

ipfs.io. 76 IN TXT "httpmirror;dnslink=/ipfs/QmSrk8wCZ9oUgSCxJq4TXyJnPBYfoLDpKLGn7hLPNzauVX"

It would be (kinda) reverse of dnslink: indicate that queried host provides real IPFS resources under /ipfs/ and /ipns/ which can be safely loaded from IPFS instead of HTTP.
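
A rough sketch of parsing such a TXT value, assuming the semicolon-separated layout from the example record. `parseTxtHints` and the `httpMirror` field name are made up for illustration; nothing here is an existing go-ipfs API.

```javascript
// Parses a TXT value such as "httpmirror;dnslink=/ipfs/Qm..." into hints.
// The "httpmirror" keyword is only the idea floated above, not a standard.
function parseTxtHints(txt) {
  const hints = { httpMirror: false, dnslink: null };
  for (const part of txt.split(';')) {
    const token = part.trim();
    if (token === 'httpmirror') {
      hints.httpMirror = true;
    } else if (token.startsWith('dnslink=')) {
      hints.dnslink = token.slice('dnslink='.length);
    }
  }
  return hints;
}
```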

Kubuxu commented 8 years ago

The point of my request is to provide some tag that could have a very long TTL and could be cached. The TTL of dnslink is limited, as you need to update it if you want to change the site (I use a 60s TTL on most of my dnslinks). Latency of DNS resolution can be anything from 100ms to +1s.

As decision if site is to be shown normally or redirected to IPFS should be taken quickly it would be best if we could achieve high cache hit rate for this operation.
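
To illustrate the caching argument, here is a small TTL cache sketch that also stores negative results (a missing flag is cached as `null`), so repeat visits do not trigger another lookup. `HintCache` is a hypothetical name and the injectable clock exists only to make the behaviour easy to check.

```javascript
// Per-hostname cache with expiry; a null value records "no hint found".
class HintCache {
  constructor(now = () => Date.now()) {
    this.now = now;           // clock in milliseconds, injectable for tests
    this.entries = new Map(); // hostname -> { value, expires }
  }

  set(hostname, value, ttlSeconds) {
    this.entries.set(hostname, { value, expires: this.now() + ttlSeconds * 1000 });
  }

  // Returns { hit: true, value } while the entry is fresh, { hit: false } otherwise.
  get(hostname) {
    const entry = this.entries.get(hostname);
    if (!entry || entry.expires <= this.now()) {
      this.entries.delete(hostname);
      return { hit: false };
    }
    return { hit: true, value: entry.value };
  }
}
```

With a 60s dnslink TTL nearly every navigation misses such a cache; a flag with a TTL of days would almost always hit.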

Also it would allow sites that host gateways for themselves (or use ipfs.io under the hood) to say that it is safe to redirect it to other gateway.

lidel commented 8 years ago

Ok, how about a separate TXT record with long TTL?

ipfs.pics. 246696 IN TXT "ipfsgateway=public"

Related idea: this new TXT record could additionally provide a hint if this is a public IPFS gateway or if it is restricted to a limited subset of whitelisted resources.

Use case for a restricted gateway: a person/organization does not want to be worried about someone else uploading undesired content to ipfs and linking to it using the company.com hostname (I assume this kind of gateway would return HTTP 404 for all non-whitelisted IPFS resources).

DNS hint would simply indicate that although some resources can be loaded from IPFS, it should not be used as a public gateway.

Kubuxu commented 8 years ago

I like the idea of restricted. The hint would be placed alongside the dnslink record, so either in the root of the selected record or in the _dnslink. sub-record.

Also do we want, and how, to cover sites like http://ipfs.pics that are gateways but the root site has to be accessed directly? (It might also happen that they won't work if we redirect gateway traffic to local gateway as for CORS and other magic).

EDIT: If site defines dnslink and ipfsgateway=public, we should be safe to redirect it fully to local gateway. There are still edge cases (local storage, cookies for API calls to original site), but I think that defining both of those variables should make site safe to move into local gateway.

EDIT2: Sites that want to stay under their own domain should just not define ipfsgateway but there might be better solution.

EDIT3: There are really three cases:

  1. Site that wants to fully move to local gateway.
  2. Site that wants its root to stay on remote host but /ipfs/ and /ipns can be moved to local gateway.
  3. Site that is gateway but wants to fully stay under its own domain. (XHR, cookies, localStorage and so on).
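
One possible way to map the hints onto these three cases, as a sketch only. Case 1 follows the EDIT above (dnslink plus ipfsgateway=public), case 3 follows EDIT2 (no ipfsgateway defined); treating "ipfsgateway=public without dnslink" as case 2 is my assumption, and `redirectMode` and its return values are hypothetical.

```javascript
// hints: { dnslink: string|null, ipfsGateway: string|null }
function redirectMode(hints) {
  if (hints.ipfsGateway === 'public' && hints.dnslink) {
    return 'full';       // case 1: whole site can move to the local gateway
  }
  if (hints.ipfsGateway === 'public') {
    return 'paths-only'; // case 2 (assumed): only /ipfs/ and /ipns/ paths move
  }
  return 'none';         // case 3: site stays fully under its own domain
}
```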

jbenet commented 8 years ago

I propose that we use another TXT entry specifying that this domain should be served under IPFS/NS.

i don't understand this-- what does this mean?

As decision if site is to be shown normally or redirected to IPFS should be taken quickly it would be best if we could achieve high cache hit rate for this operation.

Why isn't this figured out from the dnslink itself? If you have a dnslink with an /ip{fn}s link, you can use ipfs to resolve it just fine. Browsers today are unaware of ipfs so they need to use a gateway. but custom apps don't.


Related idea: this new TXT record could additionally provide a hint if this is a public IPFS gateway or if it is restricted to a limited subset of whitelisted resources.

Why should this be in DNS and not under a route in the gateway, such as /version ?


Site that wants to fully move to local gateway.

All /ipfs and /ipns links at a gateway should be able to move to local gateway, always. If we want some standard way to check for a gateway (eg /version)

Site that wants its root to stay on remote host but /ipfs/ and /ipns can be moved to local gateway.

Natural consequence of the above.

Site that is gateway but wants to fully stay under its own domain. (XHR, cookies, localStorage and so on).

No, if the site has an ipfs gateway and uses ipfs links /ipfs/..., /ipns/... these can always be moved to a local gateway. XHR/cookies/localStorage are irrelevant here, they SHOULD be handled through the ipfs or ipns root hash as the "suborigin". (once suborigins land -- https://www.chromium.org/developers/design-documents/per-page-suborigins, already implemented)

Kubuxu commented 8 years ago

This DNS record would say if site wants to move under local/client gateway and how.

There are many reasons for including something like this, I will try to sum them up here:

  1. Currently FF add-on faces problem of how and which sites redirect to local gateway and which not. This decision has to be taken before the HTTP connection start, which means asking IPFS if that site is hosted under IPFS. dnslinks with low TTLs are not suitable for such resolutions, they would cause delays which can be quite significant (depending on many factors). Also non-hit to dnslink is not cached. I've proposed that lack of flag could be also cached so work of the gateway redirector does not slow down the rest of browser. We have to choose IPFS or HTTP and we have to choose it fast because user waits.
  2. It would allow sites to work via domain name for add-on users even if backing gateway died.
  3. It can be figured out from dnslink but not completely. You can't know from dnslink if site wants to move fully under local gateway or not.
  4. Suborigins haven't landed yet (at least not on any FF, I am running alpha stream), and it might take long time to finalize this spec and even longer to be implemented in most browsers.
  5. XHR/Origin makes difference if site wants to call home or other non-IPFS service. We want to allow smooth transition and not say: "You have to forget about what you already done and do it again this way". (Case 2)
  6. ipfs.pics cannot be moved under local gateway because it calls home, nor can any other site that wants to use localStorage (as there are no Suborigins yet). (Case 2)
  7. If site stays under its domain name it can't execute XHR requests to local gateway, they are just blocked (we will be investigating that, lidel/ipfs-firefox-addon#45). (Case 3)
  8. Some day situation might be reverse and sites will ask not to be redirected to IPFS but to stay under HTTP.

jbenet commented 8 years ago

  1. Currently FF add-on faces problem of how and which sites redirect to local gateway and which not. This decision has to be taken before the HTTP connection start, which means asking IPFS if that site is hosted under IPFS. dnslinks with low TTLs are not suitable for such resolutions, they would cause delays which can be quite significant (depending on many factors). Also non-hit to dnslink is not cached. I've proposed that lack of flag could be also cached so work of the gateway redirector does not slow down the rest of browser. We have to choose IPFS or HTTP and we have to choose it fast because user waits.

Do you mean domains like foo.com that setup a dnslink dns TXT record and an A record to our gateway?

for normal sites, i'd just assume http until the A record points to known ipfs gateways. but i get that it may not work as well.

this could be something you opt to do for the extension only but is not meant to stick around or be broader. I'm very wary of explosion of asking users to setup a bunch of records.
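
The "assume HTTP until the A record points to known ipfs gateways" idea could look roughly like this. The addresses are placeholders from the documentation range 192.0.2.0/24, not real gateway IPs, and `pointsAtKnownGateway` is a hypothetical name.

```javascript
// Placeholder addresses (RFC 5737 documentation range), NOT real gateways.
const KNOWN_GATEWAY_IPS = new Set(['192.0.2.10', '192.0.2.11']);

// aRecords: array of IPv4 strings the browser already resolved for the host.
function pointsAtKnownGateway(aRecords) {
  return aRecords.some((ip) => KNOWN_GATEWAY_IPS.has(ip));
}
```

This needs no extra TXT record, but it only covers sites hosted behind gateways on the list.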

  2. It would allow sites to work via domain name for add-on users even if backing gateway died.

Could the use of the backing gateway be detected without pre-checking dns yourself? (i.e. use the result of the A or CNAME records the browser gets on its own)

  3. It can be figured out from dnslink but not completely. You can't know from dnslink if site wants to move fully under local gateway or not.

right

  4. Suborigins haven't landed yet (at least not on any FF, I am running alpha stream), and it might take long time to finalize this spec and even longer to be implemented in most browsers.

right (but keep the scope in mind when creating tools + standard practices)

  5. XHR/Origin makes difference if site wants to call home or other non-IPFS service. We want to allow smooth transition and not say: "You have to forget about what you already done and do it again this way". (Case 2)

which site wants to call to which home. If a site at a /ipfs/... object is trying to hit /... some other non-ipfs path, that's a website creator error.

  6. ipfs.pics cannot be moved under local gateway because it calls home, nor can any other site that wants to use localStorage (as there are no Suborigins yet). (Case 2)

I don't think it does. There's a misunderstanding here:

  7. If site stays under its domain name it can't execute XHR requests to local gateway, they are just blocked (we will be investigating that, lidel/ipfs-firefox-addon#45). (Case 3)

True. this is one reason in support of gateway having friendly CORS (ipfs.io has it).

  8. Some day situation might be reverse and sites will ask not to be redirected to IPFS but to stay under HTTP.

I'm not convinced yet. So far, I think this can be figured out from the otherwise available information. I do not want to make lots of tools have to follow this until i'm sure it's necessary.

Kubuxu commented 8 years ago

Do you mean domains like foo.com that setup a dnslink dns TXT record and an A record to our gateway?

I mean that site example.com has dnslink but its own A records of gateway. We can't run a dnslink lookup on every site visited by the user, every time he refreshes it, but we want to redirect all of them to local gateway if it is possible.

True. this is one reason in support of gateway having friendly CORS (ipfs.io has it).

The XHR problem is with site staying under its own domain but under the ipfs.io A records when add-on redirects /ipfs/... to local gateway.

ipfs.pics/ipfs/... is resolved by ipfs gateway. can be local. ipfs.pics/ipns/... is resolved by ipfs gateway. can be local.

Problem is also that you can't redirect all /ipfs/... to local gateway. Example: https://github.com/ipfs/notes/. FF add-on has option to redirect all /ipfs/... calls to local gateway but it breaks other sites. That is why a site could specify that it wants to be a regular webapp but wants to have /ipfs/... and /ipns/... redirected to local gateway, which currently isn't possible and makes a site like ipfs.pics almost not distributed (unless we explicitly specify it in the add-on configuration, which can't be done for all sites).

jbenet commented 8 years ago

@Kubuxu curious, how would an entry in /etc/hosts mapping ipfs.io to localhost help here? (port may still be a problem :/)

also, @lidel do extensions have access to the address bar in firefox?

Kubuxu commented 8 years ago

Port still would be a problem (you can't bind 80 from normal user) but redirecting ipfs.io isn't an issue (it is already done, currently work is in making the bar show: fs:/ipfs/...).

Problem is with discovering all of those sites that might be using ipfs.io or other gateways under the hood (it was said in another thread that detection of IPFS after first HTTP response is too late).

lidel commented 8 years ago

@jbenet:

do extensions have access to the address bar in firefox?

Generally Firefox is in the process of “refactoring” addon ecosystem. The only “future-proof” SDK does not provide high level API for accessing this part of GUI (we can only add a toolbar button etc).

It is possible to access all GUI elements via low level APIs called XUL/XPCOM but it will be deprecated in 12-18 months and I would avoid any new work that requires it.

In future there will be a new SDK called WebExtensions and I plan to migrate the addon to it as soon as it is mature enough (not supported by release FF yet). It is a mess at the moment :)


Anyway, back to the topic :arrow_double_up:

To put the problem and idea in simpler words:

ghost commented 8 years ago

The question: how to safely detect an asset that can be loaded from IPFS?

  • We could use regex and redirect everything that starts with /ipfs/Qm. There is the problem of false-positives: we could make the regex quite specific, but what if the hash algorithm changes in future? I imagine Qm would change to something else, yes?
  • We could have additional DNS hint which indicates that everything under /ipfs/ can be safely loaded from IPFS. (this proposal)
  • Is there a third way?

I agree with jbenet that we can infer this info from what's already there. Not from the dnslink record (which is a totally different use case, imo), but from the URL of these assets. You mention github.com/ipfs/foobar and /ipfs/Qm as an example of how this is problematic, but maybe we can make /ipfs and /ipns path validation part of js-multiaddr, and use that in the addon? For /ipfs it must be a multihash, for /ipns it must be a multihash or domain name (see jbenet/go-is-domain). Whether the /ipfs or /ipns path can be resolved is a whole other issue of course, and you can only ever reliably say if it does resolve.
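
A sketch of what such path validation could look like, written standalone rather than as part of js-multiaddr. Assumptions: only base58 sha2-256 multihashes are accepted (other hash functions would need more cases), the domain check is a rough stand-in for something like jbenet/go-is-domain, and all function names are hypothetical.

```javascript
const BASE58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Decodes base58 text into an array of byte values, or null on bad input.
function base58Decode(text) {
  if (text.length === 0) return null;
  let value = 0n;
  for (const ch of text) {
    const digit = BASE58.indexOf(ch);
    if (digit < 0) return null;
    value = value * 58n + BigInt(digit);
  }
  const bytes = [];
  while (value > 0n) {
    bytes.unshift(Number(value & 0xffn));
    value >>= 8n;
  }
  for (const ch of text) {
    if (ch !== '1') break;
    bytes.unshift(0); // each leading '1' encodes one leading zero byte
  }
  return bytes;
}

// Simplification: accepts only sha2-256 multihashes (code 0x12, length 0x20).
function isSha256Multihash(text) {
  const bytes = base58Decode(text);
  return bytes !== null && bytes.length === 34 && bytes[0] === 0x12 && bytes[1] === 0x20;
}

// Rough domain-name check: dot-separated labels ending in an alphabetic TLD.
function looksLikeDomain(text) {
  return /^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$/i.test(text);
}

// /ipfs/<multihash> or /ipns/<multihash | domain>
function isIpfsOrIpnsPath(path) {
  const match = /^\/(ipfs|ipns)\/([^/?#]+)/.exec(path);
  if (!match) return false;
  const [, namespace, root] = match;
  if (namespace === 'ipfs') return isSha256Multihash(root);
  return isSha256Multihash(root) || looksLikeDomain(root);
}
```

With this check the path of github.com/ipfs/notes is rejected because "notes" does not decode to a multihash, while a real /ipfs/Qm... path passes. As noted above, a valid path still says nothing about whether it resolves.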

ghost commented 8 years ago

maybe we can make /ipfs and /ipns path validation part of js-multiaddr, and use that in the addon?

cc @diasdavid

ghost commented 8 years ago

Multiaddr validation can apply to other schemes just as well, e.g. /ip4/1000.0.0.0 is clearly invalid.
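
For instance, a minimal check for a lone /ip4 component might look like the following. This is a standalone sketch with a hypothetical function name, not the js-multiaddr API.

```javascript
// Validates a single "/ip4/a.b.c.d" component: four decimal octets, each 0-255.
function isValidIp4(addr) {
  const match = /^\/ip4\/(\d+)\.(\d+)\.(\d+)\.(\d+)$/.exec(addr);
  return match !== null && match.slice(1).every((octet) => Number(octet) <= 255);
}
```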

ghost commented 8 years ago

Just had another idea, could the addon expose a simple API to pages, so that a page can just tell the addon to regard it as an ipfs gateway?

Mithgol commented 8 years ago

@lgierth

We could use regex and redirect everything that starts with /ipfs/Qm, but what if the hash algorithm changes in future? I imagine Qm would change to something else, yes?

Using JavaScript .startsWith('/ipfs/Qm') for the path, and especially using an even more specific regex (such as /^\/ipfs\/Qm[1-9A-HJ-NP-Za-km-z]{44}/ for example), we should quite reliably infer that the site is indeed using the current version of the IPFS multihash. (And yes: that regex would have to be changed to something /(?:old|new)/ when the current version of the IPFS multihash is replaced.) Any use of such multihash in an address does not happen randomly, and thus the addressed resource is (quite likely, most likely) an IPFS resource.
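
A quick sketch of that regex in use, applied to the pathname of a full URL. `looksLikeIpfsPath` is a hypothetical helper; the hash is the one from the ipfs.io dnslink quoted earlier in this thread.

```javascript
// "Qm" followed by 44 base58 characters, anchored at the start of the path.
const QM_PATH = /^\/ipfs\/Qm[1-9A-HJ-NP-Za-km-z]{44}/;

function looksLikeIpfsPath(url) {
  return QM_PATH.test(new URL(url).pathname);
}
```

Here https://github.com/ipfs/notes/ is rejected because "notes" is not a Qm-prefixed hash. The pattern is still only a syntactic check, so it cannot tell whether the hash resolves.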

Now to the next question: is it always safe to load a remote IPFS resource from a local IPFS gate? In an ideal world it should always be safe (that's what a P2P-distributed filesystem should be, that's what a P2P-distributed filesystem should do). However, in the real world some sites are facing XHR failures (because of CORS) such as lidel/ipfs-firefox-addon#45 and ipfs/go-ipfs#934 and ipfs/go-ipfs#1215.

Of course these sites' owners would wish for their sites to be not recognized as gateways (even when the URL parts clearly indicate the use of IPFS) to prevent such failures, and thus they would advocate their desire for some additional (and opt-in) feature (such as DNS TXT or other API).

I am quite tempted to say that these CORS-related problems deserve to be fixed eventually (not only worked around) and thus the proposed workaround should be designed as a temporary opt-out (not a permanent opt-in) so that most IPFS sites do not require additional configuration initially (until they eventually try XHR and cease to work and only then they need DNS TXT or other API to prevent rerouting to local IPFS gates). Here I write “most sites” because I honestly expect most IPFS sites to be more static (i.e. not trying to use XHR) and thus never encountering any need to opt out.

I have to add that most cases of such opt-out (currently and in the future) would be caused by very specific bugs (or lack of features) in IPFS implementations. Therefore DNS TXT (or other API) should be designed to convey some more specific message: not only “this site opts out of local IPFS rerouting”, but also “this site opts out of local IPFS rerouting unless the local IPFS gate is go-ipfs version 0.6 or newer” for the cases where the specific bug (or a lack of a specific feature) is known to be fixed already but when the fix is so recent that the users of many local gates utilizing previous IPFS versions are expected to fail inevitably.

XHR failure is such a bug, gateway CORS is such a feature. Here's another good example: I expect IPFS POST to become such a feature as well. The sites that start using IPFS POST (when it's implemented) would want an interface to opt out of the previous IPFS implementations.

If that interface is designed differently, it may harm the network:

lidel commented 8 years ago

Just had another idea, could the addon expose a simple API to pages, so that a page can just tell the addon to regard it as an ipfs gateway?

We could, but it is too late: javascript-based API could be accessed only after first page load. This means that if the page makes use of any assets from /ipfs/(..) those would be loaded via HTTP.

That is why we suggested DNS record: it can be read before any HTTP request is sent.

maybe we can make /ipfs and /ipns path validation part of js-multiaddr, and use that in the addon?

I really like this idea! :+1: There would be no regex maintenance cost and no duplicated effort across community. (greenkeeper.io would create a PR each time js-multiaddr is released.)

the proposed workaround should be designed as a temporary opt-out (not a permanent opt-in) so that most IPFS sites do not require additional configuration initially (..) [opt-out] should be designed to convey some more specific message: not only “this site opts out of local IPFS rerouting”, but also “this site opts out of local IPFS rerouting unless the local IPFS gate is go-ipfs version 0.6 or newer”

This is a very good point. If browser addons have a solid way of detecting IPFS paths (eg. via js-multiaddr), loading those assets from IPFS by default is what we want.

Opt-out would effectively be a blacklisting of specific daemon/protocol versions (eg <=0.4.0 or all: *). We probably would need to extend /api/v0/version – it currently returns only a daemon version:

{"Version":"0.4.0-dev","Commit":"","Repo":"3"}
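
A sketch of how an add-on might evaluate such an opt-out against the version response shown above. The rule syntax ("*" or "<=X.Y.Z") is only inferred from the examples in this comment, pre-release suffixes like "-dev" are ignored for simplicity, and every name here is hypothetical.

```javascript
// Extracts [major, minor, patch] from strings like "0.4.0-dev".
function parseVersion(text) {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(text);
  return match ? match.slice(1).map(Number) : null;
}

// Returns -1, 0 or 1 for a < b, a == b, a > b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// versionResponse: JSON body of /api/v0/version; rule: "*" or "<=X.Y.Z".
function isBlacklisted(versionResponse, rule) {
  if (rule === '*') return true;
  const daemon = parseVersion(JSON.parse(versionResponse).Version);
  const limit = rule.startsWith('<=') ? parseVersion(rule.slice(2)) : null;
  if (!daemon || !limit) return false; // unknown rule or version: do not block
  return compareVersions(daemon, limit) <= 0;
}
```

Failing open on an unparseable rule is a design choice of this sketch; a stricter add-on could treat it as an opt-out instead.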

fazo96 commented 8 years ago

This would replace my currently very crappy solution to detect if the app is running on a gateway.

Take it as an example of what not to do :smile:

lidel commented 7 years ago

Cross-posting from https://github.com/ipfs/in-web-browsers/issues/9#issuecomment-287404619:

Update: there is a proposal for adding DNS-related API to WebExtension API. I've added suggestion to include querying for TXT records (so that dnslink can be detected natively by the browser):

Mithgol commented 7 years ago

Alas, that proposal has been WONTFIXed.

lidel commented 6 years ago

An update from browser extension front (https://github.com/ipfs/ipfs-companion/issues/16#issuecomment-336641844):

We are finally able to detect IPFS paths without obvious false-positives! 🎉

Latest version of our browser extension (v2.0.13) performs path validation on every request:

  • For ^/ipfs/ paths, CID must be valid (CIDv0 or CIDv1)
  • For ^/ipns/ peerid needs to be a valid CID or be a FQDN with dnslink in TXT record

If a path starts with a valid IPFS-enabled root, then request is redirected to a local gateway. This is enabled by default and works on every website, without need for header-signalling or wasted round-trips.
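
Roughly, the redirect step could be sketched as below. This is not the extension's actual code: the local gateway address is an assumed default, `toLocalGateway` is a hypothetical name, and the path validator is passed in as a parameter so any CID/dnslink check can be plugged in.

```javascript
const LOCAL_GATEWAY = 'http://127.0.0.1:8080'; // assumed local gateway address

// Returns the local-gateway URL for a request, or null if the path is not
// recognized as a valid IPFS path by the supplied validator.
function toLocalGateway(requestUrl, isValidPath) {
  const url = new URL(requestUrl);
  if (!isValidPath(url.pathname)) return null;
  return LOCAL_GATEWAY + url.pathname + url.search + url.hash;
}
```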

There is also a somewhat related discussion about keeping /ipns/{fqdn} for sites with dnslink: https://github.com/ipfs/ipfs-companion/issues/298. It aims to improve some aspects such as user experience, future-proof bookmarks and dnslink TTL & value cache expiration in browser extension.

lidel commented 6 years ago

Small updates from browser extension (landed in v2.4.3):