ipfs / ipfs-companion

Browser extension that simplifies access to IPFS resources on the web
https://docs.ipfs.tech/install/ipfs-companion/
Creative Commons Zero v1.0 Universal
2.04k stars 323 forks source link

[MV3 Beta Bugs] Single catch-all rule per subdomain gateway #1278

Open lidel opened 9 months ago

lidel commented 9 months ago

This is not blocking https://github.com/ipfs/ipfs-companion/pull/1182, ok to improve in follow-up PR.

Subdomain gateway URLs seem to constantly override a rule,with new one that is site-specific, instead of adding a single catch-all one. This makes the UI for managing rules confusing, as the catch-all is pointing at website-specific URL, but theings seem to work fine.

How to reproduce

  1. Open https://cid-ipfs-tech.ipns.dweb.link/
  2. Confirm below rule got created (note the Origin is catch-all, but Target is specific to cid.ipfs.tech)

    2023-09-14_21-59

  3. Open https://docs-ipfs-tech.ipns.dweb.link
  4. Confirm the rule created by (1) is no longer present, and a new one got created (note the Origin is catch-all, but Target is specific to docs.ipfs.tech now)

    2023-09-14_22-00

Expected behavior

Create a single catch-all rule per subdomain gateway.

For example:

whizzzkid commented 9 months ago

@lidel it looks like this is a invalid edge case

Source: https://cid-ipfs-tech.ipns.dweb.link/ Redirect: https://localhost:8080/ipns/cid.ipfs.tech/

the rule add just updates the redirect if the source is the same, because we don't want to break the current redirect. However, that example is weird because - get converted to . which is handled, but instead if you use a better url: https://cid.ipfs.tech.ipns.dweb.link/ it works as intended. Are hyphen allowed to be converted to periods?

Screenshot 2023-09-17 at 2 48 29 AM
lidel commented 9 months ago

@whizzzkid Could be that i've pasted cid.ipfs.tech.ipns.dweb.link instead of cid-ipfs-tech.ipns.dweb.link, but we need to do the right thing in both cases because the former is valid for http:// URL, and the latter is required for https://.

Subdomain gateways on https:// have to use inlined names which have all . replaced with - (see the algorithm from https://specs.ipfs.tech/http-gateways/subdomain-gateway/#host-request-header) because of TLS wildcard certificates like *.ipns.dweb.link.

TLS wildcard certs work only for a single DNS label, so we can't have . in the domain name on the left side of .ipns..

So if you have captured a value under subdomain like (value).ipns.dweb.link, and the value has no . and has at least one - and is not a valid CID, you need to convert it back from the inlined version to a DNS name.

  1. replace every single - with .
  2. replace every double -- with -

Good one for testing is https://en-wikipedia--on--ipfs-org.ipns.dweb.link/ as it had both.

whizzzkid commented 9 months ago

@lidel in that case, we cannot do anything about this. The declarative rules are designed to be static in nature. The most we can do is regex manipulation. So capturing group and replacing it in the redirect is the best we can do.

Which means in the example:

Source: https://en-wikipedia--on--ipfs-org.ipns.dweb.link/ Current: http://localhost:8080/ipns/en.wikipedia-on-ipfs.org which translates to http://en.wikipedia-on-ipfs.org.ipns.localhost:8080/ New: http://localhost:8080/ipns/en-wikipedia--on--ipfs-org which will translate to http://en-wikipedia--on--ipfs-org.ipns.localhost:8080/

which will work, but is that an acceptable tradeoff? The regex would be simple i.e. find whatever the path is and place the group in there, remember we cannot replace -->. and ---> - on the fly.

lidel commented 9 months ago

Hm... I get it now. Thank you for explanation @whizzzkid!

I think the tradeoff would be net positive. The inlined DNSLink identifier on subdomain is a rather a niche use case, most DNSLink+Companion users will visit DNSLink website using the original URL.

We would have a single static rule per subdomain gateway, no need for constant rule-miss and retractive updates:

As for cosmetics, replacing -. during the redirect from path to subdomain when protocol scheme has no TLS is something boxo/gateway (Kubo) can implement and ship in near future.

Kubo already does things like normalization of CIDv0 to CIDv1 or switching to Base36 (example) when Base32 would be too long for IPNS Name to fit in a single DNS Label (https://github.com/ipfs/kubo/issues/7318). I think it is sensible if we decode inlined DNS before making the redirect too.

This only impacts internals of subdomain redirect from public to local gateway. Functionally, there is no bug. I don't want to block MV3 release on this.

@whizzzkid so feel free to park this fix until we have Kubo with necessary normalization logic (i've opened https://github.com/ipfs/boxo/pull/462 but tbd if this will get into Kubo 0.23 or 0.24).