ipfs / specs

Technical specifications for the IPFS protocol stack
https://specs.ipfs.tech

gateway: run Unicode Normalisation Forms on path gateway inputs #457

Open Jorropo opened 5 months ago

Jorropo commented 5 months ago

See context here: https://github.com/ipfs/kubo/issues/10286#issuecomment-1886822017
Relevant Unicode spec: https://unicode.org/reports/tr15/

hacdias commented 5 months ago

For reference: https://go.dev/blog/normalization

lidel commented 5 months ago

Thank you for raising this. We operate under ecosystem constraints:

What is the problem we are trying to solve? My understanding of the linked issue is that a user copies a "non-normalised" content path from somewhere and gets a "not found" error because the DAG uses normalised filenames (notation mismatch).
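To make the mismatch concrete, here is a minimal Python sketch (the filename is made up for illustration) showing that the composed and decomposed spellings of the same visible name are different byte strings, which is why a byte-wise lookup can miss:

```python
import unicodedata

# Illustrative filename; "\u00e9" is "é" as one precomposed code point.
name = "caf\u00e9.txt"
nfc = unicodedata.normalize("NFC", name)  # composed form
nfd = unicodedata.normalize("NFD", name)  # "e" followed by U+0301 (combining acute)

# Both render identically, but they are not equal as strings or as bytes,
# so a directory keyed on one spelling will not match the other.
print(nfc == nfd)           # False
print(nfc.encode("utf-8"))  # b'caf\xc3\xa9.txt'
print(nfd.encode("utf-8"))  # b'cafe\xcc\x81.txt'
```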

If so, I think the best we could do UX-wise is to retry on "not found" with the composed (NFC) and decomposed (NFD) forms of the path, to cover both variants.

This way we don't break datasets where the file already exists under the requested name, but still fix HTTP 404 for cases where the file exists only in a different notation.
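A rough sketch of that retry idea, in Python for brevity. This is not how any gateway implements it: `resolve_with_fallback` and the `lookup` callback are hypothetical names, and `lookup` is assumed to return `None` when a path is not found.

```python
import unicodedata
from typing import Callable, Optional


def resolve_with_fallback(
    path: str,
    lookup: Callable[[str], Optional[bytes]],  # hypothetical resolver, None = not found
) -> Optional[bytes]:
    """Try the path exactly as given, then its NFC and NFD spellings."""
    # The exact spelling always wins, so existing datasets behave as before.
    result = lookup(path)
    if result is not None:
        return result
    for form in ("NFC", "NFD"):
        candidate = unicodedata.normalize(form, path)
        if candidate == path:
            continue  # this spelling was already tried
        result = lookup(candidate)
        if result is not None:
            return result
    return None  # the caller would map this to HTTP 404


# Hypothetical directory that only contains the NFD spelling.
dag = {"cafe\u0301.txt": b"hello"}
print(resolve_with_fallback("caf\u00e9.txt", dag.get))  # b'hello'
```

Because the fallback only runs after the exact lookup fails, a directory containing both spellings still resolves each one to its own entry.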

If this is something we want to do, it should be included in https://github.com/ipfs/specs/pull/453 to ensure consistency across web contexts (which we will then reference from https://specs.ipfs.tech/http-gateways/path-gateway/).

But this introduces a magical behavior which hides the underlying problem macOS introduced – see my comment in https://github.com/ipfs/kubo/issues/10286#issuecomment-1930195484.

Perhaps it is better NOT to fix reads, and instead give users the ability to force a specific normalization during data onboarding (like the ipfs add --normalize-names none|nfd|nfc suggested in https://github.com/ipfs/kubo/issues/10286#issuecomment-1930195484)?
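For illustration only, the onboarding-time alternative could look roughly like the Python sketch below. The `--normalize-names` flag is a suggestion from the linked comment, not an existing option, and `normalize_name` is a hypothetical helper whose modes simply mirror the proposed `none|nfd|nfc` values.

```python
import unicodedata


def normalize_name(name: str, mode: str = "none") -> str:
    """Apply the requested normalization to one filename before it is added.

    Hypothetical helper: `mode` mirrors the proposed none|nfd|nfc values.
    """
    if mode == "none":
        return name  # keep the name exactly as the filesystem reported it
    if mode in ("nfc", "nfd"):
        return unicodedata.normalize(mode.upper(), name)
    raise ValueError(f"unsupported normalization mode: {mode!r}")


# A name in decomposed form (as some filesystems report it) stored as NFC.
stored = normalize_name("cafe\u0301.txt", "nfc")
print(stored == "caf\u00e9.txt")  # True
```

With this approach the read path stays byte-exact, and the choice of notation is made explicitly once, when the data is added.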