servo / rust-url

URL parser for Rust
https://docs.rs/url/
Apache License 2.0
1.27k stars 318 forks source link

Add feature "disable_idna" #836

Open parasyte opened 1 year ago

parasyte commented 1 year ago

This has a fairly colorful history.

The idna dependency was first made optional in #728, and then reverted in #790. This PR introduces a feature flag to opt-in to ASCII-only domain support, as suggested in https://github.com/servo/rust-url/pull/790#issuecomment-1239186515. It partially reverts #790.

Unfortunately, this opt-in method is less robust than using a default-enabled feature and making idna optional (#728) because idna is always built. It is trivial for future changes to accidentally link idna unconditionally, making the disable_idna feature a no-op.

In my testing, using this feature flag saves around 235 KiB in the produced binary in release mode on Windows x86_64 builds. (Side note: Windows builds with VS are "always stripped", and debug symbols are provided in a separate pdb file. For everything else, the binary Cargo.toml can include strip = "symbols" in the release profile.) The feature was originally requested for WASM, and binary size deltas should be in the same ballpark with that arch.

valenting commented 1 year ago

So, I'm less and less convinced that this is a direction we want to pursue. If a crate using URL wants to opt out of the IDNA, they can replace the idna dependency in Cargo.toml with a dummy crate that implements domain_to_unicode, domain_to_ascii, domain_to_ascii_strict, and contains struct Errors- see instructions here

bors-servo commented 1 year ago

:umbrella: The latest upstream changes (presumably b33514a3bc26f25945c444877338060566fbf4de) made this pull request unmergeable. Please resolve the merge conflicts.

valenting commented 8 months ago

Rather than adding all this to the url crate, instead I suggest patching out idna as suggested earlier. I have an example here Pretty much the only requirement is adding the following to your Cargo.toml

[patch.crates-io]
idna = {git = 'https://github.com/valenting/no-idna.git'}

This avoids adding any spec incompatible flag to this crate. If this works for you I propose we close this issue.

parasyte commented 8 months ago

I don't think patch works with published crates. If I can't get rid of the heavy IDNA dependency in my published project, that doesn't seems like a useful workaround.

micolous commented 7 months ago

I've written an alternative to this for wasm32-unknown-unknown in #887.

For Windows, there are platform IDN bindings which idna could use instead, which is significantly smaller than the UTS46 mapping table and unicode_normalization, even for projects which don't already link against windows-rs!

What would need to happen for that is for Config and Errors to move up into the parent crate, and then implement a work-alike for uts46::Idna which maps onto windows::Win32::Globalization::{IdnToAscii, IdnToUnicode}. A quick reading of MSDN suggests there may be some functionality that doesn't work quite the same.

Edit, after further experimenting: The big gotcha with the Windows APIs is that you can't control whether IDNA2003 or IDNA2008 is used – it's tied to the OS version:

The IdnToNameprepUnicode, IdnToAscii, and IdnToUnicode algorithms are not applicable to Windows NT, Windows 2000, Windows XP, or Windows Server 2003. These algorithms follow the IDNA2003 standards for Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2 operating system. Otherwise, the algorithms follow the IDNA2008+UTS46 standards.

Even on Windows 11 23H2 (which should apply IDNA2008+UTS46 rules), to_ascii("☕.com") converts successfully to xn--53h.com.

There are a lot of other failures, and canonical examples in the IDNA spec (used in idna crate tests) aren't working.

I have a branch of rust-url which uses the Windows APIs, but I'm not planning to work on that any further due to the compatibility issues and everything being pretty broken.