twingly / twingly-url

:link: Twingly URL tools
https://rubygems.org/gems/twingly-url
MIT License

Same URL in Unicode isn't equal to the same ASCII URL #94

Open jage opened 8 years ago

jage commented 8 years ago

This happens because the comparison uses to_s:

[15] pry(main)> u1 = Twingly::URL.parse("https://www.foo.ایران.ir/bar")
=> #<Twingly::URL:0x3fc621e4ca10 https://www.foo.ایران.ir/bar>
[16] pry(main)> u2 = Twingly::URL.parse("https://www.foo.xn--mgba3a4f16a.ir/bar")
=> #<Twingly::URL:0x3fc621e00f34 https://www.foo.xn--mgba3a4f16a.ir/bar>
[17] pry(main)>
[18] pry(main)> u1 <=> u2
=> 1
[19] pry(main)> u1.to_s
=> "https://www.foo.ایران.ir/bar"
[20] pry(main)> u2.to_s
=> "https://www.foo.xn--mgba3a4f16a.ir/bar"
[21] pry(main)>

We might also consider comparing the normalized versions of the URLs, though that might not be what we want.

dentarg commented 8 years ago

Yes, we need to normalize internally.
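
A minimal sketch of what normalizing internally could look like. This is not the gem's implementation: SketchURL, ascii_host and comparison_key are hypothetical names, and the small Punycode encoder (written after RFC 3492) skips the mapping and validation steps that a full IDNA library performs, so a real fix would more likely delegate to such a library.

```ruby
# Sketch only: compare URLs on an ASCII (Punycode) form of the host.
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128

  module_function

  # Bias adaptation, RFC 3492 section 6.1.
  def adapt(delta, numpoints, first)
    delta = first ? delta / DAMP : delta / 2
    delta += delta / numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + (BASE - TMIN + 1) * delta / (delta + SKEW)
  end

  # Digits 0..25 map to "a".."z", digits 26..35 map to "0".."9".
  def digit(value)
    (value < 26 ? 97 + value : 22 + value).chr
  end

  # Encodes a single label, without the "xn--" prefix (RFC 3492 section 6.3).
  def encode(label)
    codepoints = label.codepoints
    output = codepoints.select { |c| c < INITIAL_N }.pack("U*")
    basic = handled = output.length
    output << "-" if basic > 0
    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS
    while handled < codepoints.length
      m = codepoints.select { |c| c >= n }.min
      delta += (m - n) * (handled + 1)
      n = m
      codepoints.each do |c|
        delta += 1 if c < n
        next unless c == n
        q = delta
        k = BASE
        loop do
          t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias)
          break if q < t
          output << digit(t + (q - t) % (BASE - t))
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << digit(q)
        bias = adapt(delta, handled + 1, handled == basic)
        delta = 0
        handled += 1
      end
      delta += 1
      n += 1
    end
    output
  end
end

# Hypothetical stand-in for Twingly::URL, reduced to scheme, host and path.
class SketchURL
  include Comparable

  attr_reader :scheme, :host, :path

  def initialize(string)
    match = %r{\A([a-z][a-z0-9+.-]*)://([^/]+)(/.*)?\z}i.match(string)
    raise ArgumentError, "unsupported URL: #{string}" unless match
    @scheme, @host, @path = match[1], match[2], match[3].to_s
  end

  # Host with every non-ASCII label converted to its "xn--" form.
  def ascii_host
    labels = host.downcase.split(".").map do |label|
      label.ascii_only? ? label : "xn--" + PunycodeSketch.encode(label)
    end
    labels.join(".")
  end

  # The string that <=> compares, instead of the raw to_s output.
  def comparison_key
    "#{scheme.downcase}://#{ascii_host}#{path}"
  end

  def <=>(other)
    return nil unless other.is_a?(SketchURL)
    comparison_key <=> other.comparison_key
  end
end

u1 = SketchURL.new("https://www.foo.ایران.ir/bar")
u2 = SketchURL.new("https://www.foo.xn--mgba3a4f16a.ir/bar")
puts u1.ascii_host # www.foo.xn--mgba3a4f16a.ir
puts u1 == u2      # true
```

Keeping to_s untouched and comparing on a separate key means the URL still prints in the form it was given, while both spellings of the host compare as equal.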

dentarg commented 8 years ago

$ bundle console
[1] pry(main)> u1 = Twingly::URL.parse("https://www.foo.ایران.ir/bar")
=> #<Twingly::URL:0x3fd236143104 https://www.foo.ایران.ir/bar>
[2] pry(main)> u2 = Twingly::URL.parse("https://www.foo.xn--mgba3a4f16a.ir/bar")
=> #<Twingly::URL:0x3fd2351830e8 https://www.foo.xn--mgba3a4f16a.ir/bar>
[3] pry(main)> u1 == u2
=> false

jage commented 8 years ago

@dentarg we use Comparable, so this affects a few generated methods, and all will be fixed when <=> is fixed.

The Comparable mixin is used by classes whose objects may be ordered. The class must define the <=> operator, which compares the receiver against another object, returning -1, 0, or +1 depending on whether the receiver is less than, equal to, or greater than the other object. If the other object is not comparable then the <=> operator should return nil. Comparable uses <=> to implement the conventional comparison operators (<, <=, ==, >=, and >) and the method between?.

http://ruby-doc.org/core-2.3.0/Comparable.html
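
To illustrate the quoted documentation, here is a small sketch in which only <=> is written by hand and Comparable supplies the rest. KeyedValue is a hypothetical class for illustration and has nothing to do with the gem.

```ruby
# Hypothetical class: objects are ordered by a single string key.
class KeyedValue
  include Comparable

  attr_reader :key

  def initialize(key)
    @key = key
  end

  # The only comparison method defined by hand.
  def <=>(other)
    return nil unless other.is_a?(KeyedValue)
    key <=> other.key
  end
end

a = KeyedValue.new("a")
b = KeyedValue.new("b")

puts a == KeyedValue.new("a") # true, derived from <=> returning 0
puts a < b                    # true
puts b.between?(a, b)         # true
puts [b, a].min.key           # a
puts a == "a"                 # false, because <=> returns nil here
```

Since ==, <, <=, >=, > and between? all go through <=>, changing what <=> compares changes all of them at once.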

dentarg commented 8 years ago

We should also look over Hash Equality:

$ irb
irb(main):001:0> require 'twingly/url'
=> true
irb(main):002:0> Twingly::URL::VERSION
=> "5.0.1"
irb(main):003:0> url = 'http://google.com'
=> "http://google.com"
irb(main):004:0> url1 = url2 = Twingly::URL.parse(url)
=> #<Twingly::URL:0x3fe0c1959fac http://google.com>
irb(main):005:0> { url1 => 'url1' }.has_key?(url2)
=> true

dentarg commented 8 years ago

Oops, I should not have written url1 = url2 = Twingly::URL.parse(url), since that makes both variables refer to the same object. Here's what I mean:

irb(main):009:0> url2 = Twingly::URL.parse(url)
=> #<Twingly::URL:0x3fe0c1910898 http://google.com>
irb(main):010:0> { url1 => 'url1' }.has_key?(url2)
=> false
irb(main):011:0> { url1 => 'url1' }.keys.include?(url2)
=> true

dentarg commented 5 years ago

Dumping related/interesting links: https://bugs.ruby-lang.org/issues/12852, https://url.spec.whatwg.org/

jage commented 5 years ago

dentarg wrote: "We should also look over Hash Equality"

I think this was implemented in #129:

$ bundle exec pry
[1] pry(main)> require_relative "lib/twingly/url"
=> true
[2] pry(main)> url = 'http://google.com'
=> "http://google.com"
[3] pry(main)> url1 = Twingly::URL.parse(url)
=> #<Twingly::URL:0x3fc3b6db63ac http://google.com>
[4] pry(main)> url2 = Twingly::URL.parse(url)
=> #<Twingly::URL:0x3fc3b6d939b0 http://google.com>
[5] pry(main)> { url1 => 'url1' }.has_key?(url2)
=> true
[6] pry(main)> { url1 => 'url1' }.keys.include?(url2)
=> true
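
For background on why has_key? and keys.include? could disagree: Hash looks keys up with #hash and #eql?, while Array#include? uses ==. The sketch below shows that difference and one way to make both agree. It is an illustration under assumptions, not the change made in #129; ComparableOnly and HashFriendly are hypothetical classes.

```ruby
# Hypothetical class: defines <=> (and so ==) but not eql? or hash.
class ComparableOnly
  include Comparable

  attr_reader :value

  def initialize(value)
    @value = value
  end

  def <=>(other)
    return nil unless other.is_a?(ComparableOnly)
    value <=> other.value
  end
end

# Hypothetical subclass: adds eql? and hash so equal objects act as the same Hash key.
class HashFriendly < ComparableOnly
  def eql?(other)
    other.is_a?(HashFriendly) && value == other.value
  end

  def hash
    [self.class, value].hash
  end
end

plain1 = ComparableOnly.new("http://google.com")
plain2 = ComparableOnly.new("http://google.com")
puts({ plain1 => "url1" }.key?(plain2))          # false: eql? is still identity
puts({ plain1 => "url1" }.keys.include?(plain2)) # true: include? uses ==

friendly1 = HashFriendly.new("http://google.com")
friendly2 = HashFriendly.new("http://google.com")
puts({ friendly1 => "url1" }.key?(friendly2))          # true
puts({ friendly1 => "url1" }.keys.include?(friendly2)) # true
```

Whatever key == compares is the natural thing for eql? and hash to use as well, so that equal objects are never treated as distinct Hash keys.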