Closed calvinsugianto closed 1 year ago
No, it is not possible.
https://sothebysrealty.ca/insightblog/wp-content/uploads/2020/03/Featured-62-Ch-du-Bois%C3%A9-Lac-Beauport-QC-sothebys-international-realty-canada-1440x570.jpg
is correct
As an example, that is what I get back from Google Chrome (when I copy the URL) if I enter the URL with é in the path:
https://sothebysrealty.ca/insightblog/wp-content/uploads/2020/03/Featured-62-Ch-du-Bois%C3%A9-Lac-Beauport-QC-sothebys-international-realty-canada-1440x570.jpg
From https://github.com/ruby/uri/issues/40#issuecomment-1436138080
No, a URI path is not allowed to contain arbitrary UTF-8 characters. Non-ASCII UTF-8 characters must be percent encoded, and even some ASCII characters must be percent encoded.
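As a small illustration of that rule, here is a sketch that percent-encodes a single path segment with `ERB::Util.url_encode` from Ruby's standard library. The segment value is an assumption chosen for the example, and the helper is meant for one segment only, since it would also encode `/`.

```ruby
require "erb"

# Percent-encode one path segment (not a whole URL: "/" would be encoded too).
segment = "Bois\u00E9 Lac" # "Boisé Lac", with é as the single code point U+00E9
encoded = ERB::Util.url_encode(segment)

puts segment.ascii_only? # false, so it cannot appear in a URI path as-is
puts encoded             # Bois%C3%A9%20Lac
puts encoded.ascii_only? # true
```

Note that the space, an ASCII character, is percent-encoded as well.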
And if you try to parse that URL using Ruby's uri, it blows up:
irb(main):017:0> URI("https://sothebysrealty.ca/insightblog/wp-content/uploads/2020/03/Featured-62-Ch-du-Boisé-Lac-Beauport-QC-sothebys-international-realty-canada-1440x570.jpg")
/Users/dentarg/.arm64_rubies/3.2.2/lib/ruby/3.2.0/uri/rfc3986_parser.rb:20:in `split': URI must be ascii only "https://sothebysrealty.ca/insightblog/wp-content/uploads/2020/03/Featured-62-Ch-du-Bois\u00E9-Lac-Beauport-QC-sothebys-international-realty-canada-1440x570.jpg" (URI::InvalidURIError)
from /Users/dentarg/.arm64_rubies/3.2.2/lib/ruby/3.2.0/uri/rfc3986_parser.rb:71:in `parse'
from /Users/dentarg/.arm64_rubies/3.2.2/lib/ruby/3.2.0/uri/common.rb:193:in `parse'
from /Users/dentarg/.arm64_rubies/3.2.2/lib/ruby/3.2.0/uri/common.rb:722:in `URI'
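If the goal is only to avoid the exception, one option is to rescue it. The following is a minimal sketch; `parse_or_nil` is a hypothetical helper name and `example.com` is a placeholder host, neither of which comes from this thread.

```ruby
require "uri"

# Hypothetical helper: returns a URI object, or nil when the string is not a valid URI.
def parse_or_nil(url)
  URI.parse(url)
rescue URI::InvalidURIError
  nil
end

raw     = "https://example.com/Bois\u00E9.jpg" # non-ASCII character in the path
encoded = "https://example.com/Bois%C3%A9.jpg" # same path, percent-encoded

p parse_or_nil(raw)          # nil
p parse_or_nil(encoded).path # "/Bois%C3%A9.jpg"
```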
I would like to re-open this issue. I think there is a misunderstanding.
There are (at least) two ways to represent the letter « é »:
That is how « http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-Côté-2.0-M.jpg » is correctly converted to « http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-Co%CC%82te%CC%81-2.0-M.jpg »
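The two percent-encoded forms seen in this thread correspond to the two Unicode normalization forms of the same letter. A short sketch using Ruby's built-in `String#unicode_normalize` shows the difference at the byte level (the variable names are only for illustration):

```ruby
composed   = "\u00E9"  # é as one code point (NFC form)
decomposed = "e\u0301" # e followed by a combining acute accent (NFD form)

# Both render as é, but they are different strings with different bytes.
puts composed == decomposed                         # false
puts composed.unpack1("H*")                         # c3a9   -> %C3%A9
puts decomposed.unpack1("H*")                       # 65cc81 -> e%CC%81

# Normalization converts one form into the other.
puts composed.unicode_normalize(:nfd) == decomposed # true
puts decomposed.unicode_normalize(:nfc) == composed # true
```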
Google Chrome is converting http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-Côté-2.0-M.jpg
to http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-C%C3%B4t%C3%A9-2.0-M.jpg
for me
The link is an image coming from http://ferrisson.com/pierre-paul-cote-csq/
According to my Google Inspector, the link should be converted to http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-Co%CC%82te%CC%81-2.0-M.jpg
I am also posting the Safari Inspector view, since the conversion is more obvious there.
That how « http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-Côté-2.0-M.jpg » is correctly convert ...
I copied from your message here on GitHub when I got http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-C%C3%B4t%C3%A9-2.0-M.jpg
I can also see the source code on http://ferrisson.com/pierre-paul-cote-csq/ referencing http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-Côté-2.0-M.jpg
When I enter that into the address bar in Google Chrome, the image loads, and if I copy the URL from the address bar, it is http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-Co%CC%82te%CC%81-2.0-M.jpg
I copied from your message here on GitHub when I got
http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-C%C3%B4t%C3%A9-2.0-M.jpg
To be clearer, right-clicking the URL and choosing "Copy Link Address" gave me http://ferrisson.com/wp-content/uploads/2014/04/2014-04-1-P.-P.-C%C3%B4t%C3%A9-2.0-M.jpg
Anyway, even if Chrome supports more representations, I'm not sure we can do that in Addressable (see the previous comments in the thread).
Hello guys, I have an issue when I parse a special character like é with this code:
Addressable::URI.parse(url).normalize
It changes é into %C3%A9, and this causes an error. What I need is for it to be encoded in this UTF-8 form instead:
e%CC%81
Is it possible with this gem?
Example URL:
https://sothebysrealty.ca/insightblog/wp-content/uploads/2020/03/Featured-62-Ch-du-Boisé-Lac-Beauport-QC-sothebys-international-realty-canada-1440x570.jpg
Instead of the current result:
https://sothebysrealty.ca/insightblog/wp-content/uploads/2020/03/Featured-62-Ch-du-Bois%C3%A9-Lac-Beauport-QC-sothebys-international-realty-canada-1440x570.jpg
-> wrong
it should become this UTF-8 format:
https://sothebysrealty.ca/insightblog/wp-content/uploads/2020/03/Featured-62-Ch-du-Boise%CC%81-Lac-Beauport-QC-sothebys-international-realty-canada-1440x570.jpg
-> correct
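For reference, the requested form can be produced outside the gem with plain Ruby. This is only a workaround sketch, not an Addressable feature: `percent_encode_decomposed` is a hypothetical helper that decomposes the string (NFD) and then percent-encodes every non-ASCII character, leaving ASCII characters untouched.

```ruby
# Hypothetical helper: NFD-normalize, then percent-encode each non-ASCII character.
# ASCII characters (including "/", ":" and existing "%XX" escapes) are left as-is.
def percent_encode_decomposed(url)
  url.unicode_normalize(:nfd).gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

# The example URL from this issue, with é written as the code point U+00E9.
url = "https://sothebysrealty.ca/insightblog/wp-content/uploads/2020/03/" \
      "Featured-62-Ch-du-Bois\u00E9-Lac-Beauport-QC-sothebys-international-realty-canada-1440x570.jpg"

puts percent_encode_decomposed(url)
# https://sothebysrealty.ca/insightblog/wp-content/uploads/2020/03/Featured-62-Ch-du-Boise%CC%81-Lac-Beauport-QC-sothebys-international-realty-canada-1440x570.jpg
```

Which of the two forms a server accepts likely depends on how the file name was stored, so this form is not guaranteed to work for other URLs.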