google / guava

Google core libraries for Java
Apache License 2.0
50.28k stars 10.92k forks source link

Decoding functionality for Escaper #1691

Open gissuebot opened 10 years ago

gissuebot commented 10 years ago

Original issue created by cow...@bbs.darktech.org on 2014-03-09 at 06:20 AM


Please add functionality that reverses the effects of Escaper.escape(String).

Use-case: When converting from URL to URI, we need to decode the URL components before passing them into the URI constructor to be re-encoded. See http://stackoverflow.com/a/22279061/14731 for an example.

gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2014-03-11 at 07:11 PM


Is the use case served by the JDK URLDecoder method used in the StackOverflow answer?

gissuebot commented 10 years ago

Original comment posted by cow...@bbs.darktech.org on 2014-03-11 at 10:24 PM


I have a feeling I know what's going to happen once I answer this... :)

Yes, the use-case is served by the JDK URLDecoder method. I don't have an immediate use-case for decoding the other Escaper implementations but perhaps others do...?

gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2014-03-12 at 04:02 PM


I have a feeling I know what's going to happen once I answer this... :)

:)

Yes, the use-case is served by the JDK URLDecoder method. I don't have an immediate use-case for decoding the other Escaper implementations but perhaps others do...?

Our general thinking here is that a proper URL class ought to handle unescaping as part of parsing, similar to what parsers do for HTML, etc. Such a URL class has been on our to-do list forever. See issue 1005.


Labels: Type-Addition, Package-Escape

gissuebot commented 10 years ago

Original comment posted by cow...@bbs.darktech.org on 2014-03-12 at 11:47 PM


Jersey's UriBuilder does an excellent job in that department, but there are still a few use-cases it fails to handle nicely.

Imitation is the highest form of flattery :) Have you considered "extending" their UriBuilder design?

gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2014-03-13 at 03:08 AM


Would that be this one? https://jersey.java.net/apidocs/2.0/jersey/javax/ws/rs/core/UriBuilder.html

(In this case, we do have one internally. The problem is that it has a flaw or two. We'd like to fix it up rather than introduce a new one, but the impact on existing users (who might rely on, e.g., the exact hash generated by a UriBuilder usage) makes the fix difficult.)

gissuebot commented 10 years ago

Original comment posted by cow...@bbs.darktech.org on 2014-03-13 at 04:15 AM


Yes, that's the one.

"We do have one internally": What do you mean internally? Internally at Google?

"who might rely on, e.g., the exact hash generated by a UriBuilder usage": since when does the specification guarantee the exact hash generated? Isn't it a code smell if users need to depend on this kind of (implementation-specific) information?

gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2014-03-13 at 01:00 PM


Yeah, sorry, internally at Google.

And the hash they're relying on isn't hashCode(); it's the output of some well defined hashing algorithm that they run against the URL string. Consequently, if we change the URL string by improving our class's escaping behavior, the hash will change, and their code will break.

gissuebot commented 10 years ago

Original comment posted by cow...@bbs.darktech.org on 2014-03-13 at 06:08 PM


Couldn't you release the fixed version to the public, and apply a "normalization" process inside Google before passing it on to the hashing algorithm?

Meaning, I would expect you to retain backwards compatibility by modifying the wrapper code (the code that invokes the UriBuilder and passes the result to a hashing algorithm), as opposed to holding UriBuilder's implementation back.

Are you saying there isn't a deterministic way to convert from the "fixed" format back to the format expected by the hashing function?

gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2014-03-13 at 06:14 PM


Something like that should work. The challenge is in identifying all the lines of Google code that depend on the exact URL. This can be more difficult than I've made it out to be: Some callers might build a URL and send it via RPC to an entirely different service, and only in that different service is the URL hashed.

gissuebot commented 10 years ago

Original comment posted by cow...@bbs.darktech.org on 2014-03-13 at 06:29 PM


If you normalize it as the source before sending it out via RPC you should be safe. Meaning: Find Usages on your UriBuilder class and always normalize the result regardless of where it goes. You could always optimize this later by removing normalization where it's not necessary.

gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2014-03-13 at 06:33 PM


We'd be talking a 5-digit number of callers here. I'd additionally worry that new users within an app might omit the normalization call. We'd almost need to force users to choose which escaping policy they want.

I'm beginning to wonder if we should just release a UriBuilder that has the correct behavior in Guava and leave the one inside Google with the old behavior.

gissuebot commented 10 years ago

Original comment posted by cow...@bbs.darktech.org on 2014-03-13 at 07:36 PM


How about having the Google UriBuilder wrap/extend the Guava UriBuilder implementation? This way your code keeps on running without changing any references, and Guava users still benefit.

gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2014-03-13 at 07:44 PM


That could work.

dnouls commented 7 years ago

I could use a parse method to do the opposite of a custom Escaper. And I mean this in a generic way, not for parsing URL's or URI's as the author initially requested.

I have some custom encoding of arrays in an i18n bundle (in a GWT application) and I want to unescape the texts between the split characters. (But again just a simple use case, their might already be alternative solutions)

KilianB commented 6 years ago

I could also need a unescape method to retrieve the original uri. The URI is saved in a database, afterwards escaped and send to a network device which in turn will notify the application via an eventhandler supplying the escaped uri. To lookup additional information from the db the escaping needs to be reversed.