hydrusnetwork / hydrus

A personal booru-style media tagger that can import files and tags from your hard drive and popular websites. Content can be shared with other users via user-run servers.
http://hydrusnetwork.github.io/hydrus/

Hydrus should not decode percent encoded strings in GUGs #868

Open floogulinc opened 3 years ago

floogulinc commented 3 years ago

I'm not quite sure where in the pipeline this happens, but if you use a verbatim percent-encoded string in a GUG input, it gets decoded back into regular characters in the URL, which causes unintended effects. If a user puts percent-encoded text in the GUG input, it's likely they intend for that exact string to end up in the resulting URL. An example of this being an issue can be seen in https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/issues/98.
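A minimal sketch of the reported behaviour, using only Python's standard `urllib.parse`. The template string, the `%tags%` placeholder, and the helper names are hypothetical and are not taken from the hydrus codebase. The sketch only illustrates why decoding the input before substitution changes what the site receives.

```python
from urllib.parse import unquote, parse_qs

# Hypothetical GUG-style template; not hydrus's actual format.
GUG_TEMPLATE = "https://example.com/search?tags=%tags%"

def build_url_decoding(user_input: str) -> str:
    # Mimics the reported behaviour: percent escapes are decoded
    # before the input is substituted into the template.
    return GUG_TEMPLATE.replace("%tags%", unquote(user_input))

def build_url_verbatim(user_input: str) -> str:
    # The behaviour the report asks for: the input is used exactly as typed.
    return GUG_TEMPLATE.replace("%tags%", user_input)

def server_view(url: str) -> str:
    # How a typical form-style query parser would read the 'tags' value.
    query = url.split("?", 1)[1]
    return parse_qs(query)["tags"][0]

decoded_url = build_url_decoding("18%2B")
verbatim_url = build_url_verbatim("18%2B")

print(decoded_url, "->", repr(server_view(decoded_url)))
print(verbatim_url, "->", repr(server_view(verbatim_url)))
```

With the decoded form, a server that treats `+` in a query string as a space would see `18` followed by a space rather than `18+`, which is the kind of unintended effect described above.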

hydrusnetwork commented 2 years ago

Thank you for this report. Unfortunately, a bad decision I made when I originally started URL storage was to store 'pretty' decoded URLs, rather than storing percent-encoded URLs and decoding them for user presentation. This has caused all sorts of difficulty throughout the downloader system, as I have bodged my way through figuring out the correct encoding for a parameter at each stage. '+' is a particularly difficult character, since it does not have to be encoded in a URL but some sites want it encoded in complicated parameters. This is the 'blonde+blue_eyes' vs '6+girls+blue_eyes' (which should actually be '6%2Bgirls+blue_eyes') issue.
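The '+' ambiguity can be sketched as follows, assuming a site that uses '+' as the separator between tags in a single query parameter. The function name `build_tag_query` is hypothetical; this is an illustration of the encoding rule, not how hydrus actually builds its URLs.

```python
from urllib.parse import quote

def build_tag_query(tags: list[str]) -> str:
    # Encode each tag on its own so a literal '+' inside a tag becomes %2B,
    # then join the tags with '+' as the separator the site expects.
    return "+".join(quote(tag, safe="") for tag in tags)

plain = build_tag_query(["blonde", "blue_eyes"])
tricky = build_tag_query(["6+girls", "blue_eyes"])

print(plain)
print(tricky)
```

Encoding per tag before joining keeps the separator '+' distinct from a '+' that is part of a tag name. Once the whole string is decoded, the two can no longer be told apart.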

I am not sure I can support this universally until I update the network engine to handle URLs encoded at all times internally. It may also need additional GUG or URL Class options if different sites handle it differently.
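As a rough illustration of why the stored form matters, the sketch below shows that decoding is lossy: two different percent-encoded URLs collapse to the same 'pretty' string, so the encoded original cannot be recovered from the decoded one. The URLs and the `pretty` helper are made up for the example.

```python
from urllib.parse import unquote

def pretty(encoded_url: str) -> str:
    # Decode percent escapes for display only.
    return unquote(encoded_url)

# Two distinct encoded URLs (hypothetical examples).
url_a = "https://example.com/search?tags=6%2Bgirls+blue_eyes"
url_b = "https://example.com/search?tags=6+girls+blue_eyes"

print(pretty(url_a))
print(pretty(url_b))
print(pretty(url_a) == pretty(url_b))
```

If only the pretty form is kept, any later stage has to guess which encoding was intended. Keeping the encoded form as the canonical value and decoding only for display avoids that guess.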

I apologise for dealing with this issue late. In my tests here, the issue referenced above no longer seems to occur? When I test with '18%2B' as input, it stays as '18%2B' in the final URL, while '18+' stays as '18+'.

In any case, I am afraid that while I can hack together some specific solutions, I think a universal solution will have to wait for better internal encoding.