metafacture / metafacture-fix

Work in progress towards an implementation of the Fix language for Metafacture
Apache License 2.0
6 stars 2 forks source link

Add uri_encode function (#273) #285

Closed dr0i closed 1 year ago

dr0i commented 1 year ago

Resolves #273.

blackwinter commented 1 year ago

In general, I would prefer to use the Metamorph implementation directly (see e.g. #230 and #231). If it doesn't provide a suitable interface it should be added there and then reused for both. Eventually, org.metafacture.metamorph.functions might be generalized and moved to org.metafacture.commons.functions (or something like that).

dr0i commented 1 year ago

In general, I would prefer to use the Metamorph implementation directly

good point. Agreed. Awaiting ok for using percent encoder and then migrate function to metafcature-core.

We'd also have to add documentation on the new function in the README.

Right - forgot about that. Would do this also im metfacture-core. Will note the RFC 3986 compliance there.

fsteeg commented 1 year ago

One thing leading to confusion here (at least for me) is that we actually need something to encode a string to be used in a URI (in particular as a path segment), not to encode a full URI. So for http://perl.com/foo bar we don't want http://perl.com/foo%20bar (which would be a valid URI), but we want http%3A%2F%2Fperl.com%2Ffoo%20bar (which can be used as a path segment in a URI, but is not itself a valid URI).

Now if I understand it correctly, both metamorph.functions.URLEncode and Catmandu::Fix::uri_encode basically do that, but Java's URLEncoder (which is currently used in metamorph.functions.URLEncode) uses + instead of %20 for blanks, which is a problem for using it in lobid-resources.

To both avoid breaking the current metamorph.functions.URLEncode behavior and reusing as much implementation as possible, this is what @dr0i and I just discussed: we could base metamorph.functions.URLEncode on PercentEscaper, with an option to pass the safeChars config. In the default used in metamorph, we use SAFECHARS_URLENCODER, which "mimics the behavior of java.net.URLEncoder" (thus no break), and in metafix we use e.g. SAFE_PLUS_RESERVED_CHARS_URLENCODER (to get what we need in lobid-resources and be consistent with Catmandu::Fix::uri_encode).

dr0i commented 1 year ago

This PR is substituted by https://github.com/metafacture/metafacture-fix/pull/314. Closing.