Path encoding issue after upgrade from 0.9.6 to 1.0.1

aaronjwhiteside commented 3 years ago

From reading the documentation, it seems that paths should be URL-encoded by Karate.

This example works in 0.9.6 but fails after upgrading to 1.0.1.

Feature: Test
  Background:
    * def demoUrl = 'http://httpbin.org'
    * def scope = 'root|'

  Scenario: Get scope
    Given url demoUrl
    And path 'example', 'path', 'v1', 'scopes', scope
    When method GET

with the following error:

10:02:33.618 [main] ERROR com.intuit.karate - Illegal character in path at index 46: http://httpbin.org/example/path/v1/scopes/root|, http call failed after 2 milliseconds for url: http://httpbin.org/example/path/v1/scopes/root|
10:02:33.619 [main] ERROR com.intuit.karate - src/test/java/tabapay/test2.feature:9

Looking at the spec, https://tools.ietf.org/html/rfc3986, the character | does not appear to be special?

Changing the scope variable to 'root%7C' seems to fix the problem and Karate is happy, but | is not a special character for URLs, and this works fine in 0.9.6.
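
For reference, RFC 3986 lists | in neither its reserved nor its unreserved character set, so a strict parser will reject it unless it is percent-encoded. The error message above is consistent with what java.net.URI reports; whether Karate 1.0.1 goes through exactly this code path is an assumption. A minimal JDK-only sketch (PipeInPathDemo and its method names are made up for illustration):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; it uses only the JDK, not Karate itself.
final class PipeInPathDemo {

    // Strict parsing: the single-argument constructor expects an already-encoded URI.
    // Returns the index of the first illegal character, or -1 if the string parses.
    static int illegalIndex(String rawUrl) {
        try {
            new URI(rawUrl);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    // Lenient building: the multi-argument constructor percent-encodes
    // characters that are illegal in the path component.
    static String buildEncoded(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(illegalIndex("http://httpbin.org/example/path/v1/scopes/root|"));
        System.out.println(buildEncoded("http", "httpbin.org", "/example/path/v1/scopes/root|"));
    }
}
```

With the URL from the report, the strict parse fails at the | while the component-based constructor produces the same root%7C form that the 0.9.6 log shows.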

log output from 0.9.6

10:11:53.887 [ForkJoinPool-1-worker-1] DEBUG com.intuit.karate - request:
1 > GET http://httpbin.org/example/path/v1/scopes/root%7C
1 > Accept-Encoding: gzip,deflate
1 > Connection: Keep-Alive
1 > Host: httpbin.org
1 > User-Agent: Apache-HttpClient/4.5.12 (Java/1.8.0_252)

I can provide a full minimal project if requested, but I feel this should be enough to reproduce. Let me know and I'll attach it to the ticket.

ptrthomas commented 3 years ago

@aaronjwhiteside I'm still confused by your understanding of path vs raw path. At times, when I read all the comments above, it seems to be the reverse of my understanding. Can we agree on the summary below?

path will url-encode everything except /
unless the / is prefixed with \
existing users who have used /foo/bar in path will not need to change tests
I'd like to rename raw path to pathEncoded
and pathEncoded will use what is passed as-is, and never encode
both path and pathEncoded will support arrays as arguments
I will add a karate.urlEncode() and karate.urlDecode() helper for general use
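
One way to read the first two points of this summary is sketched below in plain Java. It is not Karate's implementation: ProposedPathRule and its methods are hypothetical names, and the sketch assumes "everything" means every byte outside RFC 3986's unreserved set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed rule; not Karate's actual code.
final class ProposedPathRule {

    // RFC 3986 "unreserved" characters, which never need percent-encoding.
    private static boolean isUnreserved(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Percent-encodes the UTF-8 bytes of s, leaving unreserved characters alone.
    private static void appendEncoded(StringBuilder out, String s) {
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(String.format("%02X", b));
            }
        }
    }

    // "/" is kept as a separator; "\/" is treated as data and becomes %2F.
    static String encode(String path) {
        StringBuilder out = new StringBuilder();
        StringBuilder pending = new StringBuilder(); // literal text awaiting encoding
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '\\' && i + 1 < path.length() && path.charAt(i + 1) == '/') {
                pending.append('/'); // escaped slash: encoded along with the other data
                i++;                 // skip the slash we just consumed
            } else if (c == '/') {
                appendEncoded(out, pending.toString());
                pending.setLength(0);
                out.append('/');     // plain slash: kept as a path separator
            } else {
                pending.append(c);
            }
        }
        appendEncoded(out, pending.toString());
        return out.toString();
    }
}
```

Under these assumptions, a path such as /foo/bar passes through unchanged, so existing tests would not need to change, while root| gains its %7C.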

aaronjwhiteside commented 3 years ago

I'm still confused by your understanding of path vs raw path. At times, when I read all the comments above, it seems to be the reverse of my understanding.

@ptrthomas Yeah, I was expecting this.

At the time I was implementing the raw path logic, I didn't pay enough attention to how the path being passed in would be parsed by the URIBuilder. After having some time to reflect and thoroughly read the code and javadoc, I realised it's almost exactly the same as the previous 0.9.6 logic.

https://javadoc.io/doc/org.apache.httpcomponents/httpclient/latest/org/apache/http/client/utils/URIBuilder.html#setPath(java.lang.String)

Sets URI path. The value is expected to be unescaped and may contain non ASCII characters.

It's expected to be unescaped because URIBuilder parses the path, splits it by the path separator into segments that it stores internally, and, when building the final URL, encodes a subset of what it considers special characters. For example, | in the path is encoded as %7C. This happens to be the same behaviour as 0.9.6, though I haven't done extensive tests to verify the exact differences, if any.

This obviously wasn't my intention; I wanted it to just accept whatever I passed in as the raw path and not touch it in any way.

But if you just want to keep things simple and get back to how path worked in 0.9.6, I think reusing the current raw path logic as the path logic would do this.

Can we agree on the summary below?

path will url-encode everything except /
existing users who have used /foo/bar in path will not need to change tests

I still think it should encode / too, but for the sake of getting something accepted, I'm not going to fight you on this one. :)

unless the / is prefixed with \

I think I get your comment about inverse now. My take on prefixing something with \ is that it skips encoding rather than forcing it, but that's just from how things like regex and strings in Java work: if you want something to be treated verbatim, without the logic that would normally be applied to it, you escape it.

So using an escape prefix to force something to be encoded (treated differently) rather than treated verbatim doesn't make sense to me, as it goes against prior experience with escaping. I suspect it would for others too. But maybe this can just be a quirk of Karate, because sometimes things are just not that simple. shrug

Having said all that, I'm willing to accept / only being encoded when prefixed with \.

I think internally we can split the path into segments, so that Given path '/foo/bar', hello (where * def hello = 'world') is represented internally as an array ['foo', 'bar', 'world'] and rendered as /foo/bar/world.
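
The internal representation described here could look roughly like the following sketch. PathSegments is a hypothetical name, not a Karate class, and the sketch deliberately leaves encoding and escaped slashes out so that only the split-and-render step is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the segment idea; not Karate's actual code.
final class PathSegments {

    // Flattens arguments such as ("/foo/bar", "world") into ["foo", "bar", "world"].
    static List<String> split(String... parts) {
        List<String> segments = new ArrayList<>();
        for (String part : parts) {
            for (String s : part.split("/")) {
                if (!s.isEmpty()) {
                    segments.add(s); // empty pieces come from leading or doubled slashes
                }
            }
        }
        return segments;
    }

    // Renders the segments back as an absolute path.
    static String render(List<String> segments) {
        return "/" + String.join("/", segments);
    }
}
```

Note that a design like this drops empty segments, so it cannot by itself represent a path that ends in /.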

I'd like to rename raw path to pathEncoded

This still feels ambiguous to me:

It kind of implies that the path will be encoded?
Or that the path is expected to already be encoded?

Maybe pathUnencoded or unencodedPath makes the distinction clearer. raw path I thought was a good attempt at conveying that, but really I'm happy with whatever as long as it's not ambiguous.

and pathEncoded will use what is passed as-is, and never encode

Agree!

On a technical note, to really keep this path untouched, we might have to drop using URIBuilder and go back to building the URL by hand.

both path and pathEncoded will support arrays as arguments

Agree!

I will add a karate.urlEncode() and karate.urlDecode() helper for general use

So long as it correctly encodes for URLs. Remember that the JDK's URLEncoder doesn't quite do this: it's intended for URL-encoded form bodies, which have slightly different rules from actual URLs.
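
The difference is easy to see with a space, which form encoding writes as + but a URL path needs as %20. The sketch below uses only java.net.URLEncoder; FormVsPathEncoding and its method names are hypothetical, and the replace-based workaround is a common approximation rather than a full RFC 3986 encoder (for example, URLEncoder also encodes ~ and leaves * alone).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper names; URLEncoder itself is the standard JDK class.
final class FormVsPathEncoding {

    // Form encoding (application/x-www-form-urlencoded): a space becomes "+".
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Charset overload needs Java 10+
    }

    // Approximate path-segment encoding: swap "+" for "%20".
    // A literal "+" in the input has already become %2B, so the replace is safe.
    static String pathSegmentEncode(String s) {
        return formEncode(s).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(formEncode("root scope|"));
        System.out.println(pathSegmentEncode("root scope|"));
    }
}
```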

krstcs commented 3 years ago

OK, I think I see the issue. Forward slash would be backwards from every other special character in my use-case because it would NOT be encoded normally, but would only be encoded if it was escaped, but then forward-slash IS a special case in URI encoding anyway so it makes sense (to me???) that it is treated differently. Every other special character would be encoded UNLESS it was escaped, forward-slash would be encoded if it IS escaped. Escaping should make the character be handled the reverse of how it's normally handled.

Now, again, I understand the issue that we have here, because one character IS being treated differently than all the rest. But isn't that always the case in URIs??

I still don't see the need for pathEncoded though. I think it's more fluff that isn't necessary if we treat ONLY forward-slash as a special case for escaping.

The other option would be to encode EVERY special character (so != [a-zA-Z0-9]) that isn't escaped, which would force us to then escape all forward-slashes in all tests. That doesn't seem to be what we want, does it?

zgael commented 3 years ago

Hey there, writing my thoughts on the whole path encoding issue after a test I reported in issue #1635.

IMHO, there needs to be a way to encode URLs, but it should NOT be the default. See the example below: which is easier to read?

Given url 'http://httpbin.org'
And path 'anything/with/a/long/path/but/no/weird/characters'

looks way more readable to me than:

Given url 'http://httpbin.org'
And path 'anything', 'with', 'a', 'long', 'path', 'but', 'no', 'weird', 'characters'

Plus, the former is copy-pastable from a browser/curl command, which is quite handy to write a test easily.

So I would either:

make path smart, able to work out whether it needs to encode characters or not (defaulting to "do not encode / chars, as they are part of the URL")
create alternate keywords, like @ptrthomas suggested with pathEncoded and urlEncoded

These are just my two cents, but I'd be happy to discuss if people disagree with this.

ptrthomas commented 3 years ago

to everyone subscribed here, I've made changes to bring back the old behavior, there will be no breaking change.

expect a 1.1.0.RC3 shortly - but it would be great if anyone can test in the meantime following the developer guide

as of now, you can get the http client to encode a forward-slash by doing this:

* path '/hello\\\\/world'

which is kinda hilarious but I don't care anymore lol. haven't dug into it but my guess is most servers will process the %2F as a forward-slash anyway, but maybe this weirdness is worth it if you want to test that your server is indeed doing the right thing
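
Whether a server treats %2F as a separator depends on whether it routes on the raw or the decoded path. The sketch below only illustrates that distinction with java.net.URI and says nothing about any particular server; EncodedSlashDemo is a hypothetical name.

```java
import java.net.URI;

// Hypothetical demo class; shows raw vs decoded views of the same path.
final class EncodedSlashDemo {

    // The path exactly as written in the URL, escapes intact.
    static String rawPath(String url) {
        return URI.create(url).getRawPath();
    }

    // The path with percent-escapes decoded, so %2F turns into "/".
    static String decodedPath(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        String url = "http://httpbin.org/hello%2Fworld";
        System.out.println(rawPath(url));
        System.out.println(decodedPath(url));
    }
}
```

Code that matches routes against the decoded form can no longer tell the encoded slash from a real separator, which is the behaviour a test like this would be probing.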

EDIT: no more raw path, people can use the url if needed

karate.urlEncode() and karate.urlDecode() have been added.

hlemmur commented 3 years ago

In case it helps anyone in light of the changes above, here is my case with a JSON payload in the request body (I was struggling to get it working with 1.1.0.RC3 until I found this thread):

before (1.1.0.RC2 and earlier)

    Given url endpoint
    And path userId, 'decision/'
    And request {caseId : '123', decision: 'approve'}
    When method PUT
    Then status 200

after (1.1.0.RC3)

    Given url endpoint
    And path userId, 'decision\\\/'
    And request {caseId : '123', decision: 'approve'}
    When method PUT
    Then status 200

I couldn't find a less ugly workaround that requires minimal refactoring.

ptrthomas commented 3 years ago

@hlemmur can you explain that more? if your server doesn't work unless the URL ends with a /, it is a bug on your server. or is this a "negative" test case? do you suggest any other behavior? and does this not work if you build a url without using path?

hlemmur commented 3 years ago

@ptrthomas no, it's not a negative case. I checked the source code of the requested service; the request handler is defined with a trailing /, like

    @ApiOperation(value = "some operation",
            httpMethod = "PUT",
            consumes = "application/json")
    @PutMapping(value = "/{foo}/bar/")
    ...

The trailing / is cut off even if I put the path into the url, so it makes no difference, but it requires more refactoring effort.

Though I think mine is a special case, so I don't mind using \\\/ or %2F to encode the trailing /. It's just confusing that 1.1.0.RC3 is announced with 'no breaking changes' when in fact there are some, compared to RC2.

To me, the more consistent behaviour would be to add the trailing / after each path segment, including the final one, though I'm not sure whether that would break anything else.

ptrthomas commented 3 years ago

@hlemmur just made the change so at least the url preserves a trailing slash. I still think a server should accept both /foo/ and /foo but anyway. one of the intents for path is to save you the trouble of having to think of /: you can ignore them completely and use variables (comma delimited etc).

so yes, in your case I think the escaping is fine. and a rare case, so I don't plan to make any changes to the docs (PR-s are welcome as always)
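
For illustration, preserving a trailing slash when joining path arguments could be as simple as checking the last argument. This is a sketch under that assumption, with a hypothetical TrailingSlashJoin class; it is not how Karate implements it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Karate's actual code.
final class TrailingSlashJoin {

    // Joins arguments into one path, keeping a trailing "/" only if the last argument had one.
    static String join(String... parts) {
        List<String> segments = new ArrayList<>();
        for (String part : parts) {
            for (String s : part.split("/")) {
                if (!s.isEmpty()) {
                    segments.add(s);
                }
            }
        }
        String path = "/" + String.join("/", segments);
        boolean keepSlash = parts.length > 0
                && parts[parts.length - 1].endsWith("/")
                && !segments.isEmpty(); // avoid turning "/" into "//"
        return keepSlash ? path + "/" : path;
    }
}
```

With this rule, a call such as join("42", "decision/") keeps the final slash that a handler mapped to a trailing / would expect, while join("42", "decision") does not add one.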

ptrthomas commented 2 years ago

an update for all watching this thread: the next version will support preserving a trailing slash even when you use path. refer to #1863

karatelabs / karate

Path encoding issue after upgrade from 0.9.6 to 1.0.1 #1561