Path encoding issue after upgrade from 0.9.6 to 1.0.1

aaronjwhiteside commented 3 years ago

From reading the documentation it seems that paths should be url encoded by Karate.

This example works in 0.9.6 but fails after upgrading to 1.0.1.

Feature: Test
  Background:
    * def demoUrl = 'http://httpbin.org'
    * def scope = 'root|'

  Scenario: Get scope
    Given url demoUrl
    And path 'example', 'path', 'v1', 'scopes', scope
    When method GET

with the following error:

10:02:33.618 [main] ERROR com.intuit.karate - Illegal character in path at index 46: http://httpbin.org/example/path/v1/scopes/root|, http call failed after 2 milliseconds for url: http://httpbin.org/example/path/v1/scopes/root|
10:02:33.619 [main] ERROR com.intuit.karate - src/test/java/tabapay/test2.feature:9

Looking at the spec, https://tools.ietf.org/html/rfc3986, the character | does not appear to be special?

Changing the scope variable to be 'root%7C' seems to fix the problem and karate is happy, but | is not a special character for URLs, and this works fine in 0.9.6..

log output from 0.9.6

10:11:53.887 [ForkJoinPool-1-worker-1] DEBUG com.intuit.karate - request:
1 > GET http://httpbin.org/example/path/v1/scopes/root%7C
1 > Accept-Encoding: gzip,deflate
1 > Connection: Keep-Alive
1 > Host: httpbin.org
1 > User-Agent: Apache-HttpClient/4.5.12 (Java/1.8.0_252)

I can provide a full minimal project if requested, but I feel this should be enough to reproduce, let me know and I'll attach it to the ticket.

ptrthomas commented 3 years ago

@aaronjwhiteside closing as won't fix. we just use apache behind the scenes. you are welcome to provide a PR to fix it

aaronjwhiteside commented 3 years ago

@ptrthomas I'm ok with taking a stab at fixing this.

But I just want to make sure I'm fixing the right thing..

From the README.md

REST-style path parameters. Can be expressions that will be evaluated. Comma delimited values are supported which can be more convenient, and takes care of URL-encoding and appending '/' where needed.

I take this to mean that if I use the Comma delimited style that every "part" of that path should be escaped.

For example:

Given path 'abc', '/', '123|', '&', '?', '='

Should come out like this: abc/%2F/123%7C/%26/%3F/%3D

But

Given path 'abc//123|&?='

Should be left alone? I have a feeling that isn't right.

Because

and takes care of URL-encoding and appending

is part of the sentence that talks about comma delimited values, it can be taken to imply that URL-encoding only happens when using comma delimited values. And not when using a single value.

Looking at the code I don't see anything that treats comma separated paths specially, vs non comma separated, because everything seems to end up as a List. https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/core/ScenarioEngine.java#L365-L366

I even see code that assumes comma separated parts can have trailing slashes that are left in place.. and not urlencoded? https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/http/HttpRequestBuilder.java#L275-L277

If my interpretation of the README.md is correct, then my approach would be:

For each element of the path List, url encode it and concatenate it using /

But there are difficulties when dealing with trailing slashes in this case, questions like: was the slash in the part intended to be url encoded or to cap off the path? Maybe slashes are the one thing that are excluded from url encoding?

I suspect this is probably "correct" in one sense, but may break for a few people..

Another approach might be to maintain a list of characters that the apache http client doesn't think are valid in the URIs it accepts and just url encode them. I think that's probably less "correct" but much less likely to break people's existing usage of the path step.

ptrthomas commented 3 years ago

@aaronjwhiteside all this is fine. in 1.X we decided to simplify what "http client" implementations need to handle and all the processing that the http-request-builder does ends up as a HttpRequest object.

now in this object, the path-chunks and even query-string-params have already been boiled into a single string url field. and then this is passed to apache or whatever which I would have thought would do the right thing and url-encode all the things. maybe this design is faulty and the HttpRequest object should retain all the path, chunks - I don't know - it sounds overly complicated and maybe you can find something elementary the current code is missing. as I said - if apache is not encoding things correctly, not sure if fixing that is worth it unless it is a simple config-tweak etc

ptrthomas commented 3 years ago

@aaronjwhiteside one more thing, I'm open to adding keywords like pathEncoded and urlEncoded to complement path and url - as a way to "bring your own encoding" - perhaps that may simplify the rules ?

aaronjwhiteside commented 3 years ago

@ptrthomas So I think I have a grasp on it now. The problem is that here we're expecting Apache Http Client to know how to encode a full url we pass to it.

But it expects the URL to already be encoded, and performs no translation/encoding on it.

They provide a utility to help users in this regard: https://javadoc.io/doc/org.apache.httpcomponents/httpclient/latest/org/apache/http/client/utils/URIBuilder.html

This makes sense if you think about the following scenarios:

http://some.domain/some/path?some==query

Is the second = part of the query key or query value?

The same issue exists for other examples like this:

http://some.domain/some///path?hello&world

Is the middle / in /// the name of a folder or just another superfluous path separator? The same for the ?hello&world is & part of the query parameter name or just the query parameter separator?

The Apache Http Client cannot know our intent here and so expects it to already be encoded correctly.

Now on to a possible solution:

I think the url step would remain as is, Karate shouldn't try and escape anything, this allows users to build special/weird and even erroneous requests.
The path step should provide sanity and escape everything passed to it as individual segments/parts.

I've also got a solution to the trailing / problem, an empty last segment.

Given path 'hello', 'world', ''

Should produce hello/world/

So question for you, it looks like the https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/http/HttpRequest.java and https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/http/HttpRequestBuilder.java are intended to be abstractions over any implementation of the HttpClient interface.

How do you feel about using https://javadoc.io/doc/org.apache.httpcomponents/httpclient/latest/org/apache/http/client/utils/URIBuilder.html inside Karate's HttpRequestBuilder?

If not, and we modify the HttpRequest to contain all the individual path segments, same goes for query parameters, fragments, base url, and remaining constituents of the URL etc.. Then HttpRequestBuilder contains no logic around building the URL and all of that is moved into the specific HttpCient implementation class.

Personally I'm in favour of just putting the logic into the ApacheHttpClient class and making the HttpRequest a simple value holder for anything that's used to create the URL.

But we could of course recreate Apache's URIBuilder inside Karate's HttpRequestBuilder.

Let me know..

ptrthomas commented 3 years ago

reopening. thanks for the research, that helps. agree that url should remain as-is and path should always encode. there is a question about what if users append the forward-slash via path which unfortunately is quite common, many users don't realize that the comma-delimited way is preferred. so not very sure about the empty trailing path idea, I would keep it simple - or which is why we can consider a new pathEncoded keyword

I don't mind using URIBuilder inside the http-request-builder, we decided to hard-depend on apache anyway in version 1.0 - I would consider using netty utilities btw, netty is IMO the lib we will always depend on - and in the future apache may need to be swapped out e.g. for http2/3 and async, but not a big deal

IIRC you can perform the url-encoding + path-building using the java URL class

aaronjwhiteside commented 3 years ago

I think if users append the forward slash to the first path segment, we continue to strip it off as is done right now, just to keep that part backwards compatible.

The trailing slash is actually an artifact of how the current code is written and is the recommended way when using Apache's URIBuilder to append a trailing slash without having it be escaped by appending it to the end of the last path segment.

I don't mind using URIBuilder inside the http-request-builder,

Ok, even better by me, I can use that there and keep HttpClient as is.

IIRC you can perform the url-encoding + path-building using the java URL class

I just checked this and URI and URL only provide ways to pass the the full path as a single argument.

https://docs.oracle.com/javase/7/docs/api/java/net/URI.html https://docs.oracle.com/javase/7/docs/api/java/net/URL.html

I would consider using netty utilities btw

Agree, I'll take a look and see what netty provide in this regard.

ptrthomas commented 3 years ago

@aaronjwhiteside all good, so a PR with some extra tests should be fine. I just worry about teams I've seen doing things like * path 'foo/bar/baz'

ptrthomas commented 3 years ago

@aaronjwhiteside thinking more about my example * path 'foo/bar/baz' I don't mind us forcing a breaking change in the next version so that this stops working, this has always bugged me

aaronjwhiteside commented 3 years ago

hi @ptrthomas I've opened a draft PR, https://github.com/intuit/karate/pull/1565

Still working through some issues:

karate-demo passes, but karate-mock-servlet fails

[ERROR] Failures: 
[ERROR]   MockSpringMvcServletTest.testSpringBootDemo:51 match failed: EQUALS
  $ | not equal (STRING:STRING)
  'ï¿½Ill~Formed@RequiredString!'
  '�Ill~Formed@RequiredString!'

classpath:demo/encoding/encoding.feature:8
match failed: EQUALS
  $ | not equal (STRING:STRING)
  'ï¿½Ill~Formed@RequiredString!'
  '�Ill~Formed@RequiredString!'

classpath:demo/encoding/encoding.feature:15
[INFO] 
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0

ptrthomas commented 3 years ago

@aaronjwhiteside yes, that was in response to a very old issue raised - and since I've felt that we may be doing "too much"

take a look at the server-side unpacking of the request which uses armeria / netty and you can see what I mean, and I did that in a hurry: https://github.com/intuit/karate/blob/7cce901cdcf90d1718f51b94f5bc737859350063/karate-core/src/main/java/com/intuit/karate/http/Request.java#L187

so we don't need to treat that test-case as the "truth" - but do what makes sense to us

aaronjwhiteside commented 3 years ago

@ptrthomas OK, I'll take a look tomorrow, getting late here :)

aaronjwhiteside commented 3 years ago

so we don't need to treat that test-case as the "truth" - but do what makes sense to us

@ptrthomas

So the issue turns out to be that http servlet engines are required to decode the url before passing it to servlets..

A quick hack of, in Request.java gets past that particular issue but causes other problems.

It's not trivial to decode a url, some things need to be decoded but essentially ignored by the servlet engine, the case of / is a good example, a servlet engine is not supposed to pass %2F to the servlet, but at the same time it's not supposed to interpret the decoded / as a path separator as it was specifically escaped. So it should not interfere with the path routing/dispatch logic to the servlets. The same thing goes for all the other characters with special meaning, like ?, &, = etc..

The naive approach of using URLDecoder doesn't take these things into account.

So for the moment I'll keep digging, otherwise I've updated the PR with the changes discussed in the PR comments.

    public void setUrl(String url) {
        // http servlet engines are required to decode the URL before passing to any servlets
        // this solution isn't ideal, as URLDecoder really deals with x-www-form-urlencoded decoding,
        // but it's good enough for our mock http servlet.
        try {
            urlAndPath = URLDecoder.decode(url, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
        StringUtils.Pair pair = HttpUtils.parseUriIntoUrlBaseAndPath(url);
        urlBase = pair.left;
        QueryStringDecoder qsd = new QueryStringDecoder(pair.right);
        setPath(qsd.path());
        setParams(qsd.parameters());
    }

ptrthomas commented 3 years ago

@aaronjwhiteside thanks, the changes look ok and thanks for the readme edits. I'll wait for the ci to be green then

cc @joelpramos for a second opinion, especially about the proposal to break users who have mixed forward-slashes into path usage.

aaronjwhiteside commented 3 years ago

@ptrthomas

Just pushed a new commit.

Long story short tracked down the issue to Spring, it was decoding the path info using Latin-1 instead of UTF-8.

Two changes, 1) in the mock servlet case, don't allow spring to decode the pathInfo, we set it explicitly. 2) in the normal spring boot + tomcat case, don't use @PathVariable that too was being decoded using Latin-1 instead of UTF-8. 2.1) because the mock and real servlet environments are quite different, we needed to check two places for the decoded path information.

Some references: https://stackoverflow.com/questions/966077/java-reading-undecoded-url-from-servlet https://stackoverflow.com/questions/4931323/whats-the-difference-between-getrequesturi-and-getpathinfo-methods-in-httpservl/42706412

ptrthomas commented 3 years ago

@aaronjwhiteside great, thanks for the research. if possible - can you suggest a way teams can scan their existing feature files to see if any usage of path has a forward-slash mixed in on the right, which can be part of the release notes. maybe a simple regex find in the IDE etc

aaronjwhiteside commented 3 years ago

Something like this should work:

.* path.*('\/.*)+

joelpramos commented 3 years ago

I have mixed feelings about this change - primarily as I find comma separated paths counter intuitive as an HTTP path can definitely have forward slashes (regardless of interpretation for RESTful this is observed with SOAP and just in general across the internet).

My feeling is that the impacted users of this change will vastly exceed the number of users with needs for special encoding.

Would we be able to apply this change only when there is a List of paths (provided by comma separated list or when the keyword shows multiple times) and documenting that if you have special needs for encoding you should use that alternative?

ptrthomas commented 3 years ago

@joelpramos agree. how about we introduce a new keyword called pathParams that will do things the "strict" way

aaronjwhiteside commented 3 years ago

@joelpramos

primarily as I find comma separated paths counter intuitive as an HTTP path can definitely have forward slashes (regardless of interpretation for RESTful this is observed with SOAP and just in general across the internet).

Trying to understand this statement, are you saying that a URL of http://something.com/path//otherpath/finalpath is valid?

The changes to karate don't preclude that,

   Given path 'path', '', 'otherpath', 'finalpath'

My feeling is that the impacted users of this change will vastly exceed the number of users with needs for special encoding.

Without correct encoding, I feel Karate's path step is meaningless? Without encoding all it provides is a sugar syntax over + for string concatenation?

And if you don't encode anything, how can we say it's even correct to append / between the items in the supplied list? When does it become acceptable to append / and not append /, when the string in the list already starts with a / we omit it?

I would like to see clear examples of how you think it should work, this would help me understand the use cases.

Would we be able to apply this change only when there is a List of paths (provided by comma separated list or when the keyword shows multiple times) and documenting that if you have special needs for encoding you should use that alternative?

Can you please show examples of use cases and how they would behave?

joelpramos commented 3 years ago

@aaronjwhiteside

primarily as I find comma separated paths counter intuitive as an HTTP path can definitely have forward slashes (regardless of interpretation for RESTful this is observed with SOAP and just in general across the internet).

Trying to understand this statement, are you saying that a URL of http://something.com/path//otherpath/finalpath is valid?

The changes to karate don't preclude that,
   Given path 'path', '', 'otherpath', 'finalpath'

So starting from the basics I don't think most people need to worry about URL encoding (if we consider most users use Karate for API testing of SOAP or REST APIs and there's a param keyword already for GET requests). This is obviously not 100% accurate for paths like http://something.com/books/id-of-book-with-?-specialcharacter. I'd say that in general this is mostly common in PATCH/PUT/DELETE operations and the parameter of the URL is an ID and it's uncommon to have IDs with special characters (typically are numeric or alphanumeric uuids).

I think most people look at a URL and can distinguish that the URL is the first part of the string (http://something.com) regardless of concepts of a servlet or path etc. and the "rest" of the URL is the path. So if the path is 'path/otherpath/finalpath' or 'books/id-of-book-with-?-specialcharacter' the most natural association would be:

* path 'path/otherpath/final'

instead of

* path 'path', '', 'otherpath', 'finalpath'

My feeling is that the impacted users of this change will vastly exceed the number of users with needs for special encoding.

Without correct encoding, I feel Karate's path step is meaningless? Without encoding all it provides is a sugar syntax over + for string concatenation?

I actually think the string concatenation with + is bad idea - after all you will only face the encoding issues if you are passing variables into the servlet. Never seen a servlet deployed in a path that has "non standard" characters. That would require a function to be added that encoded URL and probably not the direction for Karate (i.e. adding more functions) but also seems like something that Karate should support ootb (like basic use case that a test framework should cover).

And if you don't encode anything, how can we say it's even correct to append / between the items in the supplied list? When does it become acceptable to append / and not append /, when the string in the list already starts with a / we omit it?

I would like to see clear examples of how you think it should work, this would help me understand the use cases.

Would we be able to apply this change only when there is a List of paths (provided by comma separated list or when the keyword shows multiple times) and documenting that if you have special needs for encoding you should use that alternative?

Can you please show examples of use cases and how they would behave?

My suggestion is to leave the current behavior as is if param as one argument but encode if you pass a list. So:

Do not encode for this:
```
* path 'path/otherpath/final'
```
Encode for this:
```
* path 'path', 'otherpath', 'final'
```
If you are unlucky (or are putting most of the path in the URL) and there's only one word that should be encoded be explicit on passing a list
```
* path ['|scope']
```
To deal with different encodings I agree with adding a new keyword or adding a new key to configure ( I didn't went through all the links above and personally wasn't aware of different encoding methods for HTTP)
```
* configure urlEncoding = 'customEncoder'
```

So for your example (http://something.com/path//otherpath/finalpath) if I was distracted/made a copy paste mistake this probably won't affect anything in my test and my code reviewer should tell me to change this (or the server will complain because some are picky but you'll see the produced url in the logs and can remove one of the slashes):

* url 'http://something.com/path/'
* path '/otherpath/finalpath'

But if the expectation is to have that first slash in your URL (encoded) I have the following option:

* url 'http://something.com/path'
* path '/otherpath, 'finalpath'

I'd even go slightly further and consider controlling that first slash between url and param for the user (i.e. if url ends with / and path starts with / remove one).

Hopefully this summary helps explaining what I meant with being counter intuitive!

Overall I just think we should keep it simple but still provide mechanisms for the user to control the URL they need to be produced out of their tests - which I believe can be achieved by keeping the two options/behaviors for the path keyword.

Other option is just a new keyword but we need to remember to add to the VS Code Plugin for syntax highlighting.

aaronjwhiteside commented 3 years ago

This is obviously not 100% accurate for paths like http://something.com/books/id-of-book-with-?-specialcharacter. I'd say that in general this is mostly common in PATCH/PUT/DELETE operations and the parameter of the URL is an ID and it's uncommon to have IDs with special characters (typically are numeric or alphanumeric uuids).

I want to point out here that there is the correct way to encode things and the not correct way. Karate prior to 1.0.0 was "mostly" correct in that it encoded all special characters in the path with the exception of /. So from my point of view this is a regression... special characters (ignoring /) were encoded correctly.

I think if we want to be "correct", we should encode all the special characters and not make exceptions. And "uncommon" doesn't mean "mostly correct" should be passable.

Now for our use case, our REST style API has resources that accepts IDs as part of the path, this being quite common in REST apis. The GET use cases where we might use params instead is when we're constrainting the resources by some set of criteria.

PATCH /some/resources/<entity id here> vs GET /some/resources?type=blue&style=blah

And our entity ids are composed of several pieces concatenated together by a separator, for example UP:MKT:asdjhouda79ydas. This is quite common when dealing with microservices, to identify the owning service of an id, and also when using some nosql data stores, where primary keys share a global namespace (Couchbase in our case).

We also have some resources that are reachable via their natural business key, instead of some prefixed UUID as above, in the example I gave before this was root| the pipe being a hierarchy separator... but that's very much use case specific to us. GET /some/resources/root|

I point these out because I think it causes issues to make assumptions about how people use and design apis and thus how they utilize karate to test these apis. If karate does things "correctly" then it always supports all use-cases, within the limitations of http specs. (tangent into the strange, it's even possible to send binary data in paths and query param names and values)

I think most people look at a URL and can distinguish that the URL is the first part of the string (http://something.com) regardless of concepts of a servlet or path etc. and the "rest" of the URL is the path. So if the path is 'path/otherpath/finalpath' or 'books/id-of-book-with-?-specialcharacter' the most natural association would be:
path 'path/otherpath/final'
instead of
path 'path', '', 'otherpath', 'finalpath'

I'm mostly with you here..

I'm coming from the point of view that if we make path "correct" and always encodes. Then that's what you would need to do to use path correctly, else any special characters including / will be encoded.

But as you pointed out it's not very ergonomic, the only thing that made it usable previously was the special case of skipping encoding /.

Maybe we can introduce something like?

* url 'http://something.com'
Given raw path '/path/otherpath/finalpath'

Which won't attempt to encode anything..

We could switch it around as was mentioned previously,

* url 'http://something.com'
Given encoded path 'path', 'otherpath', 'finalpath'

But that means the normal path step no longer encodes anything (ignoring /)? And then I wonder what ergonomics we have maintained.. by taking a list and concatenating it using /? I think a raw path should just be that without any special cases.. If we introduce an encoded path step, then I'm of the opinion that the normal path step should be dumb..

You can probably see at this point I'm trying to make things very black and white. Either it encodes everything or it encodes nothing. Again easier to reason about.. at least IMHO.

I actually think the string concatenation with + is bad idea

Generally agree with you, I think my point about concatenation was that

Given path 'hello', 'world', variable

is just as good as

Given path 'hello/world/' + variable

any maybe under some definitions, easier to reason about, ducks away ;)

after all you will only face the encoding issues if you are passing variables into the servlet.

Not sure I get the servlet reference, we shouldn't make assumptions about technology stack. Servlets have their own quirks, Tomcat has its own path encoding bugs, not shared by other engines like Jetty. And I'm sure there are many others for all the other languages and frameworks out there.

Never seen a servlet deployed in a path that has "non standard" characters.

The http spec allows for it, I have a use-case for it where | is accepted in the path and needs to be encoded. Apache HttpClient's URIBuilder and Spring's UriComponentsBuilder explicitly allow for it and deal with it. I'm not that well versed in other languages outside java's ecosystem but I'm sure their libraries/framework will support similar use cases too.

If we follow the spec and allow one to disambiguate between a path segment and a raw path, and then only encode path segments. Then we don't have to make any assumptions about anything and anyone with any use-case allowed by the http spec will just work..

That would require a function to be added that encoded URL and probably not the direction for Karate (i.e. adding more functions) but also seems like something that Karate should support ootb (like basic use case that a test framework should cover).

I'm not quite following you here, but I think my previous comment might apply here too.

If you are unlucky (or are putting most of the path in the URL) and there's only one word that should be encoded be explicit on passing a list

I could live with this, but it feels awfully special-casey.

One argument to path and nothing is encoded.
More than one argument and suddenly all of them are encoded.
With one argument it's simply concatenated with the url, so you need to handle the trailing/leading path separator yourself.
With more than one argument, this is handled for you.. and you don't need to think about it.

configure urlEncoding = 'customEncoder'

This could works,

perhaps pathEncoding instead of urlEncoding, as mentioned in previous comments encoding the URL as a whole isn't really viable, context is needed to understand intent.

* configure pathEncoding = 'simple'

to preserve 1.0.0 behaviour

and

* configure pathEncoding = `encoded` // or maybe `strict`

to get back 0.9.6 behaviour minus the skipping of /.

I'd even go slightly further and consider controlling that first slash between url and param for the user (i.e. if url ends with / and path starts with / remove one).

This is already handled by the URIBuilder when appending path segments, for the encoded case. At least in the PR changes right now.

I guess it's a conceptual thing too, I'm envisioning the path step dealing with a list of path segments, which are a unit of their own and are used to build/augment the path in the final url.

And if we're talking about the simple path encoding option, then I see them as less path segments, and more just strings that get concatenated together under simpler rules to produce the final url. So the less magic we do behind the scenes the better? Or maybe just issue a warning if we detect this usage, as it's probably not what the user intended and is a bug?

Hopefully this summary helps explaining what I meant with being counter intuitive!

Overall I just think we should keep it simple but still provide mechanisms for the user to control the URL they need to be produced out of their tests - which I believe can be achieved by keeping the two options/behaviors for the path keyword.

Other option is just a new keyword but we need to remember to add to the VS Code Plugin for syntax highlighting.

I want to thank you for taking the time to put together a detailed response to my query and concerns, it's much appreciated.

I think at this point I'm finding the configure pathEncoding approach seems reasonable, with maybe two choices in value: simple, encoded or strict?

Good thing I don't use VS then ;)

joelpramos commented 3 years ago

For VS is not we "need" it's more we "should" :) Fancy colors go a long way in avoiding frustration.

I'm ok with the configure pathEncoding keyword. I think it achieves everything we all agreed. Wondering whether we keep three values none, simple, encoded, where none is 1.0.0, simple is 0.9.6 (enabled by default) and encoded is the proper encoding to avoid regression for 0.9.6 - what do you think?

ptrthomas commented 3 years ago

quick 2c I don't like the configure option that much, it is un-necessary bloat I'm proposing that we break users. but intro a raw path keyword as @aaronjwhiteside suggested so that the breaking change is to type 4 characters where needed for the old behavior. also trying to lean hard towards doing the right thing wdyt :|

joelpramos commented 3 years ago

I'm also good with that as it's a very simple/straightforward path for upgrade. Might need to bump the minor version and update the Upgrade guide to represent the breaking change?

ptrthomas commented 3 years ago

yes, I'm guessing this would be 1.1.0 and the release notes would mention

ptrthomas commented 3 years ago

so @aaronjwhiteside apologies for sitting on this. I'm chilling a bit after the mad rush of 1.X can you revise the PR so that we take the hit of breaking users and intro the new raw path "prefix" keyword I'll plan a release 1.1.0-RC1 so that we can release that to the wild for feedback

aaronjwhiteside commented 3 years ago

@ptrthomas acking your comment, might take me a few days to get around to it..

And just confirming that the new raw path step will perform the same prefix / trimming as the current codebase does, as opposed to truly being "raw" in that we accept whatever is passed in and append it to the url?

ptrthomas commented 3 years ago

@aaronjwhiteside great point.

my answer is yes, let it behave the same way as current code base
although it seems to defeat the purpose of raw
we'll see what the feedback is for an RC before we take a "final" decision
and we have the option to do the right thing for raw in a future release which yes - may mean a breaking change again
but we can request users now that when they do a pass over the code to insert raw where needed that they should ideally strip out any / and make it work with path as a best practice

ptrthomas commented 3 years ago

moving to project backlog: https://github.com/intuit/karate/projects/3#card-61812803

aaronjwhiteside commented 3 years ago

@ptrthomas haven't abandoned this, just busy with other things at the moment!

aaronjwhiteside commented 3 years ago

@ptrthomas I've got a new PR up, can we reopen this issue?

ptrthomas commented 3 years ago

@aaronjwhiteside reopening with pleasure. thanks for the detailed notes in the PR, I will try to re-wire some of the code conventions in my head and this is great feedback.

this change is why I'm calling the next release 1.1.0 instead of 1.0.2

aaronjwhiteside commented 3 years ago

@ptrthomas I just fixed a bug and pushed another commit

gerben86 commented 3 years ago

I understand the change you've done, but I'm running into a related issue: This is my (simplified) karate-config.js:

function fn() {
  var config = { // base config JSON
      completePath: 'questions/47445984/using-karate-config-parameters-in-a-feature-file'
    };
    return config;
}

And my simplified scenario:

Scenario: brief description of this scenario
    Given url 'https://stackoverflow.com'
    And path completePath
    When method Get
    Then status 200

This results in this url: https://stackoverflow.com/questions%2F47445984%2Fusing-karate-config-parameters-in-a-feature-file

We have this setup because we reuse the paths for different scenario's/feature-files. So we add them to the config instead of adding them to each feature-file separately. How can we overcome this, without using the raw path option?

aaronjwhiteside commented 3 years ago

Hi @gerben86,

The raw path step was added exactly for paths that already contain the / path separators, and other characters you don't want encoded. It's a perfect match for your use case.

But if you wanted to keep using the path step and not move to raw path, you should be able to define completePath like this:

completePath: ['questions', '47445984', 'using-karate-config-parameters-in-a-feature-file']

P.S. Actually this might not work, as completePath is wrapped in an array before being parsed as an expression. @ptrthomas Do you think it's worth while expanding the step to check if the supplied argument is a variable of type array?

ptrthomas commented 3 years ago

@gerben86 great catch ! @aaronjwhiteside just made the change that should now work for arrays or strings / with commas. let me know what you think

ptrthomas commented 3 years ago

@gerben86 I also think this would have worked for you: Given url 'https://stackoverflow.com' + '/' + completePath

aaronjwhiteside commented 3 years ago

@ptrthomas looks good to me, should we apply the change to the rawPath() as well? And review the other steps to see if they might benefit from it? Also perhaps refactor into a helper method for easy reuse.

ptrthomas commented 3 years ago

@aaronjwhiteside I think the other steps are fine, this is the only one that just takes an array. the rest are key-values (param/s, header/s etc which already support lists as values)

I'm in 2 minds. I propose that users like @gerben86 who have anyway done non-trivial JS have to manually fix tests anyway, might as well use path or url

aaronjwhiteside commented 3 years ago

I propose that users like @gerben86 who have anyway done non-trivial JS have to manually fix tests anyway, might as well use path or url

This is the way I lean too, better to keep moving forward without workarounds/hacks..

ptrthomas commented 3 years ago

@aaronjwhiteside thanks. I've already gotten a handful of comments from users who are not very comfortable with this change. but as of now, I'm clear that we should do this. Karate is an API testing framework and the rules of HTTP come first.

also for the record, I always intended for path to work this way - without users having to "worry" about slashes. it devolved a bit, even in some of my own examples. so IMO this has to be done right going forward

raw path and url are reasonable workarounds. I would have liked to introduce a new non-breaking keyword, but path is so short and sweet we can't hijack it

krstcs commented 3 years ago

Recommend looking at another option: We leave path as it is, but if you want to escape a literal character, put a backslash ('\') in front of it. Path would take ALL strings passed in (one or more through comma-separated list of strings or lists-of-strings) and do the following:

split each string in all lists and sub-lists on forward-slash ('/')
encode all non-escaped special characters in each string in the full list
recombine full list using forward-slash ('/') or pass full list into UriBuilder or whichever one is being used

This allows the user to pass in any number of strings as a list, and should include lists of lists as you mentioned earlier. It allows us to escape specials we want to be used literally. It doesn't break most existing tests. I can think of only some weird, odd-ball situations that someone might do, but as @aaronjwhiteside said there are so many ways people use and abuse URIs/URLs/servelets/etc. that we can't account for them all, but I think this method will be the most flexible for the most people. This would also require users to go in and escape all path characters in their tests that are to be literal, that could be an issue for existing tests as well, but should be a one-time change, although that can be a lot of work.

workwithprashant commented 3 years ago

I agree with @krstcs as the approach will NOT break existing consumers especially after teams have already spent 1-2 sprints on 1.0 migration of their projects.

krstcs commented 3 years ago

I'm also thinking, we might want to do the same thing with url for consistency? The backend mechanic would be the same that way. url would just have the task of setting the url itself while path would just add paths to the set url. But I don't know if that makes sense? Maybe everything after the first single-forward-slash gets the same treatment? Edit: i.e.: \<literal>protocol://host:port/\</literal>\<encode>the rest\</encode>.

gerben86 commented 3 years ago

@gerben86 I also think this would have worked for you: Given url 'https://stackoverflow.com' + '/' + completePath

I think this will work, but I prefer the path because it's just a path :)

Anyway, I understand that we need to update something, but I'll wait for the final future-proof solution. Good to see that there's a proper discussion about this point.

ptrthomas commented 3 years ago

yes, the whole point of the RC is to get this feedback and I'm thankful for the conversation here.

@aaronjwhiteside I like the proposal to keep path as-is and use an escape character \ if a slash is REALLY needed (EDIT to be encoded), I never thought of that option to be honest and thanks @krstcs

let me know if you are ok or I can do this change over the weekend

aaronjwhiteside commented 3 years ago

split each string in all lists and sub-lists on forward-slash ('/')

@krstcs sub-lists if I understand you correctly, are not actually supported, and are probably better suited to another issue? I think they could be an interesting feature proposition on their own.

['hello', ['world', '123']] being expanded to /hello/world/123

@aaronjwhiteside I like the proposal to keep path as-is and use an escape character \ if a slash is REALLY needed (EDIT to be encoded), I never thought of that option to be honest and thanks @krstcs

@ptrthomas I don't think the backslash escape char removes the need for the two path styles currently in the RC.

Consider the use case where I have some dynamic data, say I have called an api and got a response and I'm using some value from the response to make another api call..

Background:
  * def otherFeature = call read('.....') { xxx: yyy }
  * def someId = otherFeature.response.id

Scenario:
  Given url baseUrl
  And path 'service', 'entities', someId
  When method GET

Now consider that someId contains characters I want encoded (/, |, =, etc, remember we cannot make assumptions here)

If path is no longer performs encoding of all characters (/) by default, how do I manually do this? Karate would have to expose some method to escape a string before being able to use it with path. Or I'd have to write one myself, and then the utility of karate is diminished.

Since the escape character \, essentially has to be defined manually/statically/explicitly, why not just define the correct path segments manually/statically/explicitly.

  Given path '/hello/world/this_path_contains_\/_a_slash'

vs

  Given path 'hello', 'world', 'this_path_contains_/_a_slash'
  # or
  Given raw path '/hello/world/this_path_contains_%2F_a_slash'

And at least with the raw path support, existing libraries can be used to URL encode/escape strings..

But maybe the escape character has a use within the raw path step, by allowing an encoding shortcut. For example someone wanting to URL encode / without having to actually lookup what the url encoded value is.. But this still suffers from issue when passing in dynamic data. And of course they could just use the "normal" path step that does the encoding automatically for them..

In summary: path should not make assumptions or even exceptions (\) to things it encodes, it must encode ALL THE THINGS equally.

ptrthomas commented 3 years ago

@aaronjwhiteside I hear all your arguments - but FWIW you were the first one ever to find issues with path encoding - so there is an aspect of how relevant this change is to the larger community

maybe we should just revert to 0.9.6 behavior and think about it a little more. I do think the edge case where users need to re-use a path segment from a previous response will always be a full URL - going by REST / HATEOAS concepts.

aaronjwhiteside commented 3 years ago

maybe we should just revert to 0.9.6 behavior and think about it a little more.

@ptrthomas this is perfectly ok by me, I think that's all I wanted in the beginning :)

I think the raw path implementation might be pretty close to the old 0.9.6 behaviour. Would need more testing..

I do think the edge case where users need to re-use a path segment from a previous response will always be a full URL - going by REST / HATEOAS concepts.

This is more or less our entire use case for Karate..

We have a platform made up of over 60+ microservices, testing these usually entails making an api call to a top level microservice, gathering the id(s) from the response and querying downstream microservices to ensure state was mutated correctly and or entities were created as expected.

Our APIs are RESTful, but do not follow HATEOAS concepts.. which wouldn't help with cross microservice interaction anyway.

karatelabs / karate

Path encoding issue after upgrade from 0.9.6 to 1.0.1 #1561