Improve coercion of deeply nested queries

bajtos commented 8 years ago

For example, we expect strong-remoting to coerce gt value to a number and done to a boolean in the following request:

GET /todos?filter[where][priority][gt]=100&filter[where][done]=false

OTOH, if somebody had the unfortunate name: "Mr. False Undefined Null", then we would expect the name value in the following request to be kept as string:

GET /people?filter[where][firstName]=false

See the discussion started by https://github.com/strongloop/strong-remoting/pull/231#issuecomment-140551101 for more details.

bajtos commented 8 years ago

Cross posting from https://github.com/strongloop/strong-remoting/pull/231#issuecomment-140551101 and https://github.com/strongloop/strong-remoting/pull/231#issuecomment-140553839:

Right now we do this by inferring the type information. This makes the framework a bit magic, and gives less control to the consumer of the API as well producer. You do have the ability to specify explicit types, but that only works in the obvious cases. So in summary I welcome the changes to type inference. I am also in favor of just removing it entirely.

But lets not give up on the simplicity of the query syntax, because the flaw of the implementation. Maybe there is a nice syntax / notation / encoding out there that:

easily fits into URLs
supports explicit typing / doesn't require type inference
works well with the browser / curl / and other tools (eg. doesn't get converted into something un-readable by humans)

A couple that come close:

ejson or json - don't play nicely in URLs
our current approach - no explicit type support (like JSON)

Here is some food for thought that would extend our current syntax with JSON literals:

GET /todos?filter[where][priority][gt]=100&filter[where][done]=false

Would become...

GET /todos?filter[where][priority][gt]=`100`&filter[where][done]=`false`

This would parse to:

{
  where: {
    priority: { gt: 100 },
    done: false
  }
}

We could also limit these values to a subset of JSON to avoid JSON.parse() overhead. Some examples off the top of my head:

 - `100`
 - `0.1`
 - `true`
 - `false`
 - `Date(str)` - "str" would also match a pattern before being parsed
 - `null`
 - `undefined`

bajtos commented 8 years ago

In my opinion, it is not possible to accurately detect which type user wanted to encode in the query string.

Even if we were able to understand the loopback query/filter syntax and use that information to guide type inference (e.g. gt operator always expects a numeric value, done property is boolean and thus equality operator expects a boolean), it will be difficult to correctly support database-specific operators like MongoDB's $inc and friends.

I agree with proposal made by @ritch to break backwards compatibility and come up with a more reliable solution.

I like the idea of using a special character to distinguish between string values and "raw" data. I would simplify the solution proposed above and use only a single leading character to denote "raw" data. Also for date/time values, I think we can easily distinguish them from other values by requiring the date/time value to match the ISO 8601 format.

numbers
  gt=`100  
  gt=`0.1

booleans
  ?filter[where][check]=`true
  ?filter[where][check]=`fase

date
  ?filter[where][when]=`2016-05-09T07:49:37+00:00
  ?filter[where][when]=`2016-05-09T07:49:37Z

regexp
  ?filter[where][like]=`/foo.*bar/i

empty values
  ?filter[where][prop]=`null
  ?filter[where][prop]=`undefined

 a string starting with back-tick - only the first backtick needs encoding
  ?filter[where][name]=``name`name

ejson or json - don't play nicely in URLs

In my experience, this is actually pretty ok, and it makes building complex filters easier, especially when we get to and/or operators that expect array values. AFAIK, special characters used by JSON structure - {}[]", - are not necessary to encode, and spaces are optional, thus one can send a request like this:

GET /todos?filter={"where":{"priority":{"gt":100},"done":false}}

To make this play better with the tighter object coercion proposed here, I am proposing to detect JSON objects/array only for top-level arguments. This way we don't need any special encoding to distinguish JSON value from a string that looks like JSON, because we will call JSON.parse only when the top-level argument is expected to be an object or an array.

input arg (parameter) is of type String
  ?text={"foo":true}
  -> arg value is plaintext '{"foo":true}'
  ?text=foo
  -> arg value is plaintext 'foo'

input arg (parameter) is of type Object
  ?obj={"foo":true}
  -> arg value is object { foo: true }
  ?obj=foo
  -> fails with HTTP 400 Bad Request

A downside of json is that it cannot encode type information for Date, Buffer and RegExp. This is an issue of LoopBack models too, see https://github.com/strongloop/loopback/issues/1907 and https://github.com/strongloop/loopback/issues/1990. I would like to leave encoding of types in JSON out of scope of this issue.

@ritch @STRML @raymondfeng @superkhau thoughts?

ritch commented 8 years ago

I like the idea of using a special character to distinguish between string values and "raw" data. I would simplify the solution proposed above and use only a single leading character to denote "raw" data.

Sounds good.

With this in place I think we can remove any type inference completely. This would be a breaking change, but would allow us to make the coercion in strong-remoting predictable and explicit.

bajtos commented 8 years ago

Based on the results of the integration test suite I wrote in https://github.com/strongloop/strong-remoting/pull/304, I believe we don't coerce any nested values right now and always pass them as strings. I suspect the coercion of where filter is handled by juggler and/or connectors at the moment.

In the light of that, I am proposing to keep deep coercion out of scope of our current work in strong-remoting and focus on fixing "regular" coercion, which is the place where we have most known issues.

ritch commented 8 years ago

I suspect the coercion of where filter is handled by juggler and/or connectors at the moment.

In the operators implemented in juggler we do a lot of argument handling and likely convert to the expected type, which seems to work well actually.

In the light of that, I am proposing to keep deep coercion out of scope of our current work in strong-remoting and focus on fixing "regular" coercion, which is the place where we have most known issues.

The problem exists beyond filter. Its just the most obvious example. Perhaps we need to make coercion hookable so that you can handle edge cases instead of us trying to predict all cases? This would allow us to prefer sane / un-magical coercion for the basic types and delegate all complex coercion to user provided functions.

bajtos commented 6 years ago

With the release of LoopBack 4.0 GA (see the announcement), this module has entered Active LTS mode and no longer accepts new features (see our LTS policy).

strongloop / strong-remoting

Improve coercion of deeply nested queries #298