RFC: Allow a single location for each field

lafrech commented 5 years ago

See discussion in https://github.com/marshmallow-code/webargs/issues/267.

The idea is that the Schema would be passed the whole data from a location, so there couldn't be fields with different location in the same Schema.

Drop location argument to Field
Only accept a single location in use_args

This is an important breaking change.

It breaks the default use case, which is to iterate on all fields, and for each field search in DEFAULT_LOCATIONS (by default: query string, form, then json).

We could keep a somewhat similar behaviour by trying to load the whole data from query string, then form, then json. The only added restriction is that Schema couldn't contain fields from mixed locations.

I'm not convinced with the multi-location feature, so we could even

Replace DEFAULT_LOCATIONS with DEFAULT_LOCATION

The user could subclass the parser to change the default location (in our framework, we use json by default), and he would need to specify the location explicitly when it is not the default.

I think all this changes allow for better documentation (e.g. in apispec).

sirosen commented 5 years ago

I'm not convinced with the multi-location feature

Neither am I. I suggested the wrapper example in #267 in part to make the case that "the user can do this him/herself if it's really wanted".

lafrech commented 5 years ago

Great. @sirosen Would you be willing to work on this?

sirosen commented 5 years ago

I'm definitely willing to, but I'm not sure when I'll get to it. I can try to carve out time for it in the next 2 weeks.

sloria commented 5 years ago

I think this is the way to go, so long as there's a decent story for users who need the legacy behavior. I've seen APIs that accept either form-encoded or JSON request payloads. We should document how to achieve this in the narrative docs and changelog.

lafrech commented 5 years ago

Actually, we could accept multiple locations. What we need to break here is the "mixed locations" use case (data from single Schema spread amongst several locations).

We could implement "try whole Schema on first location, then second if first fails" (basically what's in https://github.com/marshmallow-code/webargs/issues/267#issuecomment-524952068). That would be documentable. And that would be compatible with the "pass all location data to the Schema at once" direction we want to take.

I'm just wondering about the error message content.

If first location fails because a single field is required and missing, the second location is tried. It is empty. We should return the error message of the first attempt, I guess.

I've seen APIs that accept either form-encoded or JSON request payloads.

But it wouldn't accept half of the payload in form-encoded and the other half in JSON, right?

What error message would you expect when sending wrong data in form-encoded and no data in JSON? In other words, how does the parser know, from the two locations (one of them can be empty), which to use to return the error message.

If one of the two is empty, it can be an easy case. But there could be cases where it is not, because it contains data related to another argument/Schema.

sloria commented 5 years ago

What we need to break here is the "mixed locations" use case (data from single Schema spread amongst several locations).

But it wouldn't accept half of the payload in form-encoded and the other half in JSON, right?

I don't think it's even possible to send both JSON and form-encoded payloads in a single request (?), so we need not support the 'single schema, mixed locations' case.

We could implement "try whole Schema on first location, then second if first fails" ... I'm just wondering about the error message content.

Yeah, the error handling would be left up to the user; I imagine the desired behavior will vary. We could exemplify how to merge error messages across locations in the docs.

lafrech commented 5 years ago

I don't think it's even possible to send both JSON and form-encoded payloads in a single request (?)

Hmmm... I guess that would be multipart. OK, there could be more plausible examples of mixed-locations, like query args and path perhaps, but none of those I could think of were convincingly realistic.

Let's drop single schema - mixed locations. Definitely a corner case.

so we need not support the 'single schema, multiple locations' case.

Single schema - multiple locations is not such a corner case. It is the case you present in https://github.com/marshmallow-code/webargs/issues/267#issuecomment-524952068:

APIs that accept either form-encoded or JSON request payloads

We do not have to support this but it would suck if the refactor prevented the user from supporting this. And it's nice if we can document how to do it.

Fine for me if we leave it to user code, with an example in the docs.

the error handling would be left up to the user; I imagine the desired behavior will vary. We could exemplify how to merge error messages across locations in the docs.

As long as there's not mixed locations, there's no need to merge error messages. Just pick the one that corresponds to the location the faulty data was sent. It's not trivial to tell at first sight which one it is because both locations return an error.

Worse, in the (non-mixed) multi-location case, using the workaround in https://github.com/marshmallow-code/webargs/issues/267#issuecomment-524952068, if the data is sent in one location with a validation error, the validation fails, the second location is tried. It is empty. If no field is required, it doesn't error at all. But input data is silently ignored.

Perhaps the user code should probe first which of the two locations actually has content and call webargs only once with this.

sloria commented 5 years ago

Single schema - multiple locations is not such a corner case

Woops, I meant 'single schema, mixed locations'--fixed my comment.

Won't we need to merge error messages for the stacked case?

@use_args({"related_id": fields.Str(required=True))}, location="query"})
@use_args({"name": fields.Str(required=True))}, location="json")

{"related_id": ['Missing...'], "name": ["Missing..."]}

lafrech commented 5 years ago

Oh, this.

I didn't think of it. Well, if we don't do anything, the request will abort in the external wrapper, so it will return only query args validation errors first. And when those are fixed, it'll return the errors from the json part.

I don't see an easy way to provide all the errors at once. But it's not a big deal if we don't IMHO.

It's like sometimes input data passes webargs validation and then triggers an error in the view func (e.g. unique index issue in DB), so a new error message appears.

BTW I gave it a quick look and this is gonna be a hell of a rework. Close to a total rewrite. Hopefully, not too many tests will be broken, so it won't be like starting from scratch.

sloria commented 5 years ago

Yeah, I also think it's fine if we don't merge errors; it's not necessarily more intuitive from the user's POV. Anyway, it can be done in userland.

I'm happy to work on this, but I'm not using webargs at work currently, so I'm concerned I might overlook corner cases. I think we should put this in a pre-release so it can get some early testing.

@sirosen Thanks for offering to tackle this. If you put up an initial proof-of-concept/WIP PR, I can work with you on the rest.

sirosen commented 5 years ago

I've finally got a few hours today to try to get something real going. This branch in my fork is where I'll be hacking away if anyone wants to take a peek, but this is just what I could scratch together during the work-week and it's quite broken right now (almost all tests fail).

Initial thoughts and questions based on starting on an implementation:

I'm marking this as v6.0.0 in the changelog and noting differences from webargs v5. Does that sound right?
- Maybe webargs wants to do a final 5.x release before this merges, so that it can be the start of a 6.x branch with any other changes we might want
To try to keep things clear in meaning, I'm doing a rename parse_{location} -> location_load_{location}, e.g. location_load_json. I'm going with this somewhat verbose option in the hopes that it's clear that this is just a hook for reading data but doesn't do any "parsing"
Parser.parse_arg is going away because it's not clear what it should do
Going to a single DEFAULT_LOCATION = "json" and a single location per call is a big API improvement, IMO -- I've been pursuing that.
- "json" is arguable as a choice. But flask-rest-api defaults to json and I came across flask-restful as well, which uses ('json', 'values').
get_value is being replaced with a schema-aware proxy for result dicts -- otherwise, MultiDicts won't work right

I've been thinking about a couple of sticky cases in this thread and I have two ideas I'd like to put forth.

I've seen APIs that accept either form-encoded or JSON request payloads.

This seems very legitimate and should be supported out-of-the-box if possible. Is this the only "normal" case we're aware of in which a single schema is applied to multiple locations?

If so, I'd suggest adding location="json_or_form" with the following meaning:

try to load json, then form body, without passing the result to schema (aside: ensure that all json parse methods return missing when there's no content, not {} -- I noticed that not all do)
pass whatever the result of the above is -- missing if neither is populated, json data if both are, form post only if json is missing and form body is present -- to the schema
no retry, fallback, or other nesting of errors

The result would be write-able in terms of a location_load_json_or_form method on the base parser, I think.

I like this because it doesn't (re)introduce locations=... or a rename like try_locations=... just to handle a special case that we want to support.

If users have more exotic APIs -- e.g. "takes query parameters or headers for X" -- they can implement it following this same model on a custom Parser subclass. They can even implement strange (read: unwise) things like merging locations with this kind of "meta-location" pattern.

Won't we need to merge error messages for the stacked case?

@use_args({"related_id": fields.Str(required=True))}, location="query"})
@use_args({"name": fields.Str(required=True))}, location="json")

{"related_id": ['Missing...'], "name": ["Missing..."]}

In exchange for putting a slight burden on users in this case, I think we can support this really easily with this kind of usage:

@webargs.collect(
    parser.use_args({"related_id": fields.Str(required=True))}, location="query"}),
    parser.use_args({"name": fields.Str(required=True))}, location="json"),
)
def viewfunc(name, related_id):
    ...

which might raise a

WebargsCollectedError([
  {"related_id": ['Missing...']},
  {"name": ["Missing..."]}
])

The result will be that use_args stays very simple and easy to maintain.

If we want use_args to do this internally, I think it's still a good idea to keep the errors separate, wrapped in a list, and not try to merge their contents together. The user can decide what to do with that information.

The best way I can think to support this "magically" (i.e. without the explicit collect wrapper) is the approach used in pallets/click in which you create a special attribute to store data. You then need to replace the function with a singular callable (click uses objects, but maybe just a function is fine) that knows how to handle that attribute -- so it's slightly tricker than the typical nested decorators case.

marshmallow-code / webargs

RFC: Allow a single location for each field #419