Closed lafrech closed 4 years ago
I'm not convinced with the multi-location feature
Neither am I. I suggested the wrapper example in #267 in part to make the case that "the user can do this him/herself if it's really wanted".
Great. @sirosen Would you be willing to work on this?
I'm definitely willing to, but I'm not sure when I'll get to it. I can try to carve out time for it in the next 2 weeks.
I think this is the way to go, so long as there's a decent story for users who need the legacy behavior. I've seen APIs that accept either form-encoded or JSON request payloads. We should document how to achieve this in the narrative docs and changelog.
Actually, we could accept multiple locations. What we need to break here is the "mixed locations" use case (data from single Schema spread amongst several locations).
We could implement "try whole Schema on first location, then second if first fails" (basically what's in https://github.com/marshmallow-code/webargs/issues/267#issuecomment-524952068). That would be documentable. And that would be compatible with the "pass all location data to the Schema at once" direction we want to take.
I'm just wondering about the error message content.
If first location fails because a single field is required and missing, the second location is tried. It is empty. We should return the error message of the first attempt, I guess.
I've seen APIs that accept either form-encoded or JSON request payloads.
But it wouldn't accept half of the payload in form-encoded and the other half in JSON, right?
What error message would you expect when sending wrong data in form-encoded and no data in JSON? In other words, how does the parser know, from the two locations (one of them can be empty), which to use to return the error message.
If one of the two is empty, it can be an easy case. But there could be cases where it is not, because it contains data related to another argument/Schema.
What we need to break here is the "mixed locations" use case (data from single Schema spread amongst several locations).
But it wouldn't accept half of the payload in form-encoded and the other half in JSON, right?
I don't think it's even possible to send both JSON and form-encoded payloads in a single request (?), so we need not support the 'single schema, mixed locations' case.
We could implement "try whole Schema on first location, then second if first fails" ... I'm just wondering about the error message content.
Yeah, the error handling would be left up to the user; I imagine the desired behavior will vary. We could exemplify how to merge error messages across locations in the docs.
I don't think it's even possible to send both JSON and form-encoded payloads in a single request (?)
Hmmm... I guess that would be multipart. OK, there could be more plausible examples of mixed-locations, like query args and path perhaps, but none of those I could think of were convincingly realistic.
Let's drop single schema - mixed locations. Definitely a corner case.
so we need not support the 'single schema, multiple locations' case.
Single schema - multiple locations is not such a corner case. It is the case you present in https://github.com/marshmallow-code/webargs/issues/267#issuecomment-524952068:
APIs that accept either form-encoded or JSON request payloads
We do not have to support this but it would suck if the refactor prevented the user from supporting this. And it's nice if we can document how to do it.
Fine for me if we leave it to user code, with an example in the docs.
the error handling would be left up to the user; I imagine the desired behavior will vary. We could exemplify how to merge error messages across locations in the docs.
As long as there's not mixed locations, there's no need to merge error messages. Just pick the one that corresponds to the location the faulty data was sent. It's not trivial to tell at first sight which one it is because both locations return an error.
Worse, in the (non-mixed) multi-location case, using the workaround in https://github.com/marshmallow-code/webargs/issues/267#issuecomment-524952068, if the data is sent in one location with a validation error, the validation fails, the second location is tried. It is empty. If no field is required, it doesn't error at all. But input data is silently ignored.
Perhaps the user code should probe first which of the two locations actually has content and call webargs only once with this.
Single schema - multiple locations is not such a corner case
Woops, I meant 'single schema, mixed locations'--fixed my comment.
Won't we need to merge error messages for the stacked case?
@use_args({"related_id": fields.Str(required=True))}, location="query"})
@use_args({"name": fields.Str(required=True))}, location="json")
{"related_id": ['Missing...'], "name": ["Missing..."]}
Oh, this.
I didn't think of it. Well, if we don't do anything, the request will abort in the external wrapper, so it will return only query args validation errors first. And when those are fixed, it'll return the errors from the json part.
I don't see an easy way to provide all the errors at once. But it's not a big deal if we don't IMHO.
It's like sometimes input data passes webargs validation and then triggers an error in the view func (e.g. unique index issue in DB), so a new error message appears.
BTW I gave it a quick look and this is gonna be a hell of a rework. Close to a total rewrite. Hopefully, not too many tests will be broken, so it won't be like starting from scratch.
Yeah, I also think it's fine if we don't merge errors; it's not necessarily more intuitive from the user's POV. Anyway, it can be done in userland.
I'm happy to work on this, but I'm not using webargs at work currently, so I'm concerned I might overlook corner cases. I think we should put this in a pre-release so it can get some early testing.
@sirosen Thanks for offering to tackle this. If you put up an initial proof-of-concept/WIP PR, I can work with you on the rest.
I've finally got a few hours today to try to get something real going. This branch in my fork is where I'll be hacking away if anyone wants to take a peek, but this is just what I could scratch together during the work-week and it's quite broken right now (almost all tests fail).
Initial thoughts and questions based on starting on an implementation:
parse_{location}
-> location_load_{location}
, e.g. location_load_json
. I'm going with this somewhat verbose option in the hopes that it's clear that this is just a hook for reading data but doesn't do any "parsing"Parser.parse_arg
is going away because it's not clear what it should doDEFAULT_LOCATION = "json"
and a single location
per call is a big API improvement, IMO -- I've been pursuing that.
"json"
is arguable as a choice. But flask-rest-api
defaults to json
and I came across flask-restful
as well, which uses ('json', 'values')
.get_value
is being replaced with a schema-aware proxy for result dicts -- otherwise, MultiDicts won't work rightI've been thinking about a couple of sticky cases in this thread and I have two ideas I'd like to put forth.
I've seen APIs that accept either form-encoded or JSON request payloads.
This seems very legitimate and should be supported out-of-the-box if possible. Is this the only "normal" case we're aware of in which a single schema is applied to multiple locations?
If so, I'd suggest adding location="json_or_form"
with the following meaning:
missing
when there's no content, not {}
-- I noticed that not all do)missing
if neither is populated, json data if both are, form post only if json is missing and form body is present -- to the schemaThe result would be write-able in terms of a location_load_json_or_form
method on the base parser, I think.
I like this because it doesn't (re)introduce locations=...
or a rename like try_locations=...
just to handle a special case that we want to support.
If users have more exotic APIs -- e.g. "takes query parameters or headers for X" -- they can implement it following this same model on a custom Parser
subclass.
They can even implement strange (read: unwise) things like merging locations with this kind of "meta-location" pattern.
Won't we need to merge error messages for the stacked case?
@use_args({"related_id": fields.Str(required=True))}, location="query"}) @use_args({"name": fields.Str(required=True))}, location="json")
{"related_id": ['Missing...'], "name": ["Missing..."]}
In exchange for putting a slight burden on users in this case, I think we can support this really easily with this kind of usage:
@webargs.collect(
parser.use_args({"related_id": fields.Str(required=True))}, location="query"}),
parser.use_args({"name": fields.Str(required=True))}, location="json"),
)
def viewfunc(name, related_id):
...
which might raise a
WebargsCollectedError([
{"related_id": ['Missing...']},
{"name": ["Missing..."]}
])
The result will be that use_args
stays very simple and easy to maintain.
If we want use_args
to do this internally, I think it's still a good idea to keep the errors separate, wrapped in a list, and not try to merge their contents together. The user can decide what to do with that information.
The best way I can think to support this "magically" (i.e. without the explicit collect
wrapper) is the approach used in pallets/click
in which you create a special attribute to store data. You then need to replace the function with a singular callable (click uses objects, but maybe just a function is fine) that knows how to handle that attribute -- so it's slightly tricker than the typical nested decorators case.
See discussion in https://github.com/marshmallow-code/webargs/issues/267.
The idea is that the Schema would be passed the whole data from a location, so there couldn't be fields with different location in the same Schema.
location
argument toField
location
inuse_args
This is an important breaking change.
It breaks the default use case, which is to iterate on all fields, and for each field search in
DEFAULT_LOCATIONS
(by default: query string, form, then json).We could keep a somewhat similar behaviour by trying to load the whole data from query string, then form, then json. The only added restriction is that Schema couldn't contain fields from mixed locations.
I'm not convinced with the multi-location feature, so we could even
DEFAULT_LOCATIONS
withDEFAULT_LOCATION
The user could subclass the parser to change the default location (in our framework, we use json by default), and he would need to specify the location explicitly when it is not the default.
I think all this changes allow for better documentation (e.g. in apispec).