lafrech closed this issue 7 years ago.
@lafrech Thank you for this interesting idea!
Here are my thoughts on this matter:
What are the options? I haven't thought about this long enough, but since I wouldn't expand parameters, there are at least two ways to go:

- Transform suffixes in parameters into masks and operators in values, e.g. `?name=chuck&surname=*orris*&age=<<69`. The resulting `=<<` reads like an ambiguous "less-than-not-equal" operator, yet it is a solution.
- Transform suffixes into functions in values, e.g. `?name=chuck&surname=contains(orris)&age=less_than(69)`.

Personally, I haven't yet faced a need to support arbitrary filtering, and currently I just use dedicated parameters to achieve similar behavior: `?name=chuck&surname=orris&min_age=69`
My backend implementation just treats `name` and `surname` as exact matches first and as you would with `__contains` next, and `min_age` + `max_age` can cover all your suffixed cases except `__ne`, but `__ne` is not relevant everywhere either.
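To make the idea concrete, here is a minimal sketch of that exact-match-then-contains fallback, applied to plain dicts instead of a real database backend (`filter_records` and the record shape are illustrative inventions, not webargs API):

```python
# Sketch of the filtering behavior described above: text fields match
# exactly first, then as substrings; age uses min_age/max_age instead
# of __lt/__gt suffixes.

def filter_records(records, name=None, surname=None, min_age=None, max_age=None):
    """Filter a list of {"name", "surname", "age"} dicts."""
    def match_text(value, query):
        if query is None:
            return True
        return value == query or query in value  # exact, then "contains"

    result = []
    for rec in records:
        if not match_text(rec["name"], name):
            continue
        if not match_text(rec["surname"], surname):
            continue
        if min_age is not None and rec["age"] < min_age:
            continue
        if max_age is not None and rec["age"] > max_age:
            continue
        result.append(rec)
    return result
```

With this, `?surname=orris` matches both "norris" and "morris" via the substring fallback, while `?min_age=50` keeps only the older records.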
Does this make sense?
Thanks @frol for the feedback. I'll be investigating this next week.
Just a quick note to add that on second thought, I should probably use a custom parser.
Good discussion here. I'm going to close this for now, though, since API filtering is outside the scope of webargs.
Alright. For the record, for our application, we ended up using custom "search" URL with a query field in which the client is supposed to enter a query (in python form, PQL translates python to MongoDB). This is not the most satisfying solution. It is totally linked to the database behind the API, or at least the programming language. But it does the job for now and we lack time to aim for ideal.
@sloria I'm coming across a similar design conundrum in my app. If API filtering is outside the scope of webargs (and I don't understand why given that the filter arguments are... well, web arguments), what scope is it in? App is flask(-classful) / webargs / marshmallow / sqlalchemy.
Perhaps in webargs through a custom parser but not webargs core.
Yes, to clarify my above statement: webargs may be used to parse API filter arguments, but it is agnostic re: filtering syntax (`age__lt`, `age[lt]`, etc.). As Jérôme pointed out, a custom parser should be used to handle your app's specific syntax.
Thanks for chipping in - it's useful. I'll continue here unless you think another place is best for the discussion. I've been reading the marshmallow docs on custom fields and extending schemas, and I still have questions.
Let's assume I have a pizza delivery endpoint `/deliveries/` with query string parameters status / start_time, and I want to add "special" parameters such as `status__in` and `start_time__lt` so I can do things like `GET /deliveries/?status__in=new,in_oven`.

I'm passing a `ModelSchema` to `use_kwargs`. In the past (before these complex query parameters became necessary), I had view code like:
```python
deliveries_schema = DeliverySchema(many=True)

# I use .fields because ModelSchema doesn't deserialize to a dict
@use_kwargs(deliveries_schema.fields)
def index(self, **kwargs):
    deliveries = Delivery.query.filter_by(**kwargs)
    return deliveries_schema.jsonify(deliveries)
```
This worked, but it won't work any more. Not surprising, as it's a brittle design that naively assumes I want to pass all query string parameters to `filter_by` as-is. I need a way to pass the "normal" arguments to `filter_by`, and then parse the "special" arguments and send them to sqlalchemy.
I don't think a custom parser is the whole solution; it seems I also need to make my schema and view aware of these "special" parameters. Query string parameters such as `status` that were known in the `ModelSchema` are handled, but special parameters such as `status__in` are ignored. For example, `GET /deliveries/?status__in=new,in_oven` doesn't work.

The custom parser docs say "To add your own parser, extend Parser and implement the parse_* method(s) you need to override", but I don't see how that helps if `parse_querystring` ignores `status__in` completely.

Also, I don't wish to explicitly add each special parameter, as it would quickly become unmaintainable. I'm looking for magic that recognizes the `__whatever` suffixes and handles them accordingly.
I used https://github.com/marshmallow-code/marshmallow/issues/29 as inspiration, but option 1 (additional fields) isn't an option because I don't know at compile time what special query parameters I'll receive. I tried using option 2 (custom data handler), which seems to have been replaced by `post_load`.

But in the above case, `post_load` doesn't run, yet `post_dump` does. Even weirder, if I replace the above `index` function with one that uses `use_args`, such as:
```python
@use_args(DeliverySchema())
def index(self, args):
    deliveries = Delivery.query.filter_by(args)  # This won't work anyway, but worry about that later
    return deliveries_schema.jsonify(deliveries)
```

both `post_load` and `post_dump` run.
I don't understand why this happens, but in the top example, if I replace `@use_kwargs(deliveries_schema.fields)` with `@use_kwargs(DeliverySchema())`, then `post_load` does run (unfortunately followed by a crash and burn, because `ModelSchema` doesn't deserialize into a dict, so I can't pass the result to `use_kwargs`).

I guess general questions are in order at this point. What should the architecture look like to make `*__in`-style query parameters "work" and pass other parameters to sqlalchemy for filtering? Does any of the above make sense?
I'm in a bit of a rabbit hole and would appreciate any support to get out. If I've gone completely round the houses, that's fine, I probably learnt something on the way, but I'm pretty stuck. :)
Hi.

> unless you think another place is best for the discussion

No problem. Here is fine.
Unless I'm missing something, what you're seeing is normal. `post_load` is called after the Schema loads the input data in webargs, and `post_dump` after the Schema dumps the return data in the view (I suppose `jsonify` is a method you added to do dump + jsonify?).

When doing `@use_kwargs(deliveries_schema.fields)`, you don't use the Schema, only its fields, so it's no surprise `post_load` is not called.
> This worked, but it won't work any more. Not surprising, as it's a brittle design that naively assumes that I want to pass all query string parameters to filter_by as is.
Yes.
When doing REST, in the general case, I use the model Schema (`DeliverySchema`) to load POST/PUT request payloads and dump GET/POST/PUT response payloads (location = json/body). And I use a specific handmade schema (`DeliveryQueryArgsSchema`) to parse GET arguments (location = querystring).
In this handmade Schema, I manually add the fields for the parameters I want to use. It smells like duplication, as if I want to allow filtering by any field, I need to add all the fields manually, but in practice I came to realize that I never wanted that.

For instance, filtering by string fields such as name, address or so was not really realistic. IDs are meant to be used as filters. Names could be used by complex pattern-searching features for partial string matching, but as plain filters they kinda suck, as the user can't be expected to enter the real name with no typo, the proper case, accents, etc., otherwise there'd be no need for an ID (this is totally peremptory and arguable; MongoDB typically uses plain strings for IDs in their tutorials, but you get the point). Floats are a better example: there is no point filtering by float equality, since exact float equality is almost impossible technically and makes no sense in real life. You'd rather use a gt/lt interval.
In a project, we also crafted a sort field that deserializes inputs such as `name,-age` into a query sorting by name and reversed age.
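The deserialization step of such a sort field could be sketched like this (pure Python here; in practice this logic would live in a custom marshmallow field's `_deserialize`, and `parse_sort` is an invented name):

```python
# "name,-age" -> [("name", True), ("age", False)]: a list of
# (field, ascending) pairs that the view can feed to the query.

def parse_sort(value):
    criteria = []
    for part in value.split(","):
        part = part.strip()
        if part.startswith("-"):
            criteria.append((part[1:], False))  # leading "-" = descending
        else:
            criteria.append((part, True))       # default = ascending
    return criteria
```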
There could be ways to avoid doing everything manually. You could create a function that would generate those `DeliveryQueryArgsSchema` the clever way, given `DeliverySchema`. It could use some fields as filters, maybe only some types of fields, or maybe only those to which you would have passed `use_as_filter=True` in the Schema declaration. It could add `__lt` and `__gt` and such to all number and datetime fields (note there may be a lot of them, with `__lte`, `__gte`, etc.). And so on.
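As a rough illustration of how quickly such generation multiplies parameters, here is a sketch that expands a field-to-category map into the full list of query parameter names (the function, the operator lists, and the category names are all assumptions for the example):

```python
# Expand each filterable field into its plain name plus one suffixed
# parameter per operator in its category.

NUMBER_OPERATORS = ["lt", "lte", "gt", "gte"]
STRING_OPERATORS = ["contains", "startswith"]

def expand_filter_params(field_categories):
    """field_categories: {"age": "number", "surname": "string", ...}"""
    params = []
    for name, category in field_categories.items():
        params.append(name)  # plain equality filter
        ops = NUMBER_OPERATORS if category == "number" else STRING_OPERATORS
        params.extend("%s__%s" % (name, op) for op in ops)
    return params
```

A single number field already becomes five parameters, which is the bloat @frol warns about below.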
Also, note @frol's advice above:
> While suffixes are fine from URL point of view, they would look bloated in Swagger config since you will have every parameter expanded into 6 or 8 parameters (depends on the parameter type: number or string). This is why I probably won't implement querying this way.
If you need complex querying (I mean non-trivial; I totally agree that what you're talking about is not a complex API and sounds rather like everyone's needs), maybe you're better off using a String field in webargs, documenting it as "my custom query string" (assuming you're documenting your API with a tool like marshmallow's apispec), and doing the query parsing in a custom function.
Beware of the quest for the silver bullet. There may be no point in having a framework that creates an API allowing the user to do everything on the data. Sometimes there are things you don't want them to do, so you need to be able to add exceptions, like removing part of the automatically generated fields. And some other things are harmless but useless. Maybe you don't need 100% of it.
YMMV. In our case, we realized that we didn't need that many options in the query, so we removed the PQL code (see above) from our application and just added the fields manually for the filters we wanted to provide the user. Overall, there were not that many fields to add.
Hope this helps...
Thanks for that. I agree that I don't need to be able to perform every field / query permutation. If the "suffix magic" I'm looking for is non-trivial to implement, then "satisficing" is the right thing to do. If the `*Schema` and `*ArgsSchema` split you speak of is the de-facto solution for this in webargs, it's good enough for me. Key insight: `post_load` was not a useful approach because it uses (updates) the same schema.

Anyway, I added a separate `Schema` for GETs and a custom query class on the sqlalchemy side that parses the key for a `__` and, if it finds one, splits the key into the real key and the query operator. It seems to work so far for my basic `__in` use case, with a bit of hacking to build the filters for the query.
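The key-splitting step described above could look something like this (a sketch only; `split_filter_key` and the operator whitelist are my own names, and the real version would then map the operator onto a sqlalchemy filter):

```python
# Split "status__in" into ("status", "in"); keys without a recognized
# operator suffix fall back to plain equality.

KNOWN_OPERATORS = ("in", "lt", "lte", "gt", "gte", "ne")

def split_filter_key(key, known_operators=KNOWN_OPERATORS):
    if "__" in key:
        field, _, op = key.rpartition("__")
        if op in known_operators:
            return field, op
    return key, "eq"  # default: exact match
```

Note that whitelisting operators matters: a column that legitimately contains `__` in its name (e.g. `my__field`) is left untouched instead of being mis-split.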
I like it because I don't think it's overengineered but it feels like it will be extendable. So thanks a lot to all in this thread for the inspiration!
I felt the idea of the custom query field would involve extra amounts of parsing work, maybe I'm right, maybe I'm wrong, but at the moment things work so I'm happy. :)
Aside: Is either of `use_args` and `use_kwargs` generally preferred? I understand what each does, but not always in which use cases I should prefer one over the other. I can see that with `use_kwargs` I don't have to bother iterating through arguments to know what is present, and I can write things like `if kwargs["status__in"]:`. So is there ever a reason to prefer `use_args`?
Aside 2: `deliveries_schema.jsonify` uses the flask_marshmallow jsonify method.
Our typical PUT (I stripped off unrelated stuff and exception handling). The `arguments` decorator uses `use_args` internally. This allows us to separate `item_id`, injected by flask from the route, and `new_item`, injected by `use_args`.
```python
@blp.route('/<objectid:item_id>')
class BuildingsById(MethodView):
    [...]
    @blp.arguments(BuildingSchema)
    @blp.response(BuildingSchema)
    def put(self, new_item, item_id):
        """Update an existing building"""
        item = Building.get_by_id(item_id)
        item.update(new_item, BuildingSchema)
        item.save()
        return item
```
@lafrech Sure, but why `use_args` over `use_kwargs`? :)
Well, I like to have new_item in a single variable. If I used `use_kwargs`, the view func would look like this, I suppose:

```python
@blp.route('/<objectid:item_id>')
class BuildingsById(MethodView):
    [...]
    @blp.arguments(BuildingSchema)
    @blp.response(BuildingSchema)
    def put(self, item_id, **new_item):
        """Update an existing building"""
        item = Building.get_by_id(item_id)
        # Keyword unpacking must come last, so the call changes too
        item.update(BuildingSchema, **new_item)
        item.save()
        return item
```
And what if the object has an attribute named `item_id`? Alright, this shouldn't happen. But the object attributes would be mixed with the ID in the function signature, which doesn't convey the intent.

Well, that's my understanding. I have no strong opinion about it, and maybe I'm just wrong about `use_kwargs`.
I would like my API to offer basic query features: not only getting an item by ID, but also using operators on item attributes.
Proof of concept
Here is what I've done so far. I did not modify webargs code. I'm only subclassing in my own application.
I've been searching around and didn't find a universally accepted norm specifying such a query language. In my implementation, I'm using the double underscore syntax and a subset of operators from MongoEngine.
Basically, I have two families of operators: some operate on numbers, others on strings. Those lists could probably be extended. Operator meanings should be easy to grasp. Examples:

- `age__lt=69` means I expect the API to return records with attribute age lower than 69
- `surname__contains=orris` means attribute surname contains the string "orris"

To let webargs parse such parameters, I need to modify the `Schema` so that for chosen fields, the Marshmallow field is duplicated (deepcopied) into the needed variants. For instance, the field `age` is duplicated into `age__lt`, `age__gt`, ...

This is an opt-in feature. For each field I want to expose this way, I need to specify in `Meta` which operator category it should use (currently only two categories: `number` and `string`). Auto-detection is complicated, as I don't know how I would handle custom fields, and there may be other categories some day.

In the Schema, I add:
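The original snippet isn't quoted in this thread, so the following is only a guess at the shape of such a declaration; `filterable_fields` is an invented option name, not a real marshmallow or webargs Meta option:

```python
# Opt-in declaration: map each filterable field to its operator
# categories. A plain class stands in for a marshmallow Schema subclass.

class DeliverySchema:
    class Meta:
        filterable_fields = {
            "age": ["number"],
            "surname": ["string"],
        }
```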
For each field, I'm passing a list of categories, as one could imagine several categories applying to a field, but currently I have no example of a field that would use both number and string operators.
And the "magic" takes place here:
And finally, I use the Schema to parse the query arguments:
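The actual parsing goes through webargs in the original code (which isn't quoted here); as a dependency-free stand-in, the same idea on a plain dict of raw query arguments, with invented names throughout:

```python
# Keep only known parameters (plain and suffixed) and cast their raw
# string values, mimicking what the generated Schema does during load.

def load_query_args(query_args, allowed_params, casts):
    loaded = {}
    for key, raw in query_args.items():
        if key in allowed_params:
            loaded[key] = casts.get(key, str)(raw)
    return loaded
```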
Questions
This raises a few points.
`QueryArgsSchema` also has sort and pagination fields, but I didn't expose them here, as they need nothing fancy on Marshmallow's side. Maybe a real "query parameters" feature would integrate these as well.

Feedback greatly appreciated. It seems to work right now, but in the long run I might discover it was poorly designed from the start.
Thanks.