Open kosarko opened 5 years ago
the implicit ordering of representations dictionary (the first item is returned), which might change between python versions
AFAIK dictionary ordering is guaranteed to be insertion order for recent versions of Python 3, quoting from the docs:
Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.
I think there's 3.5.2 running now and it was something older before that. Maybe there's a way to bind the wildcard to one of the types 🤔
I think there's 3.5.2
OK, so dict ordering can't be relied upon in the current environment :) But after having a look at the codebase, I'm not sure that dict ordering is actually the problem. My look was admittedly very cursory, so I may be wrong, but it seems to me like the culprit could be this _request_wants_json
helper, specifically the final greater-than comparison:
I think it should be changed to the following (it can possibly even be inlined at this point):
def _request_wants_json():
best = request.accept_mimetypes \
.best_match(['application/json', 'text/html'])
return best == 'application/json'
Here's why: when Accept
is */*
, request.accept_mimetypes.best_match(list)
returns simply list[0]
, which is deterministically application/json
in this case. best_match
also takes into account any q-factor weighting, so that with a header like Accept: application/json;q=0.8, text/html;q=0.9
, it will correctly return text/html
. So that already makes the request.accept_mimetypes[best] > request.accept_mimetypes['text/html']
check on lines 48--49 unnecessary.
However, that check unfortunately also seems to be actively harmful: when Accept: */*
is sent (or Accept: application/json, text/html
with no q-factor weighting), both request.accept_mimetypes["application/json"]
and request.accept_mimetypes["text/html"]
will default to 1. Which means the greater-than check and consequently the entire _request_wants_json
helper will return False
, even though in these cases, the request actually does want JSON.
Also note that text/html
should possibly be text/plain
(or text/*
, if more leniency is desirable), or that text/plain
should at least be added to the list? (EDIT: Otherwise, requesting Accept: application/json;q=0.8, text/plain;q=0.9
will yield JSON, which is not the preference the client expressed. Granted, it's a highly contrived case, such an Accept
header is unlikely in practice, but still.)
What's mildly puzzling -- that bit of code seems to not have been touched in over a year, and there have only been two commits since the start of April, when I last experienced the "correct" behavior of being served JSON when requesting Accept: */*
.
So I understand that the dict ordering probably comes into play somewhere further down the road in the request's lifecycle -- requests have been misidentified as not wanting JSON for a longer period of time, but this was previously being compensated by the fact that when it came to popping the mimetype of the response out of a dictionary, JSON came first.
Any feedback on this? Should I maybe come up with a PR, or is this something you'd rather fix in-house?
Hi @dlukes, sorry for the lack of responses.
We'll be doing some behind the scenes changes of translation, so I'll try to do the python version update as well.
As for _request_wants_json
there's only one place that calls it:
https://github.com/ufal/lindat-translation/blob/c0900f3553782135df629ad5d623745c3dae9ea6/app/main/views.py#L11-L16
That's the handler for /
(ie. /services/translation in our deployment) which either redirects the user to /services/translation/api/v2/ (if the user wants json) or returns an html page (with the forms that the user can use to translate). If there's Accept: */*
this should default to html.
Edit:
ie. _request_wants_json
currently plays no role in the response type of .../models or .../languages api calls
Thanks for getting back to me and explaining where _request_wants_json
fits in! OK understood, that makes sense. Sorry, I should've tried to get the app up and running and poke around in it before making conjectures, but setting it up felt like too much of a rabbit hole, since it's not really meant/documented as a project accessible to outsiders :)
Without the models (or at least a model) there's really not much you can play with. I know docker-compose/dockerfile is not a documentation...but it should give you an idea how to start things...I'm not testing it very often but it should work most of the time
The "frontend" (the web + python part that deals with user inputs): https://github.com/ufal/lindat-translation/blob/90239b82f91c1eaab8efb3a2b0282328202c8e55/Dockerfile#L1-L9
Or to run in devel mode:
PYTHONUNBUFFERED=1 python manage.py runserver
Thanks for the details! To be clear, my comment about the documentation or relative lack thereof wasn't meant as criticism, it was rather by way of apology for not having tried to experiment with the code :) This project is clearly meant to be consumed as a service, so it makes perfect sense that it's not worth the effort to maintain detailed and up-to-date setup instructions for outsiders.
Nevertheless, your reply did motivate me to play around with the app a bit (thanks in particular for the devel mode tip!). For anyone else who might want to play with just the frontend without worrying about models, comment out this line and hardcode the outputs to something sensible relative to the input you'll be sending. In particular, make sure to match the number of sentences, e.g. if you send PrÅ¡Ã. Asi bude zima.
, set outputs = ["It's raining.", "It might be cold."]
.
In summary, as you mentioned, there are two ways to make the content type of the response predictable: either rely on a predictable ordering of the representations
dictionary, or override the */*
accept header. I've sketched some options which all make application/json
the default content type (which is how the service behaved last year and up until this spring at least) -- feel free to use them as inspiration:
representations
dict
s in CPython 3.6 and any Python 3.7+: https://github.com/dlukes/lindat-translation/commit/568955cba9efb99fd562fe19715580bbe8f86504collections.OrderedDict
(this has the benefit of being more explicit and of not requiring (C)Python 3.6+, but since Python 3.5 has already reached end-of-life...): https://github.com/dlukes/lindat-translation/commit/f8514cf04bde5a25c8845e2f3c9343e6a541f0ba*/*
accept header
factory.py
: https://github.com/dlukes/lindat-translation/commit/9edd43a4727d9f12adf69e1b28b56d1704346242LanguageCollection.post
: https://github.com/dlukes/lindat-translation/commit/45e4d41d64f6d3550011e3a95fd2b3dffcc31069note that app/main/api/restplus.py sets default_media type to none
I did some digging and it looks like even if default_mediatype
was set to application/json
, the result would still depend on the ordering of the representations
(in addition to the app incorrectly responding to unhandled mimetypes like application/xml
). This is the code that determines the representation to return, and it doesn't make use of self.api.default_mediatype
, it just looks at the representations
:
Note that even with request.accept_mimetypes.best_match(representations, default=self.api.default_mediatype)
, the resulting mediatype
would still be list(representations)[0]
for an */*
accept header, because the default is returned only when there's no match, and */*
matches anything.
There even seems to be an issue open in the flask-restplus
repo stating that setting the default_mimetype
doesn't work as expected.
Hello! I came by here trying to find a solution. Specifically, on this:
Here's why: when
Accept
is*/*
,request.accept_mimetypes.best_match(list)
returns simplylist[0]
, which is deterministicallyapplication/json
in this case.best_match
also takes into account any q-factor weighting, so that with a header likeAccept: application/json;q=0.8, text/html;q=0.9
, it will correctly returntext/html
. So that already makes therequest.accept_mimetypes[best] > request.accept_mimetypes['text/html']
check on lines 48--49 unnecessary.However, that check unfortunately also seems to be actively harmful: when
Accept: */*
is sent (orAccept: application/json, text/html
with no q-factor weighting), bothrequest.accept_mimetypes["application/json"]
andrequest.accept_mimetypes["text/html"]
will default to 1. Which means the greater-than check and consequently the entire_request_wants_json
helper will returnFalse
, even though in these cases, the request actually does want JSON.
I think I've stumbled on a fix by inserting a fake mimetype into the first position in the list:
def _request_wants_json():
return request.accept_mimetypes.best_match(['red/herring', 'application/json']) == 'application/json'
We define two possible content-type for results of the translation
text/plain
andapplication/json
it the request comes with wildcardAccept: */*
the content-type used is nondeterministic, it depends on the implicit ordering ofrepresentations
dictionary (the first item is returned), which might change between python versions. note thatapp/main/api/restplus.py
sets default_media type to none. If this was set to something else it would generate response even for unsuported types (eg. application/xml) but that'd be incorrect.