Unpredictable content type of response

kosarko commented 5 years ago

We define two possible content-type for results of the translation text/plain and application/json it the request comes with wildcard Accept: */* the content-type used is nondeterministic, it depends on the implicit ordering of representations dictionary (the first item is returned), which might change between python versions. note that app/main/api/restplus.py sets default_media type to none. If this was set to something else it would generate response even for unsuported types (eg. application/xml) but that'd be incorrect.

dlukes commented 4 years ago

the implicit ordering of representations dictionary (the first item is returned), which might change between python versions

AFAIK dictionary ordering is guaranteed to be insertion order for recent versions of Python 3, quoting from the docs:

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.

kosarko commented 4 years ago

I think there's 3.5.2 running now and it was something older before that. Maybe there's a way to bind the wildcard to one of the types 🤔

dlukes commented 4 years ago

I think there's 3.5.2

OK, so dict ordering can't be relied upon in the current environment :) But after having a look at the codebase, I'm not sure that dict ordering is actually the problem. My look was admittedly very cursory, so I may be wrong, but it seems to me like the culprit could be this _request_wants_json helper, specifically the final greater-than comparison:

https://github.com/ufal/lindat-translation/blob/fa0819724fa2cf0899471c53bc410e4ec886d837/app/main/views.py#L44-L49

I think it should be changed to the following (it can possibly even be inlined at this point):

def _request_wants_json():
    best = request.accept_mimetypes \
        .best_match(['application/json', 'text/html'])
    return best == 'application/json'

Here's why: when Accept is */*, request.accept_mimetypes.best_match(list) returns simply list[0], which is deterministically application/json in this case. best_match also takes into account any q-factor weighting, so that with a header like Accept: application/json;q=0.8, text/html;q=0.9, it will correctly return text/html. So that already makes the request.accept_mimetypes[best] > request.accept_mimetypes['text/html'] check on lines 48--49 unnecessary.

However, that check unfortunately also seems to be actively harmful: when Accept: */* is sent (or Accept: application/json, text/html with no q-factor weighting), both request.accept_mimetypes["application/json"] and request.accept_mimetypes["text/html"] will default to 1. Which means the greater-than check and consequently the entire _request_wants_json helper will return False, even though in these cases, the request actually does want JSON.

dlukes commented 4 years ago

Also note that text/html should possibly be text/plain (or text/*, if more leniency is desirable), or that text/plain should at least be added to the list? (EDIT: Otherwise, requesting Accept: application/json;q=0.8, text/plain;q=0.9 will yield JSON, which is not the preference the client expressed. Granted, it's a highly contrived case, such an Accept header is unlikely in practice, but still.)

What's mildly puzzling -- that bit of code seems to not have been touched in over a year, and there have only been two commits since the start of April, when I last experienced the "correct" behavior of being served JSON when requesting Accept: */*.

So I understand that the dict ordering probably comes into play somewhere further down the road in the request's lifecycle -- requests have been misidentified as not wanting JSON for a longer period of time, but this was previously being compensated by the fact that when it came to popping the mimetype of the response out of a dictionary, JSON came first.

dlukes commented 4 years ago

Any feedback on this? Should I maybe come up with a PR, or is this something you'd rather fix in-house?

kosarko commented 4 years ago

Hi @dlukes, sorry for the lack of responses.

We'll be doing some behind the scenes changes of translation, so I'll try to do the python version update as well.

As for _request_wants_json there's only one place that calls it: https://github.com/ufal/lindat-translation/blob/c0900f3553782135df629ad5d623745c3dae9ea6/app/main/views.py#L11-L16

That's the handler for / (ie. /services/translation in our deployment) which either redirects the user to /services/translation/api/v2/ (if the user wants json) or returns an html page (with the forms that the user can use to translate). If there's Accept: */* this should default to html.

Edit: ie. _request_wants_json currently plays no role in the response type of .../models or .../languages api calls

dlukes commented 4 years ago

Thanks for getting back to me and explaining where _request_wants_json fits in! OK understood, that makes sense. Sorry, I should've tried to get the app up and running and poke around in it before making conjectures, but setting it up felt like too much of a rabbit hole, since it's not really meant/documented as a project accessible to outsiders :)

kosarko commented 4 years ago

Without the models (or at least a model) there's really not much you can play with. I know docker-compose/dockerfile is not a documentation...but it should give you an idea how to start things...I'm not testing it very often but it should work most of the time

The "frontend" (the web + python part that deals with user inputs): https://github.com/ufal/lindat-translation/blob/90239b82f91c1eaab8efb3a2b0282328202c8e55/Dockerfile#L1-L9

Or to run in devel mode:

PYTHONUNBUFFERED=1 python manage.py runserver

dlukes commented 4 years ago

Thanks for the details! To be clear, my comment about the documentation or relative lack thereof wasn't meant as criticism, it was rather by way of apology for not having tried to experiment with the code :) This project is clearly meant to be consumed as a service, so it makes perfect sense that it's not worth the effort to maintain detailed and up-to-date setup instructions for outsiders.

Nevertheless, your reply did motivate me to play around with the app a bit (thanks in particular for the devel mode tip!). For anyone else who might want to play with just the frontend without worrying about models, comment out this line and hardcode the outputs to something sensible relative to the input you'll be sending. In particular, make sure to match the number of sentences, e.g. if you send Prší. Asi bude zima., set outputs = ["It's raining.", "It might be cold."].

In summary, as you mentioned, there are two ways to make the content type of the response predictable: either rely on a predictable ordering of the representations dictionary, or override the */* accept header. I've sketched some options which all make application/json the default content type (which is how the service behaved last year and up until this spring at least) -- feel free to use them as inspiration:

ordering of representations
- rely on insertion ordering of builtin dicts in CPython 3.6 and any Python 3.7+: https://github.com/dlukes/lindat-translation/commit/568955cba9efb99fd562fe19715580bbe8f86504
- rely on the same behavior of collections.OrderedDict (this has the benefit of being more explicit and of not requiring (C)Python 3.6+, but since Python 3.5 has already reached end-of-life...): https://github.com/dlukes/lindat-translation/commit/f8514cf04bde5a25c8845e2f3c9343e6a541f0ba
override the */* accept header
- either centrally in factory.py: https://github.com/dlukes/lindat-translation/commit/9edd43a4727d9f12adf69e1b28b56d1704346242
- or only in LanguageCollection.post: https://github.com/dlukes/lindat-translation/commit/45e4d41d64f6d3550011e3a95fd2b3dffcc31069

note that app/main/api/restplus.py sets default_media type to none

I did some digging and it looks like even if default_mediatype was set to application/json, the result would still depend on the ordering of the representations (in addition to the app incorrectly responding to unhandled mimetypes like application/xml). This is the code that determines the representation to return, and it doesn't make use of self.api.default_mediatype, it just looks at the representations:

https://github.com/noirbizarre/flask-restplus/blob/1fe65ddae4c04315bd88535e3c2eaff381436270/flask_restplus/resource.py#L49-L56

Note that even with request.accept_mimetypes.best_match(representations, default=self.api.default_mediatype), the resulting mediatype would still be list(representations)[0] for an */* accept header, because the default is returned only when there's no match, and */* matches anything.

There even seems to be an issue open in the flask-restplus repo stating that setting the default_mimetype doesn't work as expected.

jace commented 2 years ago

Hello! I came by here trying to find a solution. Specifically, on this:

Here's why: when Accept is */*, request.accept_mimetypes.best_match(list) returns simply list[0], which is deterministically application/json in this case. best_match also takes into account any q-factor weighting, so that with a header like Accept: application/json;q=0.8, text/html;q=0.9, it will correctly return text/html. So that already makes the request.accept_mimetypes[best] > request.accept_mimetypes['text/html'] check on lines 48--49 unnecessary.

However, that check unfortunately also seems to be actively harmful: when Accept: */* is sent (or Accept: application/json, text/html with no q-factor weighting), both request.accept_mimetypes["application/json"] and request.accept_mimetypes["text/html"] will default to 1. Which means the greater-than check and consequently the entire _request_wants_json helper will return False, even though in these cases, the request actually does want JSON.

I think I've stumbled on a fix by inserting a fake mimetype into the first position in the list:

def _request_wants_json():
    return request.accept_mimetypes.best_match(['red/herring', 'application/json']) == 'application/json'

ufal / lindat-translation

Unpredictable content type of response #17