hypothesis / h

Annotate with anyone, anywhere.
https://hypothes.is/
BSD 2-Clause "Simplified" License
2.95k stars 426 forks source link

We need to fix `request.authority` in `h` #8603

Open lyzadanger opened 6 years ago

lyzadanger commented 6 years ago

Pop quiz: What is the meaning of request.authority within the h app?

If you are like me, you'd intuit that it means "the authority for the current request" because it's attached to request in such a fashion.

Nope! Not at all! All it is is a static setting (or it falls back to the current domain if a setting isn't present for authority). So in h's production case, it is always hypothes.is (note: if you're running h locally, request.authority is localhost unless you override it).

There are two problems here:

  1. This value doesn't do what it seems like it would, leading to me creating at least one bug in the past;
  2. We do have a need to know what the authority for the current request is (at least in the API), which is currently unmet

request.authority is used in 21 modules within h. Excising it will take some care.

One possible plan could be:

1) Rename current request.authority to request.default_authority or some other naming that more correctly implies what it is and that it is static. This could be a find-and-replace task for the most part 2) Re-implement request.authority such that it reflects the request's authority, that is, the default authority or an authority applied to, say, an auth_client

robertknight commented 6 years ago

If I understand the proposal correctly, I'm in favor of making request.authority reflect the logged-in user or client ID associated with the request.

One caveat to be aware of when fixing up any existing code is to remember that in the context of a Celery worker task, there is never a "logged in user" or client ID so if any of that code cares about the authority, it would have to determine it via some other means.

lyzadanger commented 6 years ago

One caveat to be aware of when fixing up any existing code is to remember that in the context of a Celery worker task, there is never a "logged in user" or client ID so if any of that code cares about the authority, it would have to determine it via some other means.

I'm not getting any (grep) hits for authority in the h.tasks directory—is there anywhere else I should be looking for this?

robertknight commented 6 years ago

I'm not getting any (grep) hits for authority in the h.tasks directory—is there anywhere else I should be looking for this?

It could be an issue for any code called by h.tasks code. Mind you, this issue would affect eLife and anyone else already using third party accounts so I hope that should have flushed out any existing problems.

lyzadanger commented 6 years ago

It could be an issue for any code called by h.tasks code.

Ah, yes, I see how this could be so. I'll keep an eye out, but yeah, anything that is celery-ized should not be dependent on any part of the request, correct?

robertknight commented 6 years ago

Indeed - well except for things like the DB or Elasticsearch connection which don't vary depending on the user.

seanh commented 6 years ago

Totally agree that request.authority sounds like the authority of the current request and is a super misleading name.

I was going to suggest getting rid of request.authority altogether and always accessing it as a deployment setting (request.registry.settings["h.authority"]) as that makes it clear that it's a deployment setting rather than "the authority of the current request" and there wouldn't be a problem.

But I don't think the default value (request.domain) is available at the time when deployment setting default values are defined, there's no request in config.py, I believe it runs at application startup time not at request processing time (which makes sense as they're deployment settings) :shrug:

Regarding the name default_authority: I think you've got inconsistent naming: the deployment setting is called "h.authority", the environment variable AUTHORITY, but what it does is control a request property named default authority. It's also weird that if you override the default authority (request.domain) with a custom AUTHORITY envvar, so now you're not using the default value of the "h.authority" / AUTHORITY setting, then it still sets a request property named default authority. In other words you've got two different "defaults" floating around here. So maybe there's a better name like site_authority or something?

For the new request.authority (the authority of the authenticated user or authclient) I wonder whether request.user.authority could be used instead? request.user is the models.User for the authenticated user. Or request.auth_client.authority in the case of an authenticated auth client. Then you might still want a request.authority so that code can access it in a consistent way without caring whether it's a user or an authclient, but maybe that request.authority property could be implemented by first looking at request.user and then looking at request.auth_client? Such that it's never possible for request.authority and request.user.authority to contain different values. This is probably a later refactoring. We can definitely ignore this paragraph for now.

robertknight commented 6 years ago

I was going to suggest getting rid of request.authority altogether and always accessing it as a deployment setting (request.registry.settings["h.authority"])

Looking at how often this property is accessed in app code and tests, I think this would be a bit verbose. I suspect that's why a shortcut was introduced originally.

So maybe there's a better name like site_authority or something?

I get your point. However it doesn't seem to me that there is that much risk of confusing the concepts of default value for the "authority" deployment setting and the default authority for new users and groups created in the system.

What definitely does cause confusion is the status quo of request.authority where the request does logically have some authority associated with it via the authenticated user or auth client, but that's not what request.authority returns.

seanh commented 6 years ago

I was going to suggest getting rid of request.authority altogether and always accessing it as a deployment setting (request.registry.settings["h.authority"])

Looking at how often this property is accessed in app code and tests, I think this would be a bit verbose. I suspect that's why a shortcut was introduced originally.

I wouldn't be in favour of such a shortcut. request.registry.settings["name"] is how you access deployment settings in Pyramid, whether we think it's verbose or not. The way to grep the code for all uses of a given deployment setting would be to grep for settings["name"]. If we have one deployment setting that, unlike every other setting, can sometimes be accessed as settings["name"] and sometimes as request.name, that's going to make the code difficult to work with and lead to bugs.

It's not a shortcut though - request.authority was introduced as a request property because its default value (request.domain) isn't available at the time when deployment setting default values are defined, so request.authority can't be a config setting. So to be clear there are two separate things here:

  1. request.registry.settings["h.authority"], an optional config setting that doesn't have a default value.
  2. request.authority (now being renamed to request.default_authority), a request property that's either the value of request.registry.settings["h.authority"] or, if "h.authority" isn't set, falls back to request.domain (a request property that's only available during request processing time, not during application config time). When it falls back to this value then request.authority will be a different value from "h.authority".

request.authority isn't a shortcut for request.registry.settings["h.authority"]), the two don't always have the same value.

Now that I type that out I think the right thing to do here for code simplicity and non-surprisingness, to make authority a simple config setting like any other rather than a config setting plus a request property that's partially based on that config setting, would have been to:

  1. Make request.registry.settings["h.authority"] a required config setting, still with no default value
  2. Get rid of request.authority, just always access it as request.registry.settings["h.authority"], since "h.authority" is now a required setting the fallback / default value of request.domain is no longer needed.

request.registry.settings["h.authority"] is verbose but that's how you read config settings in Pyramid, and it's dirt simple. I think the extreme amount of confusion introduced by request.authority, and the still-confusing and unique relationship between request.default_authority, request.domain and settings["h.authority"], speaks for itself.