ms609 / citation-bot

Citation bot is a tool to expand and format references at Wikipedia. It retrieves citation data from a variety of sources including CrossRef (DOI), PMID, PMC and JSTOR, and returns a formatted citation. Report bugs at https://en.wikipedia.org/wiki/User_talk:Citation_bot
https://en.wikipedia.org/wiki/User:Citation_bot/use
GNU General Public License v3.0
54 stars 23 forks source link

Add OAuth to Citation Bot web interface #1767

Closed kaldari closed 5 years ago

kaldari commented 5 years ago

Some documentation on how to do this:

GlazerMann commented 5 years ago

The code in the pull is a ripoff of that code.

kaldari commented 5 years ago

Oh nice! Glad to see this has already been started. What remains to be done on that patch? I'm a bit confused by "Needs to upgrade the exising OAuth tokens so that they can also get user tokens at https://meta.wikimedia.org/wiki/Special:OAuthConsumerRegistration logged in as Citation Bot Account." Wouldn't we just want the edits to actually be executed by the user's account, similar to how IABot works for user-activated edits: https://tools.wmflabs.org/iabot ?

GlazerMann commented 5 years ago

The bot needs a key in order to Request the user key. Can’t just randomly hand out keys

kaldari commented 5 years ago

Have you already requested the key/token for a new OAuth consumer?

kaldari commented 5 years ago

Or are you waiting for @ms609 to upgrade the existing token?

GlazerMann commented 5 years ago

The token has to be requested by the citation bot user

kaldari commented 5 years ago

Ah, sorry, now I understand!

davidbarratt commented 5 years ago

I think the callback URL is wrong: https://meta.wikimedia.org/w/index.php?title=Special:OAuthListConsumers/view/eab49cdb931e67a287f7a432fa185c96&name=&publisher=Citation+bot&stage=-1

It's a "prefix" so it should be: https://tools.wmflabs.org and check the box to allow the callback url to be customized (it is still required to have that prefix).

davidbarratt commented 5 years ago

You can set the callback url to: https://tools.wmflabs.org/citations/ but you'll have to have a different one for the -dev tool, so I would set it to https://tools.wmflabs.org

Regardless, it should be customizable as long as it has that prefix. :)

GlazerMann commented 5 years ago

Should it be https://tools.wmflabs.org/citations/authenticate.php Just wondering?

davidbarratt commented 5 years ago

Should it be https://tools.wmflabs.org/citations/authenticate.php Just wondering?

That works too, but you'll need a different one for -dev. Regardless, you'll need to allow customization for the redirect back to the query to work properly. :)

ms609 commented 5 years ago

I've added the secret for Consumer token 37f0c5730192fda6439be62852f90421 to Travis using the environment variable PHP_WP_OAUTH_SECRET.

Probably I should replace existing values - will do things properly when not in a tearing hurry! Hopefully this allows you to proceed with testing for the time being...

davidbarratt commented 5 years ago

@ms609 & @GlazerMann The checkbox for "Allow consumer to specify a callback in requests and use "callback" URL above as a required prefix." should be checked (i.e. this page should say Yes under that section). Otherwise it looks great to me. :)

ms609 commented 5 years ago

New token 6d82ca4be4e35198f416dc0435e101ea is now active:

OAuth "callback" URL: https://tools.wmflabs.org/citations/authenticate.php

Allow consumer to specify a callback in requests and use "callback" URL above as a required prefix.: ticked

Applicable grants: Edit existing pages

Allowed IP ranges: Travis IPs.

I've updated PHP_WP_OAUTH_CONSUMER and PHP_WP_OAUTH_SECRET on Travis to the new values for this key.

davidbarratt commented 5 years ago

@ms609 Could you create another one that allows all IPs and put that on production?

ms609 commented 5 years ago

I have proposed token 7daa1b8fcc04fb584da0a76baf115f5c and added this to the production site's environment variables.

davidbarratt commented 5 years ago

There's a problem with the OAuth Redirect protocol, #1794 should fix that problem.

davidbarratt commented 5 years ago

FYI: You'll need to have an "Owner-only consumer" for Travis because the CI cannot open up a web browser to follow the authentication flow. (well I guess you could but that seems like overkill).

GlazerMann commented 5 years ago

I just did it on a single Page and It worked.

GlazerMann commented 5 years ago

Do we need to come up an emergency shutdown procedure not that the bot is unblockable

GlazerMann commented 5 years ago

Should this be changed to still write “as the bot”, but with the user authenticated??

davidbarratt commented 5 years ago

Should this be changed to still write “as the bot”, but with the user authenticated??

I don't think so. Blocked users can (for the most part) still authenticate.

davidbarratt commented 5 years ago

Do we need to come up an emergency shutdown procedure not that the bot is unblockable

Well... the consumer can be revoked at any time by a steward on meta.

ms609 commented 5 years ago

FYI: You'll need to have an "Owner-only consumer" for Travis because the CI cannot open up a web browser to follow the authentication flow. (well I guess you could but that seems like overkill).

Does this mean that the original tokens will suffice for Travis testing? Is it possible to have a CI test of the authentication protocol?

ms609 commented 5 years ago

Should this be changed to still write “as the bot”, but with the user authenticated?

I'd strongly feel that it should be the bot making the edits; I can't imagine that it should be too difficult to check whether a user is blocked via the API? Otherwise a user's contributions will be flooded if they use e.g. Category.php, and edits might not be flagged as bot edits.

GlazerMann commented 5 years ago

I have rolled the code back so that the bot uses its tokens to write after it verifies the username is who they claim they are and they are not blocked/banned/etc. Did the tokens change, since I now get:

! API call failed: The authorization headers in your request are not valid: No approved grant was found for that authorization token.

ms609 commented 5 years ago

I've not changed the tokens since my last comment regarding them. Presumably you will need to edit with the bot's credentials rather than the new _WP_ ones .

GlazerMann commented 5 years ago

We probably need both!

davidbarratt commented 5 years ago

I'd strongly feel that it should be the bot making the edits; I can't imagine that it should be too difficult to check whether a user is blocked via the API? Otherwise a user's contributions will be flooded if they use e.g. Category.php, and edits might not be flagged as bot edits.

You can check to see if a user is blocked from a particular page via an API request like this: https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Oulu%20University%20Hospital&formatversion=2&intestactions=edit Note the intestactions parameter. This will return a boolean under actions and edit if the user can edit that page.

Also note: both the user editing and the bot will need to be checked (if making edits under the bot's account). (i.e. you'll need to make the request twice, once under the bot's credentials and again under the user's credentials). Also, you'll need to do this on a per page basis because of partial blocks (though, as you can see from the example, the API will accept multiple pages in the same request).

I have rolled the code back so that the bot uses its tokens to write after it verifies the username is who they claim they are and they are not blocked/banned/etc. Did the tokens change, since I now get:

! API call failed: The authorization headers in your request are not valid: No approved grant was found for that authorization token.

I think there might be some confusion on what the tokens are (and, to be far, we use that word a lot ot mean different things).

There are "access" tokens in OAuth which are added to the Authorization header of the request (the library does this for you). This tells MediaWiki to perform the task under that user's account.

Then there are "csrf" tokens (or somtimes refered to as edit tokens). These tokens provide CSRF protection when using cookie authentication with the API.

When using OAuth (or the Authorization header in general) the CSRF tokens are not required. However, MediaWiki does require them. There is a task to remove them since they do not provide any additional security bennefits and just cause more confusion: https://phabricator.wikimedia.org/T126257

Since we do need them right now, the access token and the csrf token must belong to the same user. That means if you request the token with the bot account, then that token can only be used for subsiquent requests with the bot account.

Basically, make sure the CSRF and access tokens always belong to the same account when making a request.

Does that help? I hope I've provided more clarity. Please let me know if there's anything I can do to help. :)

davidbarratt commented 5 years ago

errr.. I suppose you don't have to check the blocked status of the bot, because if the bot is blocked, the edit will fail anyways.

GlazerMann commented 5 years ago

Actions to take: Set BOTH sets of tokens as environment variables Go through code and split ‘token’ into ‘bot edit token’ and ‘verify on meta token’

ms609 commented 5 years ago

FWIW, I've not removed any tokens from the environment variables.

GlazerMann commented 5 years ago

Need to check if user is okay on each and every page 🙁 Current code just checks for full block

GlazerMann commented 5 years ago

Not sure why it’s not working. I think I got all the tokens straight

GlazerMann commented 5 years ago

@ms609 is there an old token that predates this? It looks like we need three token sets?

davidbarratt commented 5 years ago

Need to check if user is okay on each and every page 🙁 Current code just checks for full block

Yes. Unfortunately, since partial blocks (I imagine) will be deployed to English Wikipedia in the future, then it's possible for a user to be blocked from a specific page or namespace.

@ms609 is there an old token that predates this? It looks like we need three token sets?

You should only need access tokens (owner-only consumer) for Bot account. (set in the environment variables). The users will provide their own access tokens through the authentication flow. The edit (csrf) tokens should be retrieved with the API.

I hope this helps!

GlazerMann commented 5 years ago

I am confused as to why the bot does not work right now. The only changes now from before is the call to Authenticate user.

GlazerMann commented 5 years ago

I see that https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Main_Page&formatversion=2&intestactions=edit for example shows that I am banned from editing the main page. But this api Call has to be done as the user in question.

davidbarratt commented 5 years ago

I am confused as to why the bot does not work right now. The only changes now from before is the call to Authenticate user.

What OAuth Consumer are you using for the Bot? It should be an Owner-only consumer that is created with the bots' account (this type of consumer does not require approval).

To clarify, you'll have these pieces of information in env vars:

"Consumer" in this context is anoalogous to "application". Basically the application has tokens and the user performing the action has tokens. All of these are hashed together and sent accross the wire. When the user uses OAuth, they are authenticating their user account and the app they are using. This way, access can be controlled on a per-application (consumer) and per-user basis.

Alternatively, for the bot, you could use cookies (like it was before). In that scenario, the users would use OAuth (with the general token that is there) and the bot would use cookies (no OAuth). That's totally fine to do it that way.

I see that https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Main_Page&formatversion=2&intestactions=edit for example shows that I am banned from editing the main page. But this api Call has to be done as the user in question.

Yes. If you make the request with the user's OAuth credentails (the token and secret that is in their session) it will give you the result for their account.

GlazerMann commented 5 years ago

Right now there are four setenv that I know off. Obviously I need six. Which means I need the old ones

GlazerMann commented 5 years ago

I have split out the tokens into six environement variables: https://github.com/ms609/citation-bot/pull/1817 @ms609

ms609 commented 5 years ago

The original env vars remain set. Here's the complete list (pertaining to Oauth). Probably I should rename the WP ones now that I understand them better...

Owner-only (for the bot): Consumer Token Consumer Secret Access Token Access Secret PHP_OAUTH_ACCESS_SECRET
PHP_OAUTH_ACCESS_TOKEN PHP_OAUTH_CONSUMER_SECRET
PHP_OAUTH_CONSUMER_TOKEN

General (for the users): - new tokens with WP Consumer Token Consumer Secret PHP_WP_OAUTH_CONSUMER
PHP_WP_OAUTH_SECRET

GlazerMann commented 5 years ago

@ms609 I think I have them all fixed, but this code errors on wikipedia:

if (!getenv('PHP_WP_OAUTH_CONSUMER') || !getenv('PHP_WP_OAUTH_SECRET')) { echo("Citation Bot's authorization tokens not configured"); exit(1); }

davidbarratt commented 5 years ago

The only two consumers I see on production are: eab49cdb931e67a287f7a432fa185c96 and 7daa1b8fcc04fb584da0a76baf115f5c

eab49cdb931e67a287f7a432fa185c96 will not work for the bot account (it is not owner-only) and it will not work for users (it has the wrong callback url prefix).

7daa1b8fcc04fb584da0a76baf115f5c will work for the users, but not for the bot account.

It looks like you need an owner-only consumer created for the bot account (it must be created from the bot's user account). If you happen to have the old credentials, I believe those should work, but I didn't see them on prod or dev. It's fine to create a new one. :)

GlazerMann commented 5 years ago

@davidbarratt — do the getenv in the code look right now? Ignoring what the tokens are actually set to.

ms609 commented 5 years ago

When I log in as Citation_bot, I see that the eab49cdb931e67a287f7a432fa185c96 consumer listed as owner-only and as such has no callback URL. This is the key that we have been using to edit with the bot account, with no issues in the past.

GlazerMann commented 5 years ago

right now the code says: in authenticate.php: new Consumer(getenv('PHP_WP_OAUTH_CONSUMER'), getenv('PHP_WP_OAUTH_SECRET')) and in authenticate_user(): new Consumer(getenv('PHP_WP_OAUTH_CONSUMER'), getenv('PHP_WP_OAUTH_SECRET')) And for bot edit tokens: new Consumer(getenv('PHP_OAUTH_CONSUMER_TOKEN'), getenv('PHP_OAUTH_CONSUMER_SECRET')) new Token(getenv('PHP_OAUTH_ACCESS_TOKEN'), getenv('PHP_OAUTH_ACCESS_SECRET'))

davidbarratt commented 5 years ago

When I log in as Citation_bot, I see that the eab49cdb931e67a287f7a432fa185c96 consumer listed as owner-only and as such has no callback URL. This is the key that we have been using to edit with the bot account, with no issues in the past.

OOOO! ok, I see it in the log at the bottom of the page: https://meta.wikimedia.org/wiki/Special:OAuthListConsumers/view/eab49cdb931e67a287f7a432fa185c96

davidbarratt commented 5 years ago

I updated the env vars on production and it mostly works: https://en.wikipedia.org/w/index.php?title=Orlando,_Florida&diff=prev&oldid=903262163

I don't see anything that is checking the user's block (on a per-page basis). There is a risk with the way this is that someone can invoke the bot on a page they don't have permission to edit. If that's ok, then great! if not, then it should be updated to prevent that. :)

kaldari commented 5 years ago

Looks like the signature in the edit summary is broken though.

At some point the bot should be updated to check for blocks, but this is at least a big improvement on the status quo.