Closed kaldari closed 5 years ago
The code in the pull is a ripoff of that code.
Oh nice! Glad to see this has already been started. What remains to be done on that patch? I'm a bit confused by "Needs to upgrade the exising OAuth tokens so that they can also get user tokens at https://meta.wikimedia.org/wiki/Special:OAuthConsumerRegistration logged in as Citation Bot Account." Wouldn't we just want the edits to actually be executed by the user's account, similar to how IABot works for user-activated edits: https://tools.wmflabs.org/iabot ?
The bot needs a key in order to Request the user key. Can’t just randomly hand out keys
Have you already requested the key/token for a new OAuth consumer?
Or are you waiting for @ms609 to upgrade the existing token?
The token has to be requested by the citation bot user
Ah, sorry, now I understand!
I think the callback URL is wrong: https://meta.wikimedia.org/w/index.php?title=Special:OAuthListConsumers/view/eab49cdb931e67a287f7a432fa185c96&name=&publisher=Citation+bot&stage=-1
It's a "prefix" so it should be: https://tools.wmflabs.org and check the box to allow the callback url to be customized (it is still required to have that prefix).
You can set the callback url to: https://tools.wmflabs.org/citations/ but you'll have to have a different one for the -dev
tool, so I would set it to https://tools.wmflabs.org
Regardless, it should be customizable as long as it has that prefix. :)
Should it be https://tools.wmflabs.org/citations/authenticate.php Just wondering?
Should it be https://tools.wmflabs.org/citations/authenticate.php Just wondering?
That works too, but you'll need a different one for -dev
. Regardless, you'll need to allow customization for the redirect back to the query to work properly. :)
I've added the secret for Consumer token 37f0c5730192fda6439be62852f90421 to Travis using the environment variable PHP_WP_OAUTH_SECRET.
Probably I should replace existing values - will do things properly when not in a tearing hurry! Hopefully this allows you to proceed with testing for the time being...
@ms609 & @GlazerMann The checkbox for "Allow consumer to specify a callback in requests and use "callback" URL above as a required prefix." should be checked (i.e. this page should say Yes under that section). Otherwise it looks great to me. :)
New token 6d82ca4be4e35198f416dc0435e101ea is now active:
OAuth "callback" URL: https://tools.wmflabs.org/citations/authenticate.php
Allow consumer to specify a callback in requests and use "callback" URL above as a required prefix.: ticked
Applicable grants: Edit existing pages
Allowed IP ranges: Travis IPs.
I've updated PHP_WP_OAUTH_CONSUMER
and PHP_WP_OAUTH_SECRET
on Travis to the new values for this key.
@ms609 Could you create another one that allows all IPs and put that on production?
I have proposed token 7daa1b8fcc04fb584da0a76baf115f5c and added this to the production site's environment variables.
There's a problem with the OAuth Redirect protocol, #1794 should fix that problem.
FYI: You'll need to have an "Owner-only consumer" for Travis because the CI cannot open up a web browser to follow the authentication flow. (well I guess you could but that seems like overkill).
I just did it on a single Page and It worked.
Do we need to come up an emergency shutdown procedure not that the bot is unblockable
Should this be changed to still write “as the bot”, but with the user authenticated??
Should this be changed to still write “as the bot”, but with the user authenticated??
I don't think so. Blocked users can (for the most part) still authenticate.
Do we need to come up an emergency shutdown procedure not that the bot is unblockable
Well... the consumer can be revoked at any time by a steward on meta.
FYI: You'll need to have an "Owner-only consumer" for Travis because the CI cannot open up a web browser to follow the authentication flow. (well I guess you could but that seems like overkill).
Does this mean that the original tokens will suffice for Travis testing? Is it possible to have a CI test of the authentication protocol?
Should this be changed to still write “as the bot”, but with the user authenticated?
I'd strongly feel that it should be the bot making the edits; I can't imagine that it should be too difficult to check whether a user is blocked via the API? Otherwise a user's contributions will be flooded if they use e.g. Category.php, and edits might not be flagged as bot edits.
I have rolled the code back so that the bot uses its tokens to write after it verifies the username is who they claim they are and they are not blocked/banned/etc. Did the tokens change, since I now get:
! API call failed: The authorization headers in your request are not valid: No approved grant was found for that authorization token.
I've not changed the tokens since my last comment regarding them. Presumably you will need to edit with the bot's credentials rather than the new _WP_
ones .
We probably need both!
I'd strongly feel that it should be the bot making the edits; I can't imagine that it should be too difficult to check whether a user is blocked via the API? Otherwise a user's contributions will be flooded if they use e.g. Category.php, and edits might not be flagged as bot edits.
You can check to see if a user is blocked from a particular page via an API request like this:
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Oulu%20University%20Hospital&formatversion=2&intestactions=edit
Note the intestactions
parameter.
This will return a boolean under actions
and edit
if the user can edit that page.
Also note: both the user editing and the bot will need to be checked (if making edits under the bot's account). (i.e. you'll need to make the request twice, once under the bot's credentials and again under the user's credentials). Also, you'll need to do this on a per page basis because of partial blocks (though, as you can see from the example, the API will accept multiple pages in the same request).
I have rolled the code back so that the bot uses its tokens to write after it verifies the username is who they claim they are and they are not blocked/banned/etc. Did the tokens change, since I now get:
! API call failed: The authorization headers in your request are not valid: No approved grant was found for that authorization token.
I think there might be some confusion on what the tokens are (and, to be far, we use that word a lot ot mean different things).
There are "access" tokens in OAuth which are added to the Authorization
header of the request (the library does this for you). This tells MediaWiki to perform the task under that user's account.
Then there are "csrf" tokens (or somtimes refered to as edit
tokens). These tokens provide CSRF protection when using cookie authentication with the API.
When using OAuth (or the Authorization
header in general) the CSRF
tokens are not required. However, MediaWiki does require them. There is a task to remove them since they do not provide any additional security bennefits and just cause more confusion: https://phabricator.wikimedia.org/T126257
Since we do need them right now, the access token and the csrf token must belong to the same user. That means if you request the token with the bot account, then that token can only be used for subsiquent requests with the bot account.
Basically, make sure the CSRF and access tokens always belong to the same account when making a request.
Does that help? I hope I've provided more clarity. Please let me know if there's anything I can do to help. :)
errr.. I suppose you don't have to check the blocked status of the bot, because if the bot is blocked, the edit will fail anyways.
Actions to take: Set BOTH sets of tokens as environment variables Go through code and split ‘token’ into ‘bot edit token’ and ‘verify on meta token’
FWIW, I've not removed any tokens from the environment variables.
Need to check if user is okay on each and every page 🙁 Current code just checks for full block
Not sure why it’s not working. I think I got all the tokens straight
@ms609 is there an old token that predates this? It looks like we need three token sets?
Need to check if user is okay on each and every page 🙁 Current code just checks for full block
Yes. Unfortunately, since partial blocks (I imagine) will be deployed to English Wikipedia in the future, then it's possible for a user to be blocked from a specific page or namespace.
@ms609 is there an old token that predates this? It looks like we need three token sets?
You should only need access tokens (owner-only consumer) for Bot account. (set in the environment variables). The users will provide their own access tokens through the authentication flow. The edit (csrf) tokens should be retrieved with the API.
I hope this helps!
I am confused as to why the bot does not work right now. The only changes now from before is the call to Authenticate user.
I see that https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Main_Page&formatversion=2&intestactions=edit for example shows that I am banned from editing the main page. But this api Call has to be done as the user in question.
I am confused as to why the bot does not work right now. The only changes now from before is the call to Authenticate user.
What OAuth Consumer are you using for the Bot? It should be an Owner-only consumer that is created with the bots' account (this type of consumer does not require approval).
To clarify, you'll have these pieces of information in env vars:
"Consumer" in this context is anoalogous to "application". Basically the application has tokens and the user performing the action has tokens. All of these are hashed together and sent accross the wire. When the user uses OAuth, they are authenticating their user account and the app they are using. This way, access can be controlled on a per-application (consumer) and per-user basis.
Alternatively, for the bot, you could use cookies (like it was before). In that scenario, the users would use OAuth (with the general token that is there) and the bot would use cookies (no OAuth). That's totally fine to do it that way.
I see that https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info&titles=Main_Page&formatversion=2&intestactions=edit for example shows that I am banned from editing the main page. But this api Call has to be done as the user in question.
Yes. If you make the request with the user's OAuth credentails (the token and secret that is in their session) it will give you the result for their account.
Right now there are four setenv that I know off. Obviously I need six. Which means I need the old ones
I have split out the tokens into six environement variables: https://github.com/ms609/citation-bot/pull/1817 @ms609
The original env vars remain set. Here's the complete list (pertaining to Oauth). Probably I should rename the WP ones now that I understand them better...
Owner-only (for the bot):
Consumer Token
Consumer Secret
Access Token
Access Secret
PHP_OAUTH_ACCESS_SECRET
PHP_OAUTH_ACCESS_TOKEN
PHP_OAUTH_CONSUMER_SECRET
PHP_OAUTH_CONSUMER_TOKEN
General (for the users): - new tokens with WP
Consumer Token
Consumer Secret
PHP_WP_OAUTH_CONSUMER
PHP_WP_OAUTH_SECRET
@ms609 I think I have them all fixed, but this code errors on wikipedia:
if (!getenv('PHP_WP_OAUTH_CONSUMER') || !getenv('PHP_WP_OAUTH_SECRET')) { echo("Citation Bot's authorization tokens not configured"); exit(1); }
The only two consumers I see on production are: eab49cdb931e67a287f7a432fa185c96 and 7daa1b8fcc04fb584da0a76baf115f5c
eab49cdb931e67a287f7a432fa185c96 will not work for the bot account (it is not owner-only) and it will not work for users (it has the wrong callback url prefix).
7daa1b8fcc04fb584da0a76baf115f5c will work for the users, but not for the bot account.
It looks like you need an owner-only consumer created for the bot account (it must be created from the bot's user account). If you happen to have the old credentials, I believe those should work, but I didn't see them on prod or dev. It's fine to create a new one. :)
@davidbarratt — do the getenv in the code look right now? Ignoring what the tokens are actually set to.
When I log in as Citation_bot, I see that the eab49cdb931e67a287f7a432fa185c96
consumer listed as owner-only and as such has no callback URL. This is the key that we have been using to edit with the bot account, with no issues in the past.
right now the code says: in authenticate.php: new Consumer(getenv('PHP_WP_OAUTH_CONSUMER'), getenv('PHP_WP_OAUTH_SECRET')) and in authenticate_user(): new Consumer(getenv('PHP_WP_OAUTH_CONSUMER'), getenv('PHP_WP_OAUTH_SECRET')) And for bot edit tokens: new Consumer(getenv('PHP_OAUTH_CONSUMER_TOKEN'), getenv('PHP_OAUTH_CONSUMER_SECRET')) new Token(getenv('PHP_OAUTH_ACCESS_TOKEN'), getenv('PHP_OAUTH_ACCESS_SECRET'))
When I log in as Citation_bot, I see that the
eab49cdb931e67a287f7a432fa185c96
consumer listed as owner-only and as such has no callback URL. This is the key that we have been using to edit with the bot account, with no issues in the past.
OOOO! ok, I see it in the log at the bottom of the page: https://meta.wikimedia.org/wiki/Special:OAuthListConsumers/view/eab49cdb931e67a287f7a432fa185c96
I updated the env vars on production and it mostly works: https://en.wikipedia.org/w/index.php?title=Orlando,_Florida&diff=prev&oldid=903262163
I don't see anything that is checking the user's block (on a per-page basis). There is a risk with the way this is that someone can invoke the bot on a page they don't have permission to edit. If that's ok, then great! if not, then it should be updated to prevent that. :)
Looks like the signature in the edit summary is broken though.
At some point the bot should be updated to check for blocks, but this is at least a big improvement on the status quo.
Some documentation on how to do this: