openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

Scraper gets stuck in creation: authentication error is the cause, but user has permissions to create #1122

Open henare opened 7 years ago

henare commented 7 years ago

Backtrace

line 34 of [PROJECT_ROOT]/lib/morph/github.rb: create_repository
line 14 of [PROJECT_ROOT]/app/workers/create_scraper_worker.rb: perform

View full backtrace and more info at honeybadger.io

equivalentideas commented 7 years ago

Recently there have been lots of reports of people’s scrapers getting stuck in the process of being created.

On the backend, we're seeing "Octokit::Forbidden: POST https://api.github.com/orgs/everypolitician-scrapers/repos: 403 - You need admin access to the organization before adding a repository to it. // See: https://developer.github.c..."

@henare documented this in the forum previously.

This seems to be particularly common with the everypolitician-scrapers GitHub org, but that could just be because few people create scrapers with an organization set as the owner.

The most recent batch of these errors seems to have started on the 31st of January 2017.

equivalentideas commented 7 years ago

Currently the only way we know to unstick these scrapers is to delete them. The user can then try to create them again, which won't necessarily work.

@chris48s seems to have had success by creating the repos for the scrapers on GitHub first and then adding them to morph. So, as the error suggests, the issue seems specific to the repo creation process.

tmtmtmtm commented 7 years ago

I always create my scrapers on GitHub first, including the one I tweeted about last week, so there are definitely occasional issues the other way around too. (However, that problem is exceptionally rare for me.)

chris48s commented 7 years ago

I think there are actually 2 issues which arise from this bug report:

  1. As noted, there are at least some users who aren't able to create scrapers in the namespace of a github organisation, even if we have the owner role in that org.

  2. If an error occurs while creating a scraper, the create process does not fail gracefully. This leaves the scraper in an inconsistent state where the user can't delete it, so it requires admin intervention. It would be helpful if a failure during the creation process triggered some kind of rollback/cleanup. Presumably there are other situations where GitHub's API might return an error status code, or some other failure might happen.
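A minimal Ruby sketch of the second point, assuming a scraper record with a `state` field: every failure during repo creation is rescued and the scraper is moved to a terminal `:failed` state (keeping the error for display) instead of being left in `:creating`. `Scraper`, `FakeGitHubClient`, `RepoCreationForbidden` and `create_scraper` are hypothetical stand-ins, not morph's code; the fake client only mimics the 403 from the backtrace and is not Octokit.

```ruby
# Hypothetical sketch: none of these names come from morph's codebase.

# Stand-in for Octokit::Forbidden so the sketch runs without the octokit gem.
class RepoCreationForbidden < StandardError; end

Scraper = Struct.new(:full_name, :state, :error_message)

# Hypothetical client that fails like the backtrace in this issue when
# morph lacks access to the owning organisation.
class FakeGitHubClient
  def initialize(forbidden_orgs)
    @forbidden_orgs = forbidden_orgs
  end

  def create_repository(owner, name)
    if @forbidden_orgs.include?(owner)
      raise RepoCreationForbidden,
            "403 - You need admin access to the organization"
    end
    "#{owner}/#{name}"
  end
end

# Create the GitHub repo; on any failure, record the error and move the
# scraper into a terminal :failed state rather than leaving it :creating.
def create_scraper(client, owner, name)
  scraper = Scraper.new("#{owner}/#{name}", :creating, nil)
  begin
    client.create_repository(owner, name)
    scraper.state = :created
  rescue StandardError => e
    scraper.state = :failed
    scraper.error_message = e.message
  end
  scraper
end

client = FakeGitHubClient.new(["everypolitician-scrapers"])
p create_scraper(client, "example-user", "example-scraper").state              # => :created
p create_scraper(client, "everypolitician-scrapers", "example-scraper").state  # => :failed
```

Rescuing `StandardError` rather than only the 403 reflects the point that other GitHub API errors or failures could leave a scraper stuck in the same way; a real implementation might instead delete the record or offer the user a retry.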

handelaar commented 7 years ago

Creating a new scraper on behalf of an org from within morph has left me overnight with https://morph.io/planningalerts-ie/carlow stuck, dead and undeletable. Cloning it on GitHub and creating it from the GitHub URL, on the other hand, works fine.

chris48s commented 7 years ago

Additional notes on this. If it happens, you can fix it by:

Not sure, but I think maybe this happens if you join or create an organisation after you created your morph account and it doesn't automatically give it permission or something?

handelaar commented 7 years ago

Update from Slack: https://oaf.slack.com/archives/C41SVKAQL/p1498683092992253

The problem seems to be that morph has access to any of your GitHub orgs that were created before your morph account, and to none that were created after. For that reason, morph should either:

  1. not show orgs on your morph account that are newer than your account at all, OR
  2. ask for (or perhaps drop and re-obtain) OAuth permissions on your GitHub account whenever it sees an org there that is newer than the first OAuth token it acquired.

I imagine the former is way easier to do quickly, and it would prevent the regular need for manual OAF intervention.
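The first option could look roughly like this minimal Ruby sketch. It assumes morph knows when it first obtained the user's OAuth token and knows each org's creation time (for example from GitHub's organization API); `Org`, `orgs_to_offer` and the example org names and dates are hypothetical, not morph's code.

```ruby
# Hypothetical sketch, not morph's code; field names are illustrative.
Org = Struct.new(:login, :created_at)

# Offer only orgs created no later than morph's first OAuth grant, following
# the hypothesis in this thread that orgs created afterwards are not
# accessible to morph.
def orgs_to_offer(orgs, first_token_granted_at)
  orgs.select { |org| org.created_at <= first_token_granted_at }
end

granted_at = Time.utc(2015, 6, 1) # hypothetical first OAuth grant
orgs = [
  Org.new("example-old-org", Time.utc(2010, 1, 1)),
  Org.new("example-new-org", Time.utc(2017, 3, 1))
]
p orgs_to_offer(orgs, granted_at).map(&:login) # => ["example-old-org"]
```

Hiding the newer orgs avoids the stuck state but does not grant access; the second option (re-requesting OAuth permissions) would be needed to let users actually create scrapers in those orgs from within morph.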

Having followed @chris48s's steps, I found that the permission was indeed absent, and I can now create an org-owned repo from inside morph. So the cause of this has now been identified.

(Edit: Chris and I have posted here across each other, it seems.)

(Edit: This doesn't, of course, fix any 'zombie' failed-to-create scrapers that currently exist, but it does prevent new ones from being created.)

IHIutch commented 2 years ago

Following up here: this repo looks active, but I'm still getting the same issue today. Thanks.