niccokunzmann / first_timer_scraper

Find pull-requests and issues of first time contributors
http://firsttimers.quelltext.eu
GNU Affero General Public License v3.0

first_timer_scraper

Visit the daily build.

This is inspired by first-timers-only issues: how can we make it possible for newcomers to contribute to a project?

This web service tries to solve this by looking at the data:

First Timer Contributions

To find first timer pull-requests, we look at organizations and their repositories.

Implementation

For each organization submitted:

  1. submit all repositories

For each submitted repository:

  1. clone
  2. check if it can be a first-timer repository
  3. extract for all commits
    • author email
    • author name
    • commit hash
    • time
  4. Get all pull-requests
    • Get the first commit
    • Get the author, email
    • Get the GitHub user issuing this pull-request
  5. find first-timer pull-requests
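The last step can be sketched as follows. This is a minimal illustration, not the scraper's actual code: the `PullRequest` record and the `find_first_timer_prs` function are hypothetical names, and the rule that each user's lowest-numbered pull-request counts as the first-timer contribution is an assumption based on the "lowest number wins" note in the data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical record holding the data collected in step 4."""
    number: int        # pull-request number in the repository
    github_user: str   # GitHub user who issued the pull-request
    author_email: str  # author email of the pull-request's first commit


def find_first_timer_prs(pull_requests):
    """Map each GitHub user to their lowest-numbered pull-request.

    Assumption: the lowest pull-request number of a user marks
    their first contribution to the repository.
    """
    first = {}
    for pr in pull_requests:
        known = first.get(pr.github_user)
        if known is None or pr.number < known.number:
            first[pr.github_user] = pr
    return first


# Made-up sample data for illustration only.
prs = [
    PullRequest(115, "contributor1", "c1@example.com"),
    PullRequest(112, "contributor1", "c1@example.com"),
    PullRequest(120, "contributor2", "c2@example.com"),
]
first_timers = find_first_timer_prs(prs)
print({user: pr.number for user, pr in first_timers.items()})
# prints {'contributor1': 112, 'contributor2': 120}
```

A real implementation would probably also compare the commit data from step 3, so that authors who committed before their first pull-request are not counted as first-timers.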

API

ENDING is either .html or .json.

Minimal definitions

When objects are defined, they contain minimal definitions. They can be used to infer the most important data and find the full data. E.g. repository["urls"]["json"] always points to the repository endpoint.

repository, user, organization and issue have this in common:

{
  "name": "<name>",
  "urls": {
    "html": "<html_url>",
    "json": "<json_url>",
    "github_html": "<github_html_url>",
    "github_api": "<github_json_url>"
  },
  "last_update": "<start_time>"
}
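A client could use a minimal definition to locate the full data as sketched here. The helper name `endpoint_url` is hypothetical and the URLs are placeholders, not real endpoints of the service.

```python
def endpoint_url(definition, ending="json"):
    """Return the URL of the full data for a minimal definition.

    `ending` is "html" or "json", matching ENDING above.
    """
    if ending not in ("html", "json"):
        raise ValueError("ending must be 'html' or 'json'")
    return definition["urls"][ending]


# Placeholder values; a real response contains actual URLs.
repository = {
    "name": "loklak_server",
    "urls": {
        "html": "https://example.com/repository.html",
        "json": "https://example.com/repository.json",
        "github_html": "https://example.com/github",
        "github_api": "https://example.com/github-api",
    },
    "last_update": "2011-01-26T19:01:12Z",
}

print(endpoint_url(repository))
# prints https://example.com/repository.json
```

Because repository, user, organization and issue share these keys, the same helper works for all four object types.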

Command Line

python3 -m first_timer_scraper <CACHE_FOLDER> <SECRETS_FOLDER> <MODEL_FOLDER>

Installation

You need to install Python 3 and pip. Under Ubuntu, you can do this:

sudo apt-get -y install python3 python3-pip

To install all required packages, execute

pip3 install --user -r requirements.txt

Windows

py -3 -m pip install --user -r requirements.txt
py -3 -m first_timer_scraper.app data secret model

Docker

This runs the docker container:

docker run --rm                                \
           -p 8080:8080                        \
           -v "secret:/app/secret"             \
           -v "model:/app/model"               \
           niccokunzmann/first_timer_scraper

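The parameters have the following meaning:

  • --rm removes the container when it exits.
  • -p 8080:8080 publishes port 8080 of the container on port 8080 of the host.
  • -v "secret:/app/secret" mounts the Docker volume secret at /app/secret, so its content persists across runs.
  • -v "model:/app/model" mounts the Docker volume model at /app/model, so its content persists across runs.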

After running the command, you can visit http://localhost:8080, submit credentials and scrape repositories.

You can build the Docker image like this:

docker build . -t niccokunzmann/first_timer_scraper

Deployment

Deployment takes up to 24 hours from a merge into master:

  1. A commit is added to master.
  2. An automated build is started.
  3. The application is deployed around 4:00 GMT.

Data Model

The data model describes what is saved when scraped.

{
  "loklak": {
    "repos": {
      "loklak_server": {
        "first_timer_prs": {
          "112": "contributor1"
        },
        "last_update_requested": "2011-01-26T19:01:12Z"
      }
    },
    "last_update_requested": "2011-01-26T19:01:12Z",
    "first_timer_prs":{}
  },
  "contributor1": {
    "first_timer_prs": {
      "loklak/loklak_server": {
        "created_at": "2011-01-26T19:01:12Z",
        "number": 112, // lowest number wins
        "last_update_requested": "2011-01-26T19:01:12Z"
      }
    },
    "last_update_requested": "2011-01-26T19:01:12Z",
    "repos": {}
  }
}
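To illustrate how this structure can be read, the following sketch walks a model shaped like the example and lists the first-timer pull-requests per repository. It assumes the model is already loaded into a Python dictionary. The function name `iter_first_timer_prs` is hypothetical and not part of the project.

```python
def iter_first_timer_prs(model):
    """Yield (organization, repository, pr_number, user) tuples.

    Entries without repositories, such as users, yield nothing.
    """
    for owner, entry in model.items():
        for repo_name, repo in entry.get("repos", {}).items():
            for number, user in repo.get("first_timer_prs", {}).items():
                yield owner, repo_name, int(number), user


# The example model from above, written as a Python dictionary.
model = {
    "loklak": {
        "repos": {
            "loklak_server": {
                "first_timer_prs": {"112": "contributor1"},
                "last_update_requested": "2011-01-26T19:01:12Z",
            }
        },
        "last_update_requested": "2011-01-26T19:01:12Z",
        "first_timer_prs": {},
    },
    "contributor1": {
        "first_timer_prs": {
            "loklak/loklak_server": {
                "created_at": "2011-01-26T19:01:12Z",
                "number": 112,
                "last_update_requested": "2011-01-26T19:01:12Z",
            }
        },
        "last_update_requested": "2011-01-26T19:01:12Z",
        "repos": {},
    },
}

print(list(iter_first_timer_prs(model)))
# prints [('loklak', 'loklak_server', 112, 'contributor1')]
```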

Further Reading