opensciences / opensciences.github.io

Website for OpenScience -
http://openscience.us
MIT License

An Exploratory Study of the Pull-Based Software Development Model #137

Closed: rahlk closed this issue 9 years ago

rahlk commented 9 years ago

Link to paper: http://dl.acm.org/ft_gateway.cfm?id=2568260&ftid=1467975&dwn=1&CFID=609849406&CFTOKEN=18730546

Link to data (a paper that describes the data): G. Gousios. The GHTorrent dataset and tool suite. In Proceedings of MSR '13, May 2013.

BenProvince commented 9 years ago

Context Notes

Authors

Georgios Gousios, G.Gousios@tudelft.nl
Martin Pinzger, martin.pinzger@aau.at
Arie van Deursen, Arie.vandeursen@tudelft.nl

Data

The data is a collection of GitHub database dumps taken on various dates. Several download options are available.

The following tables are available: commit comments, commits, events, followers, forks, issue comments, issue events, issues, org members, pull request comments, pull requests, repo collaborators, repo labels, repos, users, watchers

Option 1: Reduced MSR Challenge Data (~100MB)

This is a reduced version of the original dataset that includes only the top-10 starred GitHub projects. This MSR'14 challenge data is available as a MongoDB or MySQL database. The data, description, and instructions are available at: http://www.ghtorrent.org/msr14.html

Option 2: Full Database Dumps (~16GB)

Full dumps, organized by date, are available at: http://www.ghtorrent.org/downloads.html

Option 3: Web Query (~0MB)

Query the most recent database live, without downloading anything, at: http://www.ghtorrent.org/dblite/

Option 4: Individual Table Torrents (1MB~10GB)

Get a torrent of just the table you want for the dump-date you want at: http://www.ghtorrent.org/downloads.html
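
Once a dump is loaded into a SQL database, the tables listed above can be joined to explore pull-request activity. The following is only a minimal sketch on a toy in-memory SQLite database: the table names `repos` and `pull_requests` come from the list above, but the columns (`id`, `name`, `repo_id`, `merged`) and the helper names are hypothetical and are not the actual GHTorrent schema, which should be checked against the dump's own documentation.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a made-up, simplified schema.

    The column names here are assumptions for illustration only; they are
    not taken from the real GHTorrent dumps.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE repos (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pull_requests (
            id INTEGER PRIMARY KEY,
            repo_id INTEGER REFERENCES repos(id),
            merged INTEGER  -- 1 if merged, 0 otherwise (hypothetical flag)
        );
        INSERT INTO repos VALUES (1, 'alpha'), (2, 'beta');
        INSERT INTO pull_requests VALUES
            (1, 1, 1), (2, 1, 0), (3, 1, 1), (4, 1, 1),
            (5, 2, 0), (6, 2, 0);
        """
    )
    return conn


def merged_ratio_by_repo(conn):
    """Return {repo name: fraction of its pull requests that were merged}."""
    rows = conn.execute(
        """
        SELECT r.name, AVG(pr.merged)
        FROM pull_requests AS pr
        JOIN repos AS r ON r.id = pr.repo_id
        GROUP BY r.name
        ORDER BY r.name
        """
    ).fetchall()
    return {name: ratio for name, ratio in rows}


if __name__ == "__main__":
    conn = build_toy_db()
    for name, ratio in merged_ratio_by_repo(conn).items():
        print(f"{name}: {ratio:.2f} of pull requests merged")
```

The same query shape should carry over to a MySQL copy of the data, provided the table and column names are adjusted to whatever the chosen dump actually contains.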