ohbm / hackathon2019

Website and projects for the OHBM Hackathon in Rome 2019
https://ohbm.github.io/hackathon2019

survey of github repos in pubmed #86

Open riddlet opened 5 years ago

riddlet commented 5 years ago

Survey of github repos in pubmed

Project Description

GitHub is increasingly a tool of choice for computationally oriented research. There have been plenty of efforts to get scientists to use it, but to my knowledge there hasn't been any attempt to evaluate how scientists actually use it.

I pulled full texts from PubMed and searched them for the string 'github', and found a bit over 20k papers that contain it. Using these texts plus the GitHub API, I think we could provide some insight into the following:
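As a rough illustration of the first step, here is a minimal sketch of pulling repository references out of a paper's full text. The function name `extract_repos` and the regular expression are assumptions made for this example, not code from the gitpubs repo, and the character classes only approximate GitHub's naming rules.

```python
import re

# Approximate pattern for github.com/<owner>/<repo>.
# The allowed characters are a simplification of GitHub's real rules.
REPO_PATTERN = re.compile(
    r"github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9_.-]+)", re.IGNORECASE
)


def extract_repos(text):
    """Return a sorted list of unique (owner, repo) pairs found in text."""
    repos = set()
    for owner, repo in REPO_PATTERN.findall(text):
        repo = repo.rstrip(".")  # drop a sentence-ending period
        if repo.endswith(".git"):
            repo = repo[:-4]  # normalize clone URLs
        if repo:
            # GitHub names are case-insensitive, so lowercase for deduplication
            repos.add((owner.lower(), repo.lower()))
    return sorted(repos)


sample = (
    "Code is at https://github.com/riddlet/gitpubs. "
    "See also github.com/srcole/o-factor and "
    "https://GitHub.com/riddlet/gitpubs.git"
)
print(extract_repos(sample))
```

A pattern like this will also pick up non-repository paths under github.com (for example site pages), so the results would still need to be checked against the GitHub API before counting them as repos.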

1) How many scientific repos contain a README?
2) How often do repos contain files that are likely to be data (e.g. csv, json)?
3) What are the most popular types of analytic scripts (e.g. .py, .R, .ipynb, etc.)?
4) How do the above vary by research area?
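Questions 1 to 3 could be answered per repo from nothing more than a list of file paths, for instance the `path` entries returned by the GitHub API's recursive git tree endpoint. The sketch below assumes that list is already in hand. The function name `summarize_repo` and the two extension sets are illustrative choices for this example, limited to the file types named above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Illustrative extension sets, taken from the examples in the questions above.
DATA_EXTENSIONS = {".csv", ".json"}
SCRIPT_EXTENSIONS = {".py", ".r", ".ipynb"}


def summarize_repo(paths):
    """Summarize a repo from its file paths (POSIX-style, relative to the root)."""
    has_readme = False
    data_files = 0
    script_counts = Counter()
    for path in paths:
        p = PurePosixPath(path)
        suffix = p.suffix.lower()
        # Only count a README that sits at the top level of the repo
        if len(p.parts) == 1 and p.stem.lower() == "readme":
            has_readme = True
        if suffix in DATA_EXTENSIONS:
            data_files += 1
        elif suffix in SCRIPT_EXTENSIONS:
            script_counts[suffix] += 1
    return {
        "has_readme": has_readme,
        "data_files": data_files,
        "script_counts": dict(script_counts),
    }


example_paths = [
    "README.md",
    "data/raw.csv",
    "analysis/model.R",
    "analysis/fit.py",
    "notebooks/explore.ipynb",
    "results.json",
]
print(summarize_repo(example_paths))
```

Classifying by extension alone is crude: a .json file may be configuration rather than data, so counts like these are better read as upper bounds. Question 4 would then amount to grouping these per-repo summaries by whatever research-area label is attached to each paper.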

Skills required to participate

Ideally, experience with Python or R and with web-based APIs. Text analysis experience would be helpful.

Integration

TBA

Preparation material

BioC API

github API

Link to your GitHub repo

https://github.com/riddlet/gitpubs

Communication

We have a Mattermost channel at https://mattermost.brainhack.org/brainhack/channels/gitpubs

fedeadolfi commented 5 years ago

Interesting. Last year at Neurohackademy a bunch of us worked on a related project - you might find the repo useful at some point.

https://github.com/srcole/o-factor

dcmoyer commented 5 years ago

Hello! Day one we're on the top floor of the Mercato Centrale, sitting at one of the center tables. =)