It would be good to acknowledge those contributors who write documentation as part of a pull request.
What could we use the data for?
We could use that information to measure how long it takes newcomers to start writing documentation. It would also be useful to know what percentage of core contributors write documentation. The data could be used to test hypotheses such as whether newcomers ramp up faster in projects with good documentation. Another interesting hypothesis to test is whether code contributions without documentation are more or less likely to contain bugs.
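As a rough sketch of the first two measurements, the Python below assumes each pull request has already been reduced to an author, an opening date, and a flag saying whether it touched documentation. It defines ramp-up time as the number of days between an author's first pull request and their first one that touches documentation. That definition, the `PullRequest` record, and both function names are hypothetical, and deciding who counts as a core contributor is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PullRequest:
    # Hypothetical record; not a structure produced by ghscraper.py.
    author: str
    opened_at: datetime
    touches_docs: bool  # True if any changed file was classified as documentation


def ramp_up_times(prs: list[PullRequest]) -> dict[str, Optional[float]]:
    """Days from each author's first pull request to their first docs-touching one.

    None means the author has not (yet) written any documentation.
    This definition of "ramp-up time" is an assumption.
    """
    by_author: dict[str, list[PullRequest]] = {}
    for pr in prs:
        by_author.setdefault(pr.author, []).append(pr)

    result: dict[str, Optional[float]] = {}
    for author, author_prs in by_author.items():
        author_prs.sort(key=lambda pr: pr.opened_at)
        first = author_prs[0].opened_at
        first_doc = next((pr.opened_at for pr in author_prs if pr.touches_docs), None)
        result[author] = None if first_doc is None else (first_doc - first).total_seconds() / 86400
    return result


def doc_writer_percentage(prs: list[PullRequest], core: set[str]) -> float:
    """Percentage of the given core contributors with at least one docs-touching PR."""
    if not core:
        return 0.0
    writers = {pr.author for pr in prs if pr.touches_docs and pr.author in core}
    return 100.0 * len(writers) / len(core)
```

Returning None rather than skipping authors who never wrote documentation keeps them visible, so they can be counted separately instead of silently disappearing from an average.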
How to tell if someone is writing documentation
We could look at the file extension and check whether it is .md, .txt, .html, etc. Some people may document things in Jupyter Notebooks or literate Haskell, and I'm not sure how to handle those, since such files mix code and prose. If we could figure out which programming language a file is written in, we could count the number of comment lines added or deleted.
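A minimal sketch of both heuristics in Python, assuming the diff text arrives as unified-diff hunks (lines starting with `@@`, a space, `+` or `-`) without `---`/`+++` file headers. The extension set, the comment-prefix table, and the function names are illustrative assumptions; only single-line comments are counted, and notebooks and literate Haskell are deliberately left unclassified.

```python
import os

# Extensions treated as documentation. .md, .txt and .html come from the note;
# .rst is an added assumption. .ipynb and .lhs are left out because they mix
# code and prose, so a whole-file rule would misclassify them.
DOC_EXTENSIONS = {".md", ".txt", ".html", ".rst"}

# Hypothetical mapping from extension to single-line comment prefixes.
# Block comments (/* ... */, docstrings, {- ... -}) are not handled.
COMMENT_PREFIXES = {
    ".py": ("#",),
    ".sh": ("#",),
    ".c": ("//",),
    ".go": ("//",),
    ".hs": ("--",),
}


def is_doc_file(filename: str) -> bool:
    """True if the file extension marks the file as documentation."""
    return os.path.splitext(filename)[1].lower() in DOC_EXTENSIONS


def count_comment_lines(filename: str, patch: str) -> tuple[int, int]:
    """Count (added, deleted) single-line comment lines in unified-diff hunks."""
    prefixes = COMMENT_PREFIXES.get(os.path.splitext(filename)[1].lower())
    if prefixes is None:
        return (0, 0)  # unknown language: cannot tell comments from code
    added = deleted = 0
    for line in patch.splitlines():
        # Only '+' and '-' lines are changes; '@@' and context lines are skipped.
        if line[:1] in ("+", "-") and line[1:].lstrip().startswith(prefixes):
            if line[0] == "+":
                added += 1
            else:
                deleted += 1
    return (added, deleted)
```

For example, `count_comment_lines("ghscraper.py", patch)` would report how many `#` comment lines a Python change adds and removes.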
Getting data out of GitHub
Unfortunately, the GitHub pull request JSON does not list which files the pull request touches, so ghscraper.py would need to be modified to download more information.
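GitHub's REST API does expose the changed files through a separate endpoint, `GET /repos/{owner}/{repo}/pulls/{pull_number}/files`, which returns one entry per file with fields such as `filename`, `status`, `additions`, `deletions` and, for text diffs, `patch`, paginated at up to 100 entries per page. The sketch below shows one way a scraper could call it using only the standard library. It does not reflect how ghscraper.py is actually structured, and `fetch_pr_files` and `split_doc_and_code` are hypothetical names.

```python
import json
import urllib.request
from typing import Iterator, Optional

API = "https://api.github.com"


def fetch_pr_files(owner: str, repo: str, number: int,
                   token: Optional[str] = None) -> Iterator[dict]:
    """Yield the file entries for one pull request, following page numbers."""
    page = 1
    while True:
        url = f"{API}/repos/{owner}/{repo}/pulls/{number}/files?per_page=100&page={page}"
        req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        if token:
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:
            return
        yield from batch
        if len(batch) < 100:  # a short page means this was the last one
            return
        page += 1


def split_doc_and_code(files: list[dict],
                       doc_exts: tuple[str, ...] = (".md", ".txt", ".html")) -> dict[str, int]:
    """Tally lines added in documentation files versus all other files."""
    totals = {"doc_additions": 0, "code_additions": 0}
    for f in files:
        key = "doc_additions" if f["filename"].lower().endswith(doc_exts) else "code_additions"
        totals[key] += f.get("additions", 0)
    return totals
```

For example, `split_doc_and_code(list(fetch_pr_files("owner", "repo", 42)))` would tally additions for pull request 42 of a placeholder repository. Passing a token is optional, but unauthenticated requests are subject to a much lower rate limit, which matters when scraping many pull requests.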