Closed aaime closed 3 years ago
Will be re-generating the list. We were looking at the top 1000 repos by stars and then sorting them by criticality_score; we need to run this on a larger set so that GeoTools shows up correctly. The GitHub API is rate limited, so we will try our best to regenerate this in the next few days.
Github api has some rate limit
Does this help: https://ghtorrent.org/ ?
Didn't know that, @aflgo, that looks like a gold mine!
Happy to help! 👍
GHTorrent is an effort to create a scalable, queriable, offline mirror of data offered through the Github REST API.
An effort by Georgios and team at TU Delft.
(switched accounts)
@mboehme - ideally we want to first take the top X repos by stars (e.g. 100K, or even 10K), run criticality_score on them, sort by score, and publish the top 1000 (or even all of them). If you can help connect us with the GHTorrent folks who can run this sort of workload and generate this for the top 5-10 languages, that would be very useful. It will solve issues like https://github.com/ossf/criticality_score/issues/20 as well.
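To make the proposed workflow concrete, here is a minimal sketch of the "filter by stars, score, sort, publish" steps. The function name, the `candidate_pool` and `publish_top` parameters, and the sample repos are all hypothetical; the scoring function is passed in as a stand-in for running criticality_score on each repo.

```python
from typing import Callable, Dict, List

Repo = Dict[str, object]


def rank_by_criticality(
    repos: List[Repo],
    score_fn: Callable[[Repo], float],
    candidate_pool: int,
    publish_top: int,
) -> List[Repo]:
    """Pick the most-starred repos, score them, and return the top scorers."""
    # Step 1: keep only the `candidate_pool` most-starred repos.
    candidates = sorted(repos, key=lambda r: r["stars"], reverse=True)[:candidate_pool]
    # Step 2: attach a score to each candidate (stand-in for criticality_score).
    scored = [{**r, "criticality_score": score_fn(r)} for r in candidates]
    # Step 3: sort by score, highest first, and keep the published slice.
    scored.sort(key=lambda r: r["criticality_score"], reverse=True)
    return scored[:publish_top]


# Hypothetical data: the "score" field fakes a precomputed criticality score.
repos = [
    {"name": "popular-but-small", "stars": 9000, "score": 0.40},
    {"name": "widely-starred", "stars": 5000, "score": 0.65},
    {"name": "critical-low-stars", "stars": 900, "score": 0.70},
]


def lookup(repo: Repo) -> float:
    return repo["score"]


narrow = rank_by_criticality(repos, lookup, candidate_pool=2, publish_top=2)
wide = rank_by_criticality(repos, lookup, candidate_pool=3, publish_top=2)
print([r["name"] for r in narrow])
print([r["name"] for r in wide])
```

With the narrow pool, the highest-scoring repo never gets scored because it falls below the star cutoff; widening the pool lets it surface, which is the effect a larger candidate set is meant to have.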
Will do. Getting back when I have something.
Tracking this in https://github.com/ossf/criticality_score/issues/33. Even the GitHub search API has limitations, so we will need to explore alternatives like GHTorrent.
This is now fixed in http://commondatastorage.googleapis.com/ossf-criticality-score/java_top_200.csv
Thanks! GeoTools is not using GitHub for issue tracking, but Jira. Is the score considering it? Would it improve if we switched (not an easy thing to do by any measure, mind; just wondering)?
Right now it is not, but if you switch, it will definitely fix the score for it. We are looking at ways to check custom issue trackers, but it does not seem trivial; feel free to brainstorm with us in issue #21.
Btw, I did run GeoServer too; it has a higher score than GeoTools but does not show up:
name: geoserver
url: https://github.com/geoserver/geoserver
language: Java
created_since: 111
updated_since: 0
contributor_count: 339
org_count: 7
commit_frequency: 10.0
recent_releases_count: 16
closed_issues_count: 154
updated_issues_count: 167
comment_frequency: 0.8
dependents_count: 5829
criticality_score: 0.72083
Any idea why?
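For reference, the reported score can be roughly reproduced from the signals listed above. This is a sketch under stated assumptions: the weighted log-ratio formula and the (weight, threshold) pairs below are not given anywhere in this thread; they are recalled from the project's README of that period and may not match the current tool. The dictionary and function names are illustrative only.

```python
import math

# ASSUMPTION: (weight, threshold) per signal, recalled from the
# criticality_score README of the time; not stated in this thread.
PARAMS = {
    "created_since": (1.0, 120),
    "updated_since": (-1.0, 120),
    "contributor_count": (2.0, 5000),
    "org_count": (1.0, 10),
    "commit_frequency": (1.0, 1000),
    "recent_releases_count": (0.5, 26),
    "closed_issues_count": (0.5, 5000),
    "updated_issues_count": (0.5, 5000),
    "comment_frequency": (1.0, 15),
    "dependents_count": (2.0, 500000),
}


def criticality_score(signals: dict) -> float:
    """Weighted average of log-scaled signals, each capped by its threshold."""
    total_weight = sum(weight for weight, _ in PARAMS.values())
    weighted_sum = 0.0
    for name, (weight, threshold) in PARAMS.items():
        value = max(signals[name], 0)
        # Each term is log(1 + value) / log(1 + max(value, threshold)).
        weighted_sum += weight * math.log(1 + value) / math.log(1 + max(value, threshold))
    return weighted_sum / total_weight


# Signal values copied from the GeoServer output quoted above.
geoserver = {
    "created_since": 111,
    "updated_since": 0,
    "contributor_count": 339,
    "org_count": 7,
    "commit_frequency": 10.0,
    "recent_releases_count": 16,
    "closed_issues_count": 154,
    "updated_issues_count": 167,
    "comment_frequency": 0.8,
    "dependents_count": 5829,
}

score = criticality_score(geoserver)
print(round(score, 5))
```

Under these assumed parameters the result lands close to the reported 0.72083, which suggests the score itself is computed as expected and the omission comes from how candidate repos are selected before scoring.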
All the language lists are being regenerated now due to bug fixes (https://github.com/ossf/criticality_score/commit/fc1e96657c83fc64d5c4c306f185ff133bb00460), so please check back in another week (post holidays).
I looked at the top 200 Java projects, out of curiosity, to see if any of the projects I'm working on, like GeoTools, are included in the list. It was not, which is not an issue per se, but then I computed the criticality score from the command line, getting this:
The score alone would place the project at around position 100 of the top 200 projects. Since it's a no-show, I'm wondering if there are any other criteria used to include/exclude projects besides the pure score?