ossf / criticality_score

Gives criticality score for an open source project
Apache License 2.0
1.33k stars 119 forks

GeoTools not showing in top 200 for java projects, run criticality score on larger sample set #15

Closed aaime closed 3 years ago

aaime commented 3 years ago

Out of curiosity, I looked at the top 200 Java projects to see whether any of the projects I work on, such as GeoTools, are included in the list. GeoTools was not, which is not an issue per se, but then I computed the criticality score from the command line and got this:

criticality_score --repo "https://github.com/geotools/geotools"
name: geotools
url: https://github.com/geotools/geotools
language: Java
created_since: 111
updated_since: 0
contributor_count: 315
org_count: 6
commit_frequency: 9.7
recent_releases_count: 16
closed_issues_count: 150
updated_issues_count: 161
comment_frequency: 1.0
dependents_count: 337
criticality_score: 0.66477

The score alone would place the project at around position 100 of the top 200 projects. Since it's a no-show, I'm wondering whether any other criteria are used to include or exclude projects, besides the pure score?
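
For readers who want to check the figure above by hand, here is a minimal sketch of the weighted log-ratio formula that criticality_score is documented to use. The weights and thresholds in `PARAMS` are assumptions taken from the project README around the time of this thread; they are not stated in this issue and may have changed since. The function and variable names are illustrative, not the tool's internal API.

```python
import math

# ASSUMPTION: (weight, threshold) per signal, as listed in the project
# README around the time of this thread. Not stated in this issue.
PARAMS = {
    "created_since": (1.0, 120),
    "updated_since": (-1.0, 120),
    "contributor_count": (2.0, 5000),
    "org_count": (1.0, 10),
    "commit_frequency": (1.0, 1000),
    "recent_releases_count": (0.5, 26),
    "closed_issues_count": (0.5, 5000),
    "updated_issues_count": (0.5, 5000),
    "comment_frequency": (1.0, 15),
    "dependents_count": (2.0, 500000),
}


def criticality_score(signals):
    """Weighted average of log-scaled signals, each capped by its threshold."""
    total_weight = sum(weight for weight, _ in PARAMS.values())
    weighted_sum = 0.0
    for name, (weight, threshold) in PARAMS.items():
        value = signals[name]
        # log(1 + S) / log(1 + max(S, T)) maps each signal into [0, 1].
        weighted_sum += weight * math.log(1 + value) / math.log(1 + max(value, threshold))
    return round(weighted_sum / total_weight, 5)


# Signal values copied from the GeoTools output in this comment.
geotools = {
    "created_since": 111,
    "updated_since": 0,
    "contributor_count": 315,
    "org_count": 6,
    "commit_frequency": 9.7,
    "recent_releases_count": 16,
    "closed_issues_count": 150,
    "updated_issues_count": 161,
    "comment_frequency": 1.0,
    "dependents_count": 337,
}

print(criticality_score(geotools))
```

Under these assumed parameters the result should land very close to the 0.66477 reported above; small differences are possible because the printed signals (for example commit_frequency) are rounded.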

inferno-chromium commented 3 years ago

Will be regenerating the list. We were looking at the top 1000 repos by stars and then sorting the data by criticality_score. We need to run this on a larger set so that GeoTools shows up correctly. The GitHub API has a rate limit, so we will try our best to regenerate this in the next few days.
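
As a rough illustration of what working around the rate limit involves, the sketch below reads the rate-limit headers that the GitHub REST API attaches to its responses (`x-ratelimit-remaining` and `x-ratelimit-reset`, the latter in UTC epoch seconds) and computes how long a crawler would need to pause. The function name `seconds_until_reset` is hypothetical, and this is not how criticality_score itself handles the limit.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next request (0 if quota remains).

    `headers` is a plain mapping of response headers; keys are lowercased here
    because HTTP header names are case-insensitive.
    """
    normalized = {key.lower(): value for key, value in headers.items()}
    remaining = int(normalized.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = int(normalized["x-ratelimit-reset"])  # UTC epoch seconds
    current = time.time() if now is None else now
    return max(0.0, float(reset_at - current))


# Usage with made-up header values:
print(seconds_until_reset({"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"}, now=940))
```

Scoring tens of thousands of repositories means many such pauses, which is why regenerating the lists takes days rather than minutes.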

aflgo commented 3 years ago

"Github api has some rate limit"

Does this help: https://ghtorrent.org/ ?

inferno-chromium commented 3 years ago

Didn't know about that, @aflgo, it looks like a gold mine!

mboehme commented 3 years ago

Happy to help! 👍

GHTorrent is an effort to create a scalable, queriable, offline mirror of data offered through the Github REST API.

An effort by Georgios and team at TU Delft.

(switched accounts)

inferno-chromium commented 3 years ago

@mboehme - ideally we want to first take the top X thousand repos by stars (e.g. 100K, or even just 10K), then run criticality_score on them, sort them by score, and publish the top 1000 (or even all of them). If you can help connect us with the GHTorrent folks who could run this sort of workload and generate the lists for the top 5-10 languages, that would be very useful. It would solve issues like https://github.com/ossf/criticality_score/issues/20 as well.
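
The selection pipeline described here can be sketched as follows. This is only an illustration under stated assumptions: `top_critical`, its parameters, and the repository data are all hypothetical, and the scores are made up rather than computed. It shows why a small candidate pool chosen by stars can miss a project with few stars but a high score.

```python
def top_critical(repos, score_fn, candidate_pool, publish_count):
    """Pick candidates by stars, score them, and return the highest-scoring names."""
    # Step 1: keep only the `candidate_pool` most-starred repositories.
    candidates = sorted(repos, key=lambda r: r["stars"], reverse=True)[:candidate_pool]
    # Step 2: score each candidate, then sort by score (highest first).
    scored = sorted(candidates, key=score_fn, reverse=True)
    # Step 3: publish the top `publish_count` names.
    return [repo["name"] for repo in scored[:publish_count]]


# Hypothetical data: names, stars, and scores are invented for illustration.
repos = [
    {"name": "popular-a", "stars": 50000, "score": 0.40},
    {"name": "popular-b", "stars": 30000, "score": 0.55},
    {"name": "niche-critical", "stars": 2000, "score": 0.70},
]


def by_score(repo):
    return repo["score"]


# A pool of 2 never scores "niche-critical", so it cannot appear in the result.
print(top_critical(repos, by_score, candidate_pool=2, publish_count=2))
# A pool of 3 includes it, and it ranks first.
print(top_critical(repos, by_score, candidate_pool=3, publish_count=2))
```

Widening the candidate pool is the only way to surface such projects with this approach, at the cost of many more API calls per language.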

mboehme commented 3 years ago

Will do. Getting back when I have something.

inferno-chromium commented 3 years ago

Tracking this in https://github.com/ossf/criticality_score/issues/33. Even the GitHub search API has limitations, so we will need to explore other sources like GHTorrent.

inferno-chromium commented 3 years ago

This is now fixed in http://commondatastorage.googleapis.com/ossf-criticality-score/java_top_200.csv

aaime commented 3 years ago

Thanks! GeoTools does not use GitHub for issue tracking, but Jira. Does the score take that into account? Would the score improve if we switched? (Not an easy thing to do by any measure, mind, just wondering.)

inferno-chromium commented 3 years ago

Right now it is not, but if you switch, it will definitely fix the score. We are looking at ways to check custom issue trackers, but it does not seem trivial. Feel free to brainstorm with us in issue #21.

aaime commented 3 years ago

Btw, I ran GeoServer too. It has a higher score than GeoTools but does not show up:

name: geoserver
url: https://github.com/geoserver/geoserver
language: Java
created_since: 111
updated_since: 0
contributor_count: 339
org_count: 7
commit_frequency: 10.0
recent_releases_count: 16
closed_issues_count: 154
updated_issues_count: 167
comment_frequency: 0.8
dependents_count: 5829
criticality_score: 0.72083

Any idea why?

inferno-chromium commented 3 years ago

All the language lists are being regenerated now due to bug fixes (https://github.com/ossf/criticality_score/commit/fc1e96657c83fc64d5c4c306f185ff133bb00460), so please check back in another week (post holidays).