ossf / criticality_score

Gives criticality score for an open source project
Apache License 2.0

Idea for tuning weights #26

Open dlorenc opened 3 years ago

dlorenc commented 3 years ago

The more I think about this problem, the more it looks and feels like an ML problem. We have an input data set (the universe of metrics and data we can find about a project). And we want a numerical score as a result.

If we had an objective, right answer for the ranking, it would be pretty easy (famous last words) to train a model against. But we don't have an objective ranking - if we did, we wouldn't need this project.

I don't really know anything about ML so apologies if I'm saying something stupid or wrong, but what if we tried something like this: survey people across the community with pairwise comparisons ("which of project A or project B is more critical?") and aggregate those answers into a reference ranking we can use as a training set.

We can then compare the current list to this training set and measure how "wrong" it is. For example, if the community we survey overwhelmingly thinks package A is more critical than B, and we get that wrong, that's not good.

If we do formalize this as an ML model, it'll get easier to add more signals over time and measure how our accuracy improves!

I'm glossing over a ton of details - how to build a large enough training set to get statistically significant results, how to get, prep and clean the input data, etc.
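
To make the idea a bit more concrete, here's a rough sketch of what learning the weights from pairwise survey answers could look like. Everything here is illustrative: the project names, feature columns, and comparison pairs are made up, and the loss is just a logistic (Bradley-Terry style) loss on score differences.

```python
import numpy as np

# Hypothetical per-project feature vectors (all values made up for illustration);
# columns could be things like contributor count, commit frequency, # of dependents.
features = {
    "A": np.array([120.0, 30.0, 5000.0]),
    "B": np.array([15.0, 4.0, 200.0]),
    "C": np.array([60.0, 10.0, 900.0]),
}

# Survey answers: (winner, loser) means respondents judged the first project
# more critical than the second.
comparisons = [("A", "B"), ("A", "C"), ("C", "B")]

# Normalize features so the learned weights are on a comparable scale.
X = np.array(list(features.values()))
mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-9
norm = {name: (vec - mu) / sigma for name, vec in features.items()}

# Learn weights w so that score(winner) > score(loser) for as many pairs as
# possible, using a logistic loss on score differences and plain gradient descent.
w = np.zeros(X.shape[1])
lr = 0.1
for _ in range(500):
    grad = np.zeros_like(w)
    for winner, loser in comparisons:
        diff = norm[winner] - norm[loser]
        p = 1.0 / (1.0 + np.exp(-w @ diff))  # P(winner ranked above loser)
        grad += (p - 1.0) * diff             # gradient of -log(p) w.r.t. w
    w -= lr * grad / len(comparisons)

scores = {name: float(w @ vec) for name, vec in norm.items()}
print("learned weights:", w)
print("ranking:", sorted(scores, key=scores.get, reverse=True))
```

The nice property is that adding a new signal is just another feature column; accuracy against the survey pairs tells us whether it helped.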

dlorenc commented 3 years ago

Here's a reference on building the initial ranking based on the pairwise comparisons: https://arxiv.org/abs/1801.01253
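
For the aggregation step itself (turning raw "A is more critical than B" answers into a single reference ranking, before any feature weights come into play), a plain Bradley-Terry fit is probably the simplest baseline. A sketch with made-up survey data:

```python
from collections import defaultdict

# Made-up survey answers: each (winner, loser) pair is one response; pairs can repeat.
comparisons = [
    ("A", "B"), ("A", "B"), ("A", "C"),
    ("C", "B"), ("B", "C"), ("A", "C"),
]

projects = sorted({p for pair in comparisons for p in pair})
wins = {p: 0 for p in projects}           # total wins per project
games = defaultdict(int)                  # number of comparisons per unordered pair
for winner, loser in comparisons:
    wins[winner] += 1
    games[frozenset((winner, loser))] += 1

# Classic Bradley-Terry MM iteration:
# strength_i <- wins_i / sum_j ( n_ij / (strength_i + strength_j) ), then renormalize.
strength = {p: 1.0 for p in projects}
for _ in range(200):
    new = {}
    for i in projects:
        denom = sum(
            games[frozenset((i, j))] / (strength[i] + strength[j])
            for j in projects
            if j != i and games[frozenset((i, j))] > 0
        )
        new[i] = wins[i] / denom if denom > 0 else strength[i]
    total = sum(new.values())
    strength = {p: s / total for p, s in new.items()}

print(sorted(strength.items(), key=lambda kv: -kv[1]))
```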

mboehme commented 3 years ago

What about PageRank :)?

You consider all sorts of dependencies across all tracked OSS projects as a network and measure the importance of a project S as the probability that, starting from a random project and following its dependencies transitively, you end up at S.
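
As a sketch of what that looks like on a dependency graph (the graph and project names below are made up; edges point from a project to the things it depends on):

```python
# Toy dependency graph: each project maps to the projects it depends on.
deps = {
    "app1": ["lib_a", "lib_b"],
    "app2": ["lib_a"],
    "lib_a": ["lib_c"],
    "lib_b": ["lib_c"],
    "lib_c": [],
}

nodes = list(deps)
damping = 0.85
rank = {n: 1.0 / len(nodes) for n in nodes}

# Standard PageRank power iteration: a project accumulates rank from the projects
# that depend on it; projects with no dependencies spread their rank uniformly.
for _ in range(100):
    new = {n: (1.0 - damping) / len(nodes) for n in nodes}
    for src, targets in deps.items():
        if targets:
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        else:
            for t in nodes:
                new[t] += damping * rank[src] / len(nodes)
    rank = new

print(sorted(rank.items(), key=lambda kv: -kv[1]))  # lib_c should come out on top
```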

dlorenc commented 3 years ago

Yes! Having an accurate dependency graph to use would definitely help a lot :)

I'm not sure of easy ways to access one across projects, though. Any suggestions? Even a single level, # of direct dependents metric would help.

I don't think this is a universal solution though (at least without changing the graph) - things like nginx or haproxy wouldn't show up in a "code level" dependency graph for a Ruby application, yet they're usually deployed together.
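
Even the single-level count is trivial to compute once dependency edges are available from some ecosystem's package metadata; a sketch with a made-up edge list:

```python
from collections import Counter

# Hypothetical (dependent, dependency) edges pulled from package metadata.
edges = [
    ("app1", "lib_a"), ("app1", "lib_b"),
    ("app2", "lib_a"),
    ("app3", "lib_a"),
]

direct_dependents = Counter(dep for _, dep in edges)
print(direct_dependents.most_common())  # [('lib_a', 3), ('lib_b', 1)]
```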

mboehme commented 3 years ago

Maybe the GitHub folks can help build such a cross-project dependency graph? I agree, you'll have to consider different types of dependencies.

bmarwell commented 3 years ago

+1

mboehme commented 3 years ago

For learning-to-rank more generally, you might run into a curation problem. The survey data would "expire", so the exercise would need to be repeated periodically. For reasonable results you would also need a lot of comparison pairs from experts, and there is probably a lot of noise, since some people will rate projects as more or less critical on a whim. Unknown-but-critical projects may get ranked much lower simply because they are not popular, so the result is still a subjective ranking. Another challenge is that it could easily be gamed, particularly if the score is used for decision making or resource allocation.

mboehme commented 3 years ago

On the other hand, I think there is probably not much gaming possible simply by adjusting the weights, and the linear model is coarse enough that noise or slightly stale data might not have a huge effect.

mboehme commented 3 years ago

> Yes! Having an accurate dependency graph to use would definitely help a lot :)
>
> I'm not sure of easy ways to access one across projects, though. Any suggestions? Even a single level, # of direct dependents metric would help.

Tracking in #31.

josephjacks commented 3 years ago

> What about PageRank :)?
>
> You consider all sorts of dependencies across all tracked OSS projects as a network and measure the importance of a project S as the probability that, starting from a random project and following its dependencies transitively, you end up at S.

💥 DevRank exists: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-174.pdf