ossf / scorecard

OpenSSF Scorecard - Security health metrics for Open Source
https://scorecard.dev
Apache License 2.0
4.63k stars 503 forks source link

BUG: Scoring Dependecy-Update-Tool when there isn't file-based evidence #2845

Open spencerschrock opened 1 year ago

spencerschrock commented 1 year ago

While looking at the Dependency-Update-Tool check and it's search quota usage, I came across something with the check.

When a project doesn't have a config file (dependabot.yml, RENOVATE.md etc), the Dependency-Update-Tool (DUT) check currently uses the SearchCommits functionality of the RepoClient implementations to look for dependabot commits: https://github.com/ossf/scorecard/blob/964bbd9dcb950292bdf3dfd6466764585c391542/checks/raw/dependency_update_tool.go#L43

The problem with this is it looks back infinitely far. Take https://github.com/slsa-framework/slsa-github-generator for example.

The GitHub version of SearchCommits makes this API call:

curl -L   -H "Accept: application/vnd.github+json"  \
 -H "Authorization: Bearer $GITHUB_AUTH_TOKEN" \
 -H "X-GitHub-Api-Version: 2022-11-28"   \
 "https://api.github.com/search/commits?q=repo:slsa-framework%2Fslsa-github-generator+author:dependabot%5Dbot%5D+bump&per_page=100"

This will return results, even though slsa-framework/slsa-github-generator stopped using dependabot in June 2022. They switched to Renovate, so they would have gotten a score of 10 anyway. But it highlights a problem which applies to other repos: projects are getting 10/10 for DUT for using dependabot even though they haven't used dependabot in years.

Examples

I took a look at few samples which currently have DUT scores of 10:

https://github.com/netty/netty-tcnative

https://github.com/microsoft/TypeScript

https://github.com/emscripten-core/emscripten

Refined search approach

The DUT check could add an option to their API call to sort by author-date or committer-date and only look at commits in the last X days/months. This would be minimal changes, but solve the "this repo used dependabot once 5 years ago" problem.

Alternative approach

One alternative is to look through recent commits, but that solution isn't perfect either, especially for busy projects or projects with few dependencies. Take https://github.com/grpc/grpc, which uses dependabot, but doesn't have any dependabot updates in the last 30 commits. Since they don't specify a .github/dependabot.yml file, switching to the recent commit strategy would return a score of 0 under the new strategy. The approach in https://github.com/ossf/scorecard/issues/2016#issuecomment-1468381435 would potentially solve these issues.

One benefit of this approach is a reduction in Search API quota, which is a blocker for re-enabling DUT in the cron.

Related:

2165

https://github.com/ossf/scorecard/issues/2016#issuecomment-1468381435

spencerschrock commented 1 year ago

The quickest solution would be the refined search approach, and would get the false positives down at least. Does anyone have a strong opinion on how far back the limit should be? 3 months?

pnacht commented 1 year ago

Unfortunately there's also the issue of projects having dependabot set up only for security updates (the situation where they don't need a dependabot.yml), but their dependencies simply haven't had any vulnerabilities in the recent past...

With too small a window, Scorecard might eventually accidentally penalize a project for having healthy dependencies! "You haven't had a dangerous dependency in too long! 0/10 in Dependency-Update-Tool for you!"

spencerschrock commented 1 year ago

Would adding a dependabot.yml to the repo be an acceptable fix to that false negative? Ignoring the .gitattributes issue

pnacht commented 1 year ago

It would solve the false negative, but that file is only required when the project accepts ordinary version bumps. It becomes a matter of whether Scorecard believes that is important or if the only important aspect is ensuring the project gets security updates.

I personally lean in favor of requiring the dependabot.yml, but I'd also understand if that'd seem too overzealous.

david-a-wheeler commented 1 year ago

The purpose is to encourage the use of automation to detect and rapidly automate vulnerable dependencies.

From that point of view, I think either of these would constitute as evidence:

Either one provides evidence that it's enabled, so I think either should work.

I did a quick look to see if there was academic research. I did find this paper, which justifies Scorecard using automated dependency checking as a measurement but doesn't answer our specific questions:

"On the Use of Dependabot Security Pull Requests" by Mahmoud Alfadel, Diego Elias Costa, Emad Shihab, Mouafak Mkhallalati, 2021, https://ieeexplore.ieee.org/document/9463148

Abstract: Vulnerable dependencies are a major problem in modern software development. As software projects depend on multiple external dependencies, developers struggle to constantly track and check for corresponding security vulnerabilities that affect their project dependencies. To help mitigate this issue, Dependabot has been created, a bot that issues pull-requests to automatically update vulnerable dependencies. However, little is known about the degree to which developers adopt Dependabot to help them update vulnerable dependencies. In this paper, we investigate 2,904 JavaScript open-source GitHub projects that subscribed to Dependabot. Our results show that the vast majority (65.42%) of the created securityrelated pull-requests are accepted, often merged within a day. Through manual analysis, we identify 7 main reasons for Dependabot security pull-requests not being merged, mostly related to concurrent modifications of the affected dependencies rather than Dependabot failures. Interestingly, only 3.2% of the manually examined pull-requests suffered from build breakages. Finally, we model the time it takes to merge a Dependabot security pull-request using characteristics from projects, the fixed vulnerabilities and issued pull requests. Our model reveals 5 significant features to explain merge times, e.g., projects with relevant experience with Dependabot security pull-requests are most likely associated with rapid merges. Surprisingly, the severity of the dependency vulnerability and the potential risk of breaking changes are not strongly associated with the merge time. To the best of our knowledge, this study is the first to evaluate how developers receive Dependabot’s security contributions. Our findings indicate that Dependabot provides an effective platform for increasing awareness of dependency vulnerabilities and helps developers mitigate vulnerability threats in JavaScript projects.

pnacht commented 11 months ago

I just found my way back to this issue, and it's worth noting all the examples in the OP never had a dependabot.yml file. All of their Dependabot PRs were security updates:

If we could be sure that a project has never had a dependabot.yml file, then we'd be certain that the Dependabot PRs are for security releases. And if a project has ever enabled security releases, it seems fair to assume they'd rarely/never disable it.

It's unfortunately not trivial to determine if a project has deleted their dependabot.yml, though, so we can't be sure if those old Dependabot PRs are security updates or if they were created by a now-deleted dependabot.yml.

I've changed my mind over time and I'm now partial to the status quo: I'd rather ensure the projects that enable security updates get rewarded (even if that gives points to "undeserving" projects that have removed their DUT) than "harm" those projects as collateral damage to ensure projects that deleted their DUT get a 0/10. Especially since we don't know how big each population is*.


* Something that might be easier to determine with probes: if we have hasDependabotYmlFile and hasDependabotPR probes (or similar) and their results get stored in the BigQuery database, we can detect and evaluate such cases.