Feature request: Add new check for developer education #3534

Status: open. Opened by david-a-wheeler 1 year ago.

Is your feature request related to a problem? Please describe.

It is widely accepted that developers who know how to develop secure software are more likely to develop secure software. However, most developers today are never taught how to develop secure software, including in universities, colleges, bootcamps, and self-study books. Such training materials exist, but there are insufficient incentives to learn from them.

Describe the solution you'd like

Please add a check to determine if any/all maintainers of a project have learned how to develop secure software. We suggest starting by looking for the certificate of completion for the Secure Software Development Fundamentals Courses; the certificate is available via Credly. In the longer term, we might be able to use or ask LF to modify LFX to allow a query on a GitHub or GitLab id (as well as an email address) & determine if they have a credential.
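As a rough illustration of how such a check might score a project, here is a minimal Go sketch (Go because Scorecard itself is written in Go). Everything in it is hypothetical: the `Maintainer` type, the `recognized` credential list, and the two scoring policies are assumptions, and it presumes credential data has already been fetched from somewhere like Credly or LFX. It contrasts an "any maintainer" policy with an "all maintainers" policy, since the request leaves that choice open.

```go
package main

import (
	"fmt"
	"strings"
)

// Maintainer is a hypothetical record of a project maintainer and the
// credential names already found for them (e.g., via Credly or LFX).
type Maintainer struct {
	Login       string
	Credentials []string
}

// recognized lists the credential names the check would accept (lowercased).
// This set is an assumption; what to accept is still an open question.
var recognized = map[string]bool{
	"secure software development fundamentals": true,
}

// hasRecognized reports whether m holds at least one accepted credential.
func hasRecognized(m Maintainer) bool {
	for _, c := range m.Credentials {
		if recognized[strings.ToLower(strings.TrimSpace(c))] {
			return true
		}
	}
	return false
}

// educationScore returns a 0-10 score. With requireAll == false, a single
// trained maintainer yields 10 ("any"); with requireAll == true, the score
// is proportional to the fraction of trained maintainers ("all").
func educationScore(ms []Maintainer, requireAll bool) int {
	if len(ms) == 0 {
		return 0
	}
	trained := 0
	for _, m := range ms {
		if hasRecognized(m) {
			trained++
		}
	}
	if !requireAll {
		if trained > 0 {
			return 10
		}
		return 0
	}
	return trained * 10 / len(ms) // integer division rounds down
}

func main() {
	ms := []Maintainer{
		{Login: "alice", Credentials: []string{"Secure Software Development Fundamentals"}},
		{Login: "bob"},
		{Login: "carol"},
	}
	fmt.Println(educationScore(ms, false)) // 10
	fmt.Println(educationScore(ms, true))  // 3
}
```

With one trained maintainer out of three, the "any" policy gives 10 while the proportional policy gives 3, which shows how much the choice of policy would matter for larger projects.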

For more information on how to do this, we suggest talking with Tim Serewicz who can be contacted at tserewicz at "linuxfoundation" dot org. Please also keep me in the loop, for email it's dwheeler at "linuxfoundation" dot org.

Describe alternatives you've considered

Eventually we should probably accept additional credentials. There are also courses and credentials that focus on specific areas, e.g., developing a secure container. We should also investigate what ISC2, SANS, and others are doing.

Additional context

This issue is being provided by the EDU SIG, which is part of the OpenSSF Best Practices WG. This issue was strongly endorsed during the Washington, DC "Secure Open Source Software Summit" (SOSS) meeting on September 12-13, 2023, between representatives of the OpenSSF and the US Executive branch.

David and I will try to attend the next Scorecard meeting to talk through this request.

For context on the SOSS event - https://lnkd.in/etmFeYiX
https://openssf.org/press-release/2023/09/13/openssf-gathers-us-government-and-industry-leaders-at-secure-open-source-software-summit-2023/

developers who know how to develop secure software are more likely to develop secure software

Shouldn't scorecard be able to detect that by looking at projects they actually develop and practices they actually follow instead of relying on "credentials"? If it should rely on credentials for some reason why was it decided that taking that pretty basic course should be convincing enough?

Such training materials exist, but there are insufficient incentives to learn from them

I wonder how exactly this scorecard check can change that?

@evverx asked:

Shouldn't scorecard be able to detect that by looking at projects they actually develop and practices they actually follow instead of relying on "credentials"?

No. For example, how would a tool always correctly determine that a program implements "least privilege"? If a Rust program uses unsafe constructs, is that automatically disqualifying? Automated program analysis tools can only go so far; they are notorious for false positives and false negatives. In practice, you need humans to know how to create secure software if you want secure software. The point of this score would be to see if humans know how to create secure software.

It's true that humans sometimes know things & fail to apply them. So if we can, we also should measure whether or not they've done them. But many things we want to measure can't be accurately measured in an automated way. Seeing if the humans know things is an excellent mechanism to help counter that problem.

Scorecard uses multiple measures, not any one measure, to provide insight into the security of software.

If it should rely on credentials for some reason why was it decided that taking that pretty basic course should be convincing enough?

That "pretty basic course" covers more information than most developers know. Colleges, universities, and bootcamps generally don't cover it at all. I talked with a high-level US government person whose daughter is wrapping up a Computer Science degree, and she has never been taught anything about developing secure software. I hear the same story everywhere.

I agree with you that knowing more should produce a higher score. So in the long term it'd be better to have different levels, with full mastery producing high scores. However, knowing how to develop secure software is so rare that knowing something is noteworthy. In the kingdom of the blind, the one-eyed man is king.

In addition, almost all vulnerabilities are the same boring kinds. Memory safety issues (in memory-unsafe languages), SQL injection, OS command line injection, XSS, and similar well-known vulnerabilities account for 95% to 99% of all vulnerabilities (last I checked). If you know what they are and how to avoid them, the software you produce is FAR less likely to have those problems. I don't think that statement is seriously debated. The reason we teach people how to do things is so that they will go do them correctly.

Such training materials exist, but there are insufficient incentives to learn from them

I wonder how exactly this scorecard check can change that?

Scorecard measurements provide developers incentives to improve their score. If learning how to develop secure software will increase the score of a project, many developers will seriously think about taking such a course. Many developers like learning, and they want to stay current. Giving developers an incentive to learn the material means they're more likely to do so.

I hope that helps!

In practice, you need humans to know how to create secure software if you want secure software.

Agreed.

The point of this score would be to see if humans know how to create secure software

It can only be verified by actually auditing projects.

But many things we want to measure can't be accurately measured in an automated way

I'm not sure they should be included in scorecard then.

If learning how to develop secure software will increase the score of a project, many developers will seriously think about taking such a course

If the goal of scorecard is to get people to start learning things I think it makes sense but if it's used to compare projects using their scores I'm not sure projects should be downgraded just because they don't have credentials/badges or any other attributes that don't necessarily mean anything in terms of how secure projects actually are.

My initial opinion is aligned with @evverx's above reply, and I'd love to link to the course(s) in documentation / starter guides.

Educating developers is great, but I'm not sure adding it to the score as an incentive will move the needle. For example, with CII-Best-Practices badge, we have 1,227,058 repos in our data set. 1,502 (0.1%) of which have signed up.

But many things we want to measure can't be accurately measured in an automated way

I'm not sure they should be included in scorecard then.

That's my point. You cannot measure many things in an automated way, so to accurately estimate the security of OSS in an automated way, you need to combine various indirect measures. Determining the knowledge of developers is a reasonable indirect measure.

If the goal of scorecard is to get people to start learning things I think it makes sense but if it's used to compare projects using their scores I'm not sure projects should be downgraded just because they don't have credentials/badges or any other attributes that don't necessarily mean anything in terms of how secure projects actually are.

That's a reasonable concern. Certificates/badges of people are the obvious way to determine if someone knows how to develop secure software, which is why I suggested it. Perhaps we could supplement that with other measures of people that would suggest they know how to develop secure software. I'm open to ideas on that front.

But today a lot of people do not know how to write secure software; as I often say, "we get better software than we deserve". If we can incentivize people to learn, they're more likely to learn. I think it is reasonable to estimate the likely security of a project based on the knowledge of its developers.

You cannot measure many things in an automated way

Can't argue with that but as far as I can see scorecard is supposed to be "an automated tool that assesses a number of important heuristics ("checks") associated with software security and assigns each check a score of 0-10".

to accurately estimate the security of OSS in an automated way, you need to combine various indirect measures

Can't argue with that either, but I think those measures shouldn't be so indirect that it's hard to tell whether they are relevant. For example, I think specific checks for whether projects can be pwned by opening issues certainly belong in scorecard (even though they aren't 100% accurate either), but "developer education" is just too abstract.

Certificates/badges of people are the obvious way to determine if someone knows how to develop secure software, which is why I suggested it

I don't think it's obvious. Projects can have all the badges in the world and be backed by all the companies with their mandatory security trainings and still be destroyed by, say, bogus router advertisements. Personally I don't think credentials should matter at all.

Can't argue with that but as far as I can see scorecard is supposed to be "an automated tool that assesses a number of important heuristics ("checks") associated with software security and assigns each check a score of 0-10".

Right. And knowing what to do is an extremely useful heuristic for estimating software security. If someone doesn't know how to do it, they probably won't do it.

... I think those measures shouldn't be so indirect that it's hard to tell whether they are relevant. ... "developer education" is just too abstract.

I think there's a misunderstanding. Please let me try again.

This proposal is not for some generic developer education. It has nothing to do with generic developer education. The proposal is to find evidence that the maintainers have had "education specifically focused on developing secure software". That's extremely relevant and not abstract at all. If someone doesn't know what XSS is (and how to counter it), SQL injection is (and how to counter it), and so on, they're pretty much certain to do it wrongly (like everyone else).

Personally I don't think credentials should matter at all.

Technically it's not the credentials that matter, it is the knowledge. The credentials are simply how we can measure things, just like anything else. Are you arguing that knowledge is irrelevant? I don't agree.

Are you arguing that knowledge is irrelevant?

No, I'm not. I think knowledge is important but I don't think it should be conflated with certificates. It should not be assumed that projects with no certificates are inherently worse either. What I'm trying to say is that when projects are assessed it shouldn't matter whether they have credentials or not.

I don't feel that we would need a numeric score that alters the project's final reporting, but I do feel strongly that highlighting that a project has a trained member participating is a very good signal to provide downstream to project consumers so they can factor that into their risk assessment around using a project. I think that if a credential or cert existed, we could add a checkmark or a plus sign or some other indication to show that, in addition to their technically-observable practices, some of the developers also have proven expertise in some secure coding-related coursework/credential.

I think that if a credential or cert existed, we could add a checkmark or a plus sign or some other indication to show that, in addition to their technically-observable practices, some of the developers also have proven expertise in some secure coding-related coursework/credential

I think it's a good compromise. (FWIW I'd put the best practices badge into that category too).

Speaking of the best practices badge it already includes the following points

The project MUST have at least one primary developer who knows how to design secure software. (See ‘details’ for the exact requirements.)

At least one of the project's primary developers MUST know of common kinds of errors that lead to vulnerabilities in this kind of software, as well as at least one method to counter or mitigate each of them.

and it even points to the course. I'm not sure why there should be another check if the existing check presumably already covers that. It seems the badge would be the best place to collect links to certificates/other badges/whatever.

@SecurityCRob @david-a-wheeler Apologies for the late heads up, but Scorecard alternates meeting times now so the next upcoming meeting is Monday, Oct. 16, at 10am ET rather than Thursday at the previous time. It is a smaller group, but hope that helps. Let me know if we need to schedule something more to accommodate discussion.

(It would be great if OpenSSF could use asynchronous means of communication, like GitHub issues and commit messages, a bit more. Meetings are fine, but details get lost along the way, and the meeting notes with YouTube videos aren't exactly useful when it's necessary to figure out why certain decisions were made.)

Anyway it would be nice if the gist/summary related to this particular issue was posted here.

I think that if a credential or cert existed, we could add a checkmark or a plus sign or some other indication to show that, in addition to their technically-observable practices, some of the developers also have proven expertise in some secure coding-related coursework/credential

We could, but why bother? The point of Scorecard is to provide a set of 0-10 measures that accumulate into an overall score from 0-10. If something doesn't affect the score, it doesn't exist from the point-of-view of Scorecard.
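The disagreement over whether the check should change the numeric score or only add an informational marker can be made concrete with a simplified weighted-average aggregate. This is a sketch under assumptions, not Scorecard's actual aggregation code: the `CheckResult` type, the `Informational` flag, the `Developer-Education` check name, and the weights are all illustrative.

```go
package main

import "fmt"

// CheckResult is a simplified, hypothetical stand-in for a check result:
// a 0-10 score (negative means inconclusive), a weight for the check's
// importance, and a flag for informational-only results.
type CheckResult struct {
	Name          string
	Score         int
	Weight        float64
	Informational bool
}

// aggregate returns the weighted average of the scored results, skipping
// informational and inconclusive ones. It returns -1 if nothing is scored.
func aggregate(results []CheckResult) float64 {
	var sum, total float64
	for _, r := range results {
		if r.Informational || r.Score < 0 {
			continue
		}
		sum += float64(r.Score) * r.Weight
		total += r.Weight
	}
	if total == 0 {
		return -1
	}
	return sum / total
}

func main() {
	base := []CheckResult{
		{Name: "Code-Review", Score: 8, Weight: 7.5},
		{Name: "Pinned-Dependencies", Score: 4, Weight: 5},
	}
	edu := CheckResult{Name: "Developer-Education", Score: 10, Weight: 5}

	// Copy base each time so the variants do not share a backing array.
	scored := append(append([]CheckResult{}, base...), edu)
	edu.Informational = true
	marker := append(append([]CheckResult{}, base...), edu)

	fmt.Printf("%.2f\n", aggregate(base))   // 6.40
	fmt.Printf("%.2f\n", aggregate(scored)) // 7.43
	fmt.Printf("%.2f\n", aggregate(marker)) // 6.40
}
```

In this toy model, a scored education check moves the overall result from 6.40 to 7.43, while the informational "checkmark" variant leaves it unchanged, which is exactly the trade-off being debated here.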

technically-observable practices

I think the purpose of Scorecard should be to estimate risk countermeasures, not this narrow view.

Here's what Scorecard says it's for:

We created Scorecard to help open source maintainers improve their security best practices and to help open source consumers judge whether their dependencies are safe.

Helping judge if dependencies are safe is in scope, so I think "are any maintainers educated" is also in scope and should be included in the numerical measurement.

Helping judge if dependencies are safe is in scope

It is but the problem with this check is that it doesn't help with that at all. The fact that it's based on credentials almost nobody has at this point doesn't make it any more convincing either.

should be included in the numerical measurement

Could you show at least one project whose maintainers have that particular certificate?

(It's still unclear to me why the best practices badge with those requirements included isn't enough by the way)

I took a look at the meeting notes and found this

DW: totally agree we need to keep in mind - however much of this data is public by design…

Based on my experience, where a CNCF project took my email address from my commit messages and, without my consent or knowledge, put it in a public database claiming that I worked for Google, I think people taking that course should be made aware that this data can be exposed publicly at scale. This isn't a theoretical concern: there are places where it isn't allowed (or at least its legal status isn't clear) to participate in activities like this, so people should be able to turn "sharing" off. I get that OpenSSF targets the U.S. mostly but for better or for worse open-source software is global.

(It's still unclear to me why the best practices badge with those requirements included isn't enough by the way)

Getting this data is automatable, so it makes sense to have in Scorecard and not only in the best practices badge.

I guess it makes sense given that badges don't necessarily mean that those requirements are actually met.

Either way I think this check is purely promotional and I don't think it should affect scores.

I get that OpenSSF targets the U.S. mostly but for better or for worse open-source software is global.

OpenSSF is global. I recently came back from OpenSSF Day EU in Bilbao, Spain. In December we're having a conference in Japan. The last OpenSSF Day North America wasn't in the US, it was in Canada. Sure, a lot are in the US. That's simply because many developers are in the US.

I'd argue that security training/certification (any such) and evidence for that within the maintaining team, though it doesn't guarantee it, certainly increases the odds of the project being built with a secure mindset.

OpenSSF is global

I don't think it is. All its activities are prompted by certain executive orders affecting suppliers selling stuff to the U.S. government. It's true they can be generalized but I would say the U.S. is a priority.

I'd argue that security training/certification (any such) and evidence for that within the maintaining team, though it doesn't guarantee it, certainly increases the odds of the project being built with a secure mindset.

That's where I disagree. Projects I usually look at aren't developed by "uneducated developers". They are developed and backed by large companies with all those bells and whistles and actual security teams, but as usual corners are cut to ship things faster because security just isn't a priority.

OpenSSF is global

I don't think it is.

Sorry, this week we at OpenSSF are too busy preparing for OpenSSF Japan (next week) to hear your claim we're US-only :-).

All its activities are prompted by certain executive orders affecting suppliers selling stuff to the U.S. government. It's true they can be generalized but I would say the U.S. is a priority.

I'm on the OpenSSF staff. I can assure you that relatively little of the OpenSSF work is prompted by US executive orders. It's quite the reverse. We have many activities. When the US announces something, we try to make it clear how their requests are related to work we're doing.

For example, Scorecard & the Best Practices badge weren't created because the US government asked for something. Instead, projects want to know information about the security of their own project & of other projects they're thinking of adding as dependencies. Similarly, the secure software fundamentals course was created because there's a need for education, not because of a US government mandate. Most OpenSSF projects are like that; they exist because there's a general need irrespective of government requests.

In a few cases there's a focus on the US government, mainly responding to their requests for information or requests for comment. It's very much a minority of OpenSSF work.

We certainly do talk with the US government (and other governments), we want to hear what they have to say! They're important users, and we want OSS to be useful to its users. They can tell us about operational problems. If they have a good idea, we want to know about it. We want to collaborate with anyone where it makes sense to do so.

Many companies are US-centric, so there are a lot of contributors from the US. Saying it's all US doesn't really make sense. The Compiler Options Hardening Guide for C and C++ we're announcing today was led by Ericsson, by people based in Sweden. Last I checked, Sweden's not in the US. Of course, that work had support from people around the world, as is appropriate.

Let's stop here. This has little to do with the feature request.

I'd argue that security training/certification (any such) and evidence for that within the maintaining team, though it doesn't guarantee it, certainly increases the odds of the project being built with a secure mindset.

That's where I disagree. Projects I usually look at aren't developed by "uneducated developers". They are developed and backed by large companies with all those bells and whistles and actual security teams, but as usual corners are cut to ship things faster because security just isn't a priority.

I didn't say developers have had no education. I'm saying that developers often have no education in how to develop secure software. Educational materials, training materials, and everything else tends to show developers how to do things insecurely. Let's look at some data:

53% of software developers report that their organizations don’t ensure training on secure coding [Ponemon 2020]
No top 40 US “coding” or top 5 non-US CS school required secure coding in 2019 [Forrester 2019]
Of U.S. News's top 24 CS 2022 schools, only 1 requires security for undergraduates.
The third most popular answer for how to improve OSS security was providing more training to the OSS community. Source: the 2022 v2.0 survey “Addressing Cybersecurity Challenges in Open Source Software” by Stephen Hendrick (VP Research, The Linux Foundation) & Martin McKeay (Senior Editorial Research Manager, Snyk), question q0050mrv. The only higher-ranked items were “define best practices for secure software development” and “provide tools for analyzing and remediating vulnerabilities in the top 500 open source components” - which clearly don’t conflict with training.
One article pointedly noted, “universities don’t train computer science students in security”.
One survey claimed otherwise, but it is misleading. The State of Developer-Driven Security Survey, Secure Code Warrior, 2022, found that 89% of developers reported they’ve received sufficient training in secure coding skills. However, what this survey really showed is that developers know so little that they think they know more than they do (an unfortunate example of the Dunning–Kruger effect). More than half of those respondents were not familiar with common software vulnerabilities, how to avoid them, and how they can be exploited. 92% said they needed more training on security frameworks, and 86% stated they found it challenging to practice secure coding. In short, they thought they knew enough, yet most knew almost nothing.

I think that's more than enough evidence to show that developers often don't know how to develop secure software, with predictable results. The solution is to encourage this education. A clear way to encourage it is to measure it.

Let's stop here. This has little to do with the feature request.

It does, because people residing in certain countries can't participate in that program and can't get that certificate even if they wanted to, so they are effectively going to be downgraded based on their location.

I think that's more than enough evidence to show that developers often don't know how to develop secure software

What I'm trying to say here is that sloppy craftsmanship with a certificate is still sloppy craftsmanship. If the idea is to figure out how secure projects actually are certificates shouldn't matter. Thanks for the links.

Just to be absolutely clear I'm all for encouraging the education but it seems to me there should be better venues for that than scorecard. https://github.com/ossf/DevRel-community seems to be going in that direction.

53% of software developers report that their organizations don’t ensure training on secure coding [Ponemon 2020]

I'm afraid I didn't get past their marketing form but I'm going to assume that 53% reflects reality according to that report.

No top 40 US “coding” or top 5 non-US CS school required secure coding in 2019 [Forrester 2019]

That link is dead. https://web.archive.org/web/20221209102257/https://www.securityjourney.com/post/what-we-learned-from-our-vulnerabilities-benchmark-report refers to the Ponemon Institute’s report and claims that

one common theme that stood out to us during our analysis was that the vast majority of these vulnerabilities would not have existed if the developers were properly trained in secure coding practices

To be honest, I'm not sure where that conclusion came from, but that page appears to point to another marketing form and ends by trying to sell security training that can help fix all that.

Of U.S. News's top 24 CS 2022 schools, only 1 requires security for undergraduates.

That was unexpected. I thought at least Columbia (with https://www.cs.columbia.edu/~smb/classes/), MIT, and Purdue would be exceptions.

The third most popular answer for how to improve OSS security was providing more training to the OSS community. Source: the 2022 v2.0 survey “Addressing Cybersecurity Challenges in Open Source Software” by Stephen Hendrick (VP Research, The Linux Foundation) & Martin McKeay (Senior Editorial Research Manager, Snyk), question q0050mrv.

That was an interesting read. Based on how the report is structured it seems it targeted consumers for the most part. Either way the full question was "What are some of the ways that IT Industry Organizations could improve the security of developing open source software?". I'm not sure why there was no option where organizations allocate engineering resources to participate in development, code reviews, audits and so on (I'm guessing it's expensive) but it was at least acknowledged there separately that

The use of open source software has often been a one-way street where users see significant benefit with minimal cost or investment. In order for larger open source projects to meet user expectations it will be important for organizations to give back and close the loop to improve open source software sustainability.

Either way it isn't clear why it's assumed that that training can improve anything so I'll try to take a closer look later.

One article pointedly noted, “universities don’t train computer science students in security”.

That article ends with "A large portion of vulnerabilities don’t require advanced knowledge of security to be fixed" and refers to another article where somehow education could have prevented sloppy patching practices.

I'm sorry but I can't seem to figure out how all that leads to the conclusion that the certificate is indicative of anything.

Writing secure software requires the developer know how to develop secure software. Many developers do not know how to develop secure software, because it's not taught in many schools and their co-developers don't know how to do it either. We wish to change this sad state of affairs.

Writing secure software requires the developer know how to develop secure software. Many developers do not know how to develop secure software, because it's not taught in many schools and their co-developers don't know how to do it either

Agreed.

We wish to change this sad state of affairs

Given that companies, universities and so on can't change that I'm not sure how scorecard can change that either. To judge from https://github.com/ossf/scorecard/issues/3534#issuecomment-1749479288

For example, with CII-Best-Practices badge, we have 1,227,058 repos in our data set. 1,502 (0.1%) of which have signed up

scorecard isn't particularly effective in terms of promoting the badge (even combined with the SOS rewards).

Either way I don't think I can add anything else here. I agree it would be nice to get people to start learning things but I think there are better places to do that.

For example, with CII-Best-Practices badge, we have 1,227,058 repos in our data set. 1,502 (0.1%) of which have signed up

scorecard isn't particularly effective in terms of promoting the badge (even combined with the SOS rewards).

For what it's worth, I'm realizing this number is misleading as many of these repos don't know what Scorecard is, or that they're being scanned. The number gets better if we limit the analysis to repos that have installed Scorecard Action at some point, indicating repos which care or are aware of Scorecard.

As of Nov 15, 2023, there were 11,517 repos which had published Scorecard results (so we could be missing some repos which have it installed but don't publish). Of those, 380 have some form of the badge, or 3.3%.

Forks alter these numbers too. Today, GitHub Actions aren't enabled when you fork a repo with existing workflows, but at some point that wasn't true. I'm guessing/estimating that at least 6,000 of these Scorecard-enabled repos are forks (if more than 15 repos had the same repo name, I counted them as forks).

Which brings the ratio closer to 380/5500, or ~7%

Which brings the ratio closer to 380/5500, or ~7%

I think some of those repos, like systemd, got their badges before scorecard was created, so they should probably be excluded to get an accurate number. I'm not sure how to exclude repos influenced by the SOS rewards, but if the certificate is going to be included there too, that probably isn't necessary.

Writing secure software requires the developer know how to develop secure software

While I agree with this I guess what I'm trying to say is that it doesn't guarantee anything. For example since https://openssf.org/blog/2023/11/29/strengthening-the-fort-openssf-releases-compiler-options-hardening-guide-for-c-and-c/ was mentioned here the "-D_FORTIFY_SOURCE=3" thing can be destructive (https://github.com/systemd/systemd/commit/2cfb790391958ada34284290af1f9ab863a515c7) and replace false positives with actual segfaults (and I'm assuming that's where https://best.openssf.org/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C++#additional-considerations came from). It took several full-time engineers (including one of the co-authors of that announcement) and quite a few iterations to get past the CI with ASan/UBSan to clean up that mess. The assumption that some basic training can significantly affect anything is weird to me to be honest.

That being said thanks to https://github.com/ossf/scorecard/issues/3534#issuecomment-1836886543 I think the fact that this check can be used to measure some OpenSSF KPIs is useful in itself and while I don't think it's helpful in terms of figuring out how secure projects actually are or getting people to start learning things I'm not strongly opposed to it any longer. (I have no idea how this check can figure out whether maintainers with certificates actually participate in development though but it doesn't matter much for my purposes).

(Assigning David for investigation on feasibility)

I've done some research. There are several approaches & we can use them all:

Sometimes people include in their GitHub account a link to LinkedIn, Credly, and maybe others. We can follow those links to see what credentials they have. I suspect the same is true for GitLab. We'd need to decide what to accept, e.g., LF Class, CISSP, etc.
I've talked with our LF Chief Privacy Officer, who's contacted one of the LF external counsel. The LF has a lot of data, but we need to ensure that public queries on that are approved by the individual being queried. They'll investigate how to set up an UI so that people can choose to allow such queries on themselves if they wish. Details TBD.
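The two approaches could be combined roughly as follows: recognize profile links that point to credential providers, and only query a person's credentials if they have opted in. This is a minimal Go sketch under assumptions: the provider host list, the `ConsentRegistry` type, and keying consent by GitHub login are hypothetical (the opt-in UI is still to be designed), and no actual Credly or LinkedIn API is called. It also assumes profile links may be stored without a scheme, so one is added before parsing.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// credentialHosts maps profile-link hosts to the provider they suggest.
// Which providers to follow is an assumption; LinkedIn and Credly are the
// examples mentioned in this issue.
var credentialHosts = map[string]string{
	"credly.com":       "credly",
	"www.credly.com":   "credly",
	"linkedin.com":     "linkedin",
	"www.linkedin.com": "linkedin",
}

// classifyLink returns the provider a profile link points to, or "" if it
// does not point at a known credential provider.
func classifyLink(raw string) string {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return ""
	}
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // assume a bare host/path link is HTTPS
	}
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return credentialHosts[strings.ToLower(u.Hostname())]
}

// ConsentRegistry is a hypothetical set of lowercased GitHub logins whose
// owners opted in to public credential queries.
type ConsentRegistry map[string]bool

// lookupAllowed reports whether a credential query may be made for login.
// Absence from the registry means no consent, so no query is made.
func (c ConsentRegistry) lookupAllowed(login string) bool {
	return c[strings.ToLower(login)]
}

func main() {
	consent := ConsentRegistry{"alice": true}
	links := map[string]string{
		"alice": "https://www.credly.com/users/alice", // example link only
		"bob":   "https://example.org/blog",
	}
	for _, login := range []string{"alice", "bob"} {
		provider := classifyLink(links[login])
		fmt.Printf("%s: provider=%q consent=%v\n", login, provider, consent.lookupAllowed(login))
	}
}
```

Checking consent before any provider lookup keeps the default at "no query", which matches the requirement that public queries be approved by the person being queried.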

Sometimes people include in their GitHub account a link to LinkedIn, Credly, and maybe others. We can follow those links to see what credentials they have. I suspect the same is true for GitLab. We'd need to decide what to accept, e.g., LF Class, CISSP, etc.

This would start to be an API/auth nightmare in my opinion. LinkedIn requires authentication for their API for example. Credly likely does too.

Credly does have a way to filter based on email (if we have it): https://www.credly.com/docs/issued_badges#get-issued-badges

I'm talking with the LF privacy officer to work out options.

This issue has been marked stale because it has been open for 60 days with no activity.
