When someone posts a review that is actually just a reference to an externally-conducted review, how should certain fields be filled out so as not to be misleading? We don't want to duplicate/copy content, but we also don't want to be misleading, especially in terms of who did the actual work.
If it's a reference to an external review, I think the data should be the data about the external review. So "author" should be the author of the external review, "date" should be the publication date of the external review, etc. I would expect there to be at most one entry for each distinct external review.
Maybe there should be "external-review: true|false" set to true if the entry is simply a short summary of an external review.
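For example (a sketch only -- these field names are illustrative, not a settled schema), an entry that just points at an external review might look like:

```yaml
# Illustrative sketch of an entry referencing an external review.
# "reviewer-name" and "publication-date" describe the *external* review,
# not the person submitting this entry; "external-review-url" is a
# hypothetical field for where the original work lives.
external-review: true
reviewer-name: Jane Doe <jane@example.com>
publication-date: 2021-03-15
external-review-url: https://example.com/security-review-of-foo
```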
Perhaps we can clarify this with a more verbose set of submission criteria; here are some things we may wish to consider:
Makes sense -- how about something like:
```yaml
<!--
publication-state: [draft | published | recalled]
access: [public | private/<ORG>] # Everything in this repo must be "public"
is-third-party-submission: [Yes | No]
reviewer-name: [name <email> | <ORG>]
reviewer-compensation: [Yes (by project) | Yes (by external entity) | No]
reviewer-associated-with-project: [Yes | No]
domain: [security]
methodology-summary: [any of static-analysis;code-review;web-search]
opinion: [secure | insecure | partially-secure]
package-urls:
  - "<PACKAGE URL>"
  - "<PACKAGE URL>"
publication-date: <YYYY-MM-DD>
scope: [implementation/full | implementation/partial | non-implementation]
schema-version: 1.0
severity: [critical | important | moderate | low | defense-in-depth | informational | not-applicable]
SPDX-License-Identifier: CC-BY-4.0
-->
```
A few comments...
Yes/No should be true/false. Many formats (e.g., JSON) support boolean values natively.
Should "name" be "names"? Many reviews have multiple people.
The "opinion" field is probably hopeless; those are too few categories. Besides, the version reviewed is probably not the current version, and "is it secure" in many cases depends on how it's used.
This looks like YAML. Perhaps this should explicitly be stated as being YAML?
First two, yep, makes sense to me.
Opinion -- I think we need to have this -- I struggle with the right word, but essentially, "I have five seconds of attention, tell me if X is safe for me to use." -- I think we need to have a few easy-to-understand buckets, which is why I thought, "secure" (meaning, in any reasonable circumstance), "insecure" (meaning, you really shouldn't use it at all without digging in deep) and that middle group, where context is important ("You can use X, but don't expose it to the Internet, or you can use Y, except you need to do Z before you pass in data.")
Yes, it was intended to be YAML -- the validator requires it to be YAML -- We can make it something different, but I was looking for something that would be readable but structured. I suppose JSON would be fine too. And since the rest of the review will be Markdown (because it's arbitrary-ish text), the whole file really needs two sections.
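To make the two-section idea concrete, here's a minimal sketch (all field values and the section names are illustrative, not final):

```markdown
<!--
publication-state: published
access: public
reviewer-name: Jane Doe <jane@example.com>
domain: security
package-urls:
  - "pkg:npm/left-pad@1.3.0"
publication-date: 2021-03-15
schema-version: 1.0
SPDX-License-Identifier: CC-BY-4.0
-->

### Summary

Short, human-readable review text goes here.

### Methodology

Explain exactly what was and wasn't covered in the assessment.
```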
```yaml
<!--
publication-state: [draft | published | recalled]
access: [public | private/<ORG>] # Everything in this repo must be "public"
is-third-party-submission: [true | false]
reviewer-name: [name <email> | <ORG>]
reviewer-compensation: [true | false]
reviewer-compensation-source: [project | external-entity | n/a]
reviewer-associated-with-project: [true | false]
domain: [security]
methodology-summary: [any of static-analysis;code-review;web-search]
opinion: [secure | insecure | partially-secure]
package-urls:
  - "<PACKAGE URL>"
  - "<PACKAGE URL>"
publication-date: <YYYY-MM-DD>
scope: [implementation/full | implementation/partial | non-implementation]
schema-version: 1.0
severity: [critical | important | moderate | low | defense-in-depth | informational | not-applicable]
SPDX-License-Identifier: CC-BY-4.0
-->
```
I think we need to keep this simple though, this might be getting too complex already...
On the opinion piece, I'd recommend against the "secure" / "partially secure" / "not secure" discretization, even though I know exactly why you'd want it and how valuable it could be in theory. Everything will be "partially secure" (or "not secure" for the purists among us), and I worry about the type of behaviour it incentivizes in the industry. Lower-end security vendors like to sell testing with reports that say something "passed" -- but the truth is, you don't "pass" a pentest; you have results and remediations of findings, and every pentest/security review has findings ("all computers are broken", etc.). We don't want to give people a false sense of security through ever translating these reports into a pass/fail. It might instead make sense to ask for the number of high/critical vulnerabilities identified in the code, or whether all medium/high/critical findings have been patched, or to not interpret the results at all but simply collect existing reports.
Hmm, I'll push back a bit here -- though a big part of me agrees with you 100%.
I think consumers need to know whether they need to worry when they use a particular package. Practically 0% of folks will read a full report to get the entire context / grok the nuance / etc. -- so the wisdom needs to be distilled down to something binary, or close to it. Maybe we can be clear on what "secure" means (i.e. "'secure' doesn't mean you can put it in an artificial heart or in a space shuttle"). But if everything comes back "not secure" then it'd be similar to "isthispackagesecure.com" -> NO :-)
The crowdsourcing aspect of this project means that if I do a poor job at my review and get something wrong, and you come along and do a better one, there'll be two reviews, and consumers would see both opinions. And the PR review process would establish some oversight into what goes in.
"As a security person, the world is full of nuance and everything is perpetually on fire. As an engineer, I just need a calendar widget working before I leave tonight."
My team has completed over 1,000 reviews, and many are basically, "Yep, we looked at this. It's a 40-line JavaScript module, no findings, nothing really interesting." I think knowing that someone looked at a package and didn't find anything provides some value. If someone wants to create the artificial heart or the space shuttle, they should be hiring a company to do this work with formal guarantees.
Or did you mean this to apply only when referencing external reviews?
The nuance is in the wording. I agree with @jennifernick that saying something is "secure" is just asking for pain...
In fact, I doubt you will ever see a mature reviewer willing to offer an "opinion" of "secure". There is a big difference between "did not see anything of note" and "secure".
I would suggest instead that the "severity" field may already communicate what you want to accomplish here, if we add another value such as "no-findings".
How about "severe-issues-found: true|false"? It's hard to assert "this is secure"; it's easier to say whether or not something was found. I'd point to CVSS for a definition of severe (it's not perfect, but nothing is).
I can get behind this. Let me consolidate the comments and come back with a proposed set of metadata.
(And I really appreciate everyone's thoughts on this!)
I think we might need a middle ground between "severe-issues-found: true" and "severe-issues-found: false". Here's the reasoning: consumers will likely take "false" as shorthand for "probably safe" even though it's more complicated, and will take "true" as "don't ever use this; rip it out of your environment".
Imagine a package like Redis -- and let's assume the only flaw was the lack of built-in authentication. The author's position is that it's the integrator's job to manage access control to Redis (firewalls and such), and as a result, the lack of auth isn't a vulnerability. If that were the only issue mentioned in the review, what value would "severe-issues-found" take?
Thoughts? (Ignore the json vs. yaml thing -- we can choose the right one after we iron out the content :-))
```json
{
  "Publication-State": "active",
  "Reviewers": [
    {
      "Name": "Michael Scovetta",
      "Email": "michael.scovetta@microsoft.com",
      "Organization": "Microsoft",
      "Associated-With-Project": [true | false],
      "Compensation-Source": ["project" | "non-project" | "external" | "none"]
    }
  ],
  "Domain": "security",
  "Methodology": [
    "Static Analysis",
    "Code Review",
    "Threat Model",
    "Web Search",
    "External Review"
  ],
  (option #1)
  "Severe-Issues-Found": [true | false],
  (option #2)
  "Issues-Found": [
    "Severe",
    "Non-Severe",
    "None"
  ],
  "Package-URLs": [
    "<PACKAGE URL>"
  ],
  "Date-Reviewed": "<YYYY-MM-DD>",
  "Scope": "[implementation/full | implementation/partial | non-implementation]",
  "Schema-Version": "1.0",
  "SPDX-License-Identifier": "CC-BY-4.0"
}
```
I like option 2 but I'd also add a "not applicable" or "not evaluated". Someone submitting a threat model may not be even trying to attest to other security issues.
@JasonKeirstead I was thinking about "Issue" == "Finding", etc., so if I only did a threat model and found something, but not severe, it would be `Methodology = Threat Model` and `Issues-Found = Non-Severe`. I think we need additional options in Methodology (e.g., fuzzing), but I don't want to go overboard -- the `### Methodology` section can be used to explain exactly what was and wasn't covered in the assessment.
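For the validator, the check for this field could be as simple as something like this (a sketch only -- not the actual validator code, and it assumes the option #2 shape plus the "not evaluated" value suggested above):

```python
# Sketch of a metadata check for the proposed "Issues-Found" field.
# The allowed values (including "Not-Evaluated") are assumptions drawn
# from this thread, not a finalized schema.
ALLOWED_ISSUES_FOUND = {"Severe", "Non-Severe", "None", "Not-Evaluated"}

def validate_issues_found(metadata: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    values = metadata.get("Issues-Found")
    if not isinstance(values, list) or not values:
        errors.append('"Issues-Found" must be a non-empty list.')
        return errors
    for value in values:
        if value not in ALLOWED_ISSUES_FOUND:
            errors.append(f'Unknown "Issues-Found" value: {value!r}')
    return errors

# Example: a threat-model-only review with a non-severe finding
print(validate_issues_found({"Issues-Found": ["Non-Severe"]}))  # -> []
```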
Updated the template and readme based on this discussion. If there are no objections, I'll update the validator and quickstart page to reflect the new schema.
Quickstart has been updated, validator is going into #19 - I think we're good to close on this.