ossf / security-reviews

A community collection of security reviews of open source software components.
https://openssf.org

How to best represent external reviews #14

Closed scovetta closed 3 years ago

scovetta commented 3 years ago

When someone posts a review that is actually just a reference to an externally-conducted review, how should certain fields be filled out so as not to be misleading:

We don't want to duplicate/copy content, but also don't want to be misleading, especially in terms of who did the actual work.

david-a-wheeler commented 3 years ago

If it's a reference to an external review, I think the data should be the data about the external review. So "author" should be the author of the external review, the date should be the publication date of the external review, etc. I would expect there to be at most one entry for each distinct external review.

Maybe there should be "external-review: true|false" set to true if the entry is simply a short summary of an external review.

jenniferfernick commented 3 years ago

Perhaps we can clarify this with a more verbose set of submission criteria; here are some things we may wish to consider:

scovetta commented 3 years ago

Makes sense -- how about something like:

<!--
publication-state: [draft | published | recalled]
access: [public | private/<ORG>]        # Everything in this repo must be "public"
is-third-party-submission: [Yes | No]
reviewer-name: [name <email> | <ORG>]
reviewer-compensation: [Yes (by project) | Yes (by external entity) | No]
reviewer-associated-with-project: [Yes | No]
domain: [security]
methodology-summary: [any of static-analysis;code-review;web-search]
opinion: [secure | insecure | partially-secure]
package-urls:
- "<PACKAGE URL>"
- "<PACKAGE URL>"
publication-date: <YYYY-MM-DD>
scope: [implementation/full | implementation/partial | non-implementation]
schema-version: 1.0
severity: [critical | important | moderate | low | defense-in-depth | informational | not-applicable]
SPDX-License-Identifier: CC-BY-4.0
-->
david-a-wheeler commented 3 years ago

A few comments...

Yes/No should be true/false. Many formats (e.g., JSON) support boolean values specially.

Should "name" be "names"? Many reviews have multiple people.

The "opinion" field is probably hopeless; those are too few categories. Besides, the version reviewed is probably not the current version, and "is it secure" in many cases depends on how it's used.

This looks like YAML. Perhaps this should explicitly be stated as being YAML?

scovetta commented 3 years ago

First two, yep, makes sense to me.

Opinion -- I think we need to have this -- I struggle with the right word, but essentially, "I have five seconds of attention, tell me if X is safe for me to use." -- I think we need to have a few easy-to-understand buckets, which is why I thought, "secure" (meaning, in any reasonable circumstance), "insecure" (meaning, you really shouldn't use it at all without digging in deep) and that middle group, where context is important ("You can use X, but don't expose it to the Internet, or you can use Y, except you need to do Z before you pass in data.")

Yes, it was intended to be YAML -- the validator requires it to be YAML -- We can make it something different, but I was looking for something that would be readable but structured. I suppose JSON would be fine too. And since the rest of the review will be Markdown (because it's arbitrary-ish text), the whole file really needs two sections.

<!--
publication-state: [draft | published | recalled]
access: [public | private/<ORG>]        # Everything in this repo must be "public"
is-third-party-submission: [true | false]
reviewer-name: [name <email> | <ORG>]
reviewer-compensation: [true | false]
reviewer-compensation-source: [project | external-entity | n/a]
reviewer-associated-with-project: [true | false]
domain: [security]
methodology-summary: [any of static-analysis;code-review;web-search]
opinion: [secure | insecure | partially-secure]
package-urls:
- "<PACKAGE URL>"
- "<PACKAGE URL>"
publication-date: <YYYY-MM-DD>
scope: [implementation/full | implementation/partial | non-implementation]
schema-version: 1.0
severity: [critical | important | moderate | low | defense-in-depth | informational | not-applicable]
SPDX-License-Identifier: CC-BY-4.0
-->

I think we need to keep this simple though, this might be getting too complex already...
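Since each review file would combine a metadata block inside a leading HTML comment with a Markdown body, a minimal parsing sketch could look like the following. (This is hypothetical code, not the project's actual validator; it reads only simple scalar `key: value` lines rather than full YAML, and the sample values are illustrative.)

```python
import re

def split_review(text):
    """Split a review file into (metadata_dict, markdown_body).

    Assumes the file starts with an HTML comment containing
    simple 'key: value' metadata lines (a small YAML subset;
    list-valued keys like package-urls are not handled here).
    """
    m = re.match(r"\s*<!--(.*?)-->\s*", text, re.DOTALL)
    if not m:
        raise ValueError("file must start with a metadata comment")
    metadata = {}
    for line in m.group(1).splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if not line:
            continue
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata, text[m.end():]

sample = """<!--
schema-version: 1.0
publication-state: published   # everything here must be public
-->
### Summary
No severe issues found.
"""
meta, body = split_review(sample)
```

A real validator would want a proper YAML parser, but this illustrates why keeping the metadata section machine-readable and the rest free-form Markdown is workable.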

jenniferfernick commented 3 years ago

On the opinion piece, I'd recommend against the "secure" / "partially secure" / "not secure" discretization, even though I know exactly why you'd want it and how valuable it could be in theory. Everything will be "partially secure" (or "not secure" for the purists among us), and I worry about the type of behaviour it incentivizes in the industry. Lower-end security vendors like to sell testing with reports that say something "passed" - but the truth is, you don't "pass" a pentest; you have results and remediations of findings, and every pentest/security review has findings ("all computers are broken", etc.). We don't want to give people a false sense of security through ever translating these reports into a pass/fail. It might instead make sense to ask for the number of high/critical vulnerabilities identified in the code, or whether all medium/high/crits have been patched, for example, or to not interpret the results at all but rather simply collect existing reports.

scovetta commented 3 years ago

Hmm, I'll push back a bit here -- though a big part of me agrees with you 100%.

I think consumers need to know whether they need to worry when they use a particular package. Practically 0% of folks will read a full report to get the entire context / grok the nuance / etc. -- so the wisdom needs to be distilled down to something binary, or close to it. Maybe we can be clear on what "secure" means (i.e. "'secure' doesn't mean you can put it in an artificial heart or in a space shuttle"). But if everything comes back "not secure" then it'd be similar to "isthispackagesecure.com" -> NO :-)

The crowdsourcing aspect of this project means that if I do a poor job at my review and get something wrong, and you come along and do a better one, there'll be two reviews, and consumers would see both opinions. And the PR review process would establish some oversight into what goes in.

"As a security person, the world is full of nuance and everything is perpetually on fire. As an engineer, I just need a calendar widget working before I leave tonight."

My team has completed over 1,000 reviews, and many are basically, "Yep, we looked at this. It's a 40-line JavaScript module, no findings, nothing really interesting." I think knowing that someone looked at a package and didn't find anything provides some value. If someone wants to create the artificial heart or space shuttle, they should be hiring a company to do this work with formal guarantees.

Or did you mean this only when applying to referencing external reviews?

JasonKeirstead commented 3 years ago

The nuance is in the wording. I agree with @jenniferfernick that saying something is "secure" is just asking for pain...

In fact, I doubt you will ever see any mature reviewer willing to offer an "opinion" of "secure". There is a big difference between "did not see anything of note" and "secure".

I would suggest instead that the "severity" field may already communicate what you want to accomplish here, if we add another value such as "no-findings".

david-a-wheeler commented 3 years ago

How about "severe-issues-found: true|false"? It's hard to assert "this is secure"; it's easier to say whether or not something was found. I'd point to CVSS for a definition of severe (it's not perfect, but nothing is).
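As a sketch of how a CVSS-grounded definition of "severe" might work: the CVSS v3.x qualitative bands put High at a base score of 7.0-8.9 and Critical at 9.0-10.0, so one option is a simple threshold. (The threshold choice is an assumption for illustration, not something decided in this thread.)

```python
def severe_issues_found(base_scores):
    """True if any finding's CVSS v3.x base score reaches the
    High band (>= 7.0); a review with no findings yields False."""
    return any(score >= 7.0 for score in base_scores)
```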

scovetta commented 3 years ago

I can get behind this. Let me consolidate the comments and come back with a proposed set of metadata.

(And I really appreciate everyone's thoughts on this!)

scovetta commented 3 years ago

I think we might need a middle ground between "severe-issues-found: true" and "severe-issues-found: false". Here's the reasoning: consumers will likely take "false" as shorthand for "probably safe" even though it's more complicated, and will take "true" as "don't ever use it; rip it out of your environment".

Imagine a package like Redis, and let's assume the only flaw was the lack of built-in authentication. The author's position is that it's the integrator's job to manage access control to Redis (firewalls and such), and as a result, the lack of auth isn't a vulnerability. If that were the only issue mentioned in the review, what value would "severe-issues-found" take?

scovetta commented 3 years ago

Thoughts? (Ignore the json vs. yaml thing -- we can choose the right one after we iron out the content :-))

{
    "Publication-State": "active",
    "Reviewers": [
        {
            "Name": "Michael Scovetta"
            "Email": "michael.scovetta@microsoft.com",
            "Organization": "Microsoft",
            "Associated-With-Project": [true | false],
            "Compensation-Source": ["project" | "non-project" | "external" | "none"]
        }
    ],
    "Domain": "security",
    "Methodology": [
        "Static Analysis",
        "Code Review",
        "Threat Model",
        "Web Search",
        "External Review"
    ],

    (option #1)
    "Severe Issues Found": [true | false],

    (option #2)
    "Issued-Found": [
        "Severe",
        "Non-Severe",
        "None"
    ]
    "Package-URLs": [
        "<PACKAGE URL>"
    ],
    "Date-Reviewed": "<YYYY-MM-DD>",
    "Scope": "[implementation/full | implementation/partial | non-implementation]"
    "Schema-Version": "1.0",
    "SPDX-License-Identifier": "CC-BY-4.0"
}
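For what it's worth, a validator for metadata like this could stay quite small. Here's a hedged sketch, assuming option #2's `Issues-Found` takes a single value from the enum, and treating the field names, required-field set, and sample values (reviewer name, purl) as provisional/illustrative rather than final:

```python
import json

# Allowed values copied from the draft above; all of this is
# provisional until the schema is finalized.
ALLOWED_ISSUES_FOUND = {"Severe", "Non-Severe", "None"}
ALLOWED_SCOPES = {
    "implementation/full",
    "implementation/partial",
    "non-implementation",
}
REQUIRED_FIELDS = {"Publication-State", "Reviewers", "Package-URLs", "Schema-Version"}

def validate_metadata(raw_json):
    """Return a list of problems found in a review's metadata block."""
    meta = json.loads(raw_json)
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - meta.keys())]
    if meta.get("Issues-Found") not in ALLOWED_ISSUES_FOUND:
        problems.append("Issues-Found must be one of Severe, Non-Severe, None")
    if meta.get("Scope") not in ALLOWED_SCOPES:
        problems.append("unrecognized Scope value")
    return problems

# Illustrative sample metadata (names and package URL are made up).
sample = json.dumps({
    "Publication-State": "active",
    "Reviewers": [{"Name": "Jane Doe", "Associated-With-Project": False}],
    "Issues-Found": "None",
    "Package-URLs": ["pkg:npm/left-pad@1.3.0"],
    "Scope": "implementation/partial",
    "Schema-Version": "1.0",
})
problems = validate_metadata(sample)
```

An empty document would fail every check, which is roughly the behavior you'd want from a CI gate on incoming PRs.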
JasonKeirstead commented 3 years ago

I like option 2, but I'd also add a "not applicable" or "not evaluated". Someone submitting a threat model may not even be trying to attest to other security issues.

scovetta commented 3 years ago

@JasonKeirstead I was thinking about "Issue" == "Finding", etc., so if I only did a threat model and found something, but not severe, it would be Methodology = Threat Model and Issues-Found = Non-Severe. I think we need additional options in Methodology, e.g. fuzzing, but I don't want to go overboard -- the ### Methodology section can be used to explain exactly what was and wasn't covered in the assessment.

scovetta commented 3 years ago

Updated the template and readme based on this discussion. If there are no objections, I'll update the validator and quickstart page to reflect the new schema.

scovetta commented 3 years ago

Quickstart has been updated, validator is going into #19 - I think we're good to close on this.