nsp data re-hosting by node foundation

sam-github commented 7 years ago

https://nodejs.org/en/blog/announcements/nodejs-security-project/

As announced above, goals are:

The Node.js Foundation will take over the following responsibilities from ^Lift:

Maintaining an entry point for ecosystem vulnerability disclosure;

Maintaining a private communication channel for vulnerabilities to be vetted;

Vetting participants in the private security disclosure group;

Facilitating ongoing research and testing of security data;

Owning and publishing the base dataset of disclosures, and

Defining a standard for the data, which tool vendors can build on top of, and security and vendors can add data and value to as well.

To be clear, the Node Security Project itself is not being donated to the foundation, just the dataset, which the foundation will maintain.

The process for this is not yet determined, I'm opening this issue to discuss it.

sam-github commented 7 years ago

The conversation should probably kick off with an example of the data that is donated, so it's easier to discuss the points above. Also, my understanding is that the bullet points from the foundation announcement should be treated as the start of a conversation about what to do with the donation, not as explicit directives.

@mikeal, can you facilitate us getting a sample of the data? The donation was announced, but has it been finalized yet?

sam-github commented 7 years ago

/cc @evilpacket

evilpacket commented 7 years ago

@sam-github We're still working through paperwork but we can continue to move forward.

Here is a sample record.

    {
      "id": 176,
      "created_at": "2016-11-30T22:26:03+00:00",
      "updated_at": "2017-01-01T01:43:16+00:00",
      "title": "Downloads Resources over HTTP",
      "author": "Adam Baldwin",
      "coordinator": "^Lift Security",
      "module_name": "webrtc-native",
      "publish_date": "2017-01-01T01:43:16+00:00",
      "cves": [],
      "vulnerable_versions": "<=99.999.99999",
      "patched_versions": "<0.0.0",
      "slug": "webrtc-native_downloads-resources-over-http",
      "overview": "markdown string",
      "recommendation": "markdown string",
      "references": "markdown string",
      "cvss_vector": "CVSS:3.0/AV:N/AC:H/PR:L/UI:R/S:U/C:H/I:H/A:H",
      "cvss_score": 7.1
    }

One thing I recommended adding was a coordinator field, to give the person, people, or vendor (like ^Lift, Snyk, etc.) credit for coordinating the vulnerability, so that vendors have at least some incentive to contribute more to the project.

cjihrig commented 7 years ago

How do we want to store the data? @evilpacket, what database are you currently storing the data in?

jasnell commented 7 years ago

One proposal is to store the vulnerabilities records as individual files within a private github repo with access limited to a very specific team. If the access needs to be more granular, then we can follow a model similar to that used in the nodejs/secrets repository in which files in individual subdirectories are encrypted and accessible only to individuals with the appropriate keys for that sub. I'm not yet sure whether that approach is sufficient, but it is one approach we can take.

@evilpacket ... one step that I would like to take is drafting up an actual spec for the record format, which is something I can do so long as I verify a few details. Specifically, the cvss_vector field, is there a spec reference for the format there? And I assume that cves is an array of string CVE identifiers?

cjihrig commented 7 years ago

I was under the impression that we were exposing the data so that people could build on top of it. If that's the case, we could just store a DB dump in git, and keep the secrets private.

jasnell commented 7 years ago

@cjihrig ... there are two aspects: (1) maintaining a complete database of vulnerabilities that may or may not yet have been disclosed to the public and (2) providing access to the vulnerabilities that have been disclosed to the public.

cjihrig commented 7 years ago

OK. There are a number of ways to do that. We could add fields on the disclosure status, make a DB view of disclosed vulnerabilities publicly available, and still store a dump in git in a private repo. Having each vulnerability in a separate file seems.... cluttered.
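A minimal TypeScript sketch of the "disclosure status" idea: one dataset holds everything, and the public view is just a filter over it. The `disclosure_status` field and its two values are hypothetical; the sample record has no such field.

```typescript
// Hypothetical: `disclosure_status` is not a field in the sample record; it only
// illustrates deriving the public view from a single private dataset.
type DisclosureStatus = "private" | "public";

interface VulnRecord {
  id: number;
  module_name: string;
  title: string;
  disclosure_status: DisclosureStatus;
}

// Keep only records that have been publicly disclosed; the input is not modified.
function publicView(records: readonly VulnRecord[]): VulnRecord[] {
  return records.filter((record) => record.disclosure_status === "public");
}

// Record 176 comes from the sample in this thread; record 9999 is made up for illustration.
const allRecords: VulnRecord[] = [
  { id: 176, module_name: "webrtc-native", title: "Downloads Resources over HTTP", disclosure_status: "public" },
  { id: 9999, module_name: "hypothetical-module", title: "Report still under embargo", disclosure_status: "private" },
];

console.log(publicView(allRecords).map((record) => record.id));
```

The same filter could produce the "single blob" public download, while the private repo keeps the full set.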

mcollina commented 7 years ago

One proposal is to store the vulnerabilities records as individual files within a private github repo with access limited to a very specific team.

@jasnell I think we can give that a shot first, see how much traffic there is, and then iterate to a new thing. I think we should also provide the data over HTTP (or maybe NPM) as a single blob to download.

@cjihrig there is the general understanding that we do not want to compete in any way with the providers. As such, we won't provide an API. I'm not necessarily in agreement from a technical perspective, but it is a key requirement that the Foundation stays neutral.

A good "in between" solution might be to use something like AWS S3, which has 1) versioning, 2) a very rich ACL system (IAM), 3) an API that is very widely known, and 4) it is very easy to automate.

vdeturckheim commented 7 years ago

@mcollina agreed on S3. IMO it is a reliable enough platform to make the data available in good conditions. People will be building businesses on top of this data.

jbergstroem commented 7 years ago

@jasnell said: One proposal is to store the vulnerabilities records as individual files within a private github repo with access limited to a very specific team. If the access needs to be more granular, then we can follow a model similar to that used in the nodejs/secrets repository in which files in individual subdirectories are encrypted and accessible only to individuals with the appropriate keys for that sub.

I like this too. We should also set up a local (as in, infra we host) git mirror.

This would support:

Per (gpg) user access to private security issues
Full-directory download of each revision
Likely one folder per vulnerability if we want to be consistent. I guess that gives us the option of storing PoC too.

SomeoneWeird commented 7 years ago

@jbergstroem Is there a reason you're suggesting it has to be our own infra?

jbergstroem commented 7 years ago

@SomeoneWeird the more backups the better. I would treat something hosted by our infra as more reliable than trusting third party.

Edit: If we host a git mirror we can additionally control access, meaning we know it won't be tainted.

evilpacket commented 7 years ago

@cjihrig I would recommend a structure like the one we had before we moved it into a DB, but using JSON instead of markdown. Really, I have no opinions about this other than that it be machine readable and have some sort of access-control process, since you have to manage private incoming reports and transition them to public (or whatever) throughout the coordination and disclosure process.

one directory per year
one vulnerability record per .json file
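A sketch of how a record could map to that layout, assuming the year directory is taken from `created_at` and the file is named after the record `id`. Both choices are assumptions (`publish_date` would work just as well for the year); the thread did not settle them.

```typescript
// Assumption: year directory = year of `created_at`; file name = `<id>.json`.
interface StoredRecord {
  id: number;
  created_at: string; // ISO 8601, e.g. "2016-11-30T22:26:03+00:00"
}

function recordPath(record: StoredRecord): string {
  // Take the year from the timestamp string itself, so the result never
  // depends on the time zone of the machine running the script.
  const match = /^(\d{4})-/.exec(record.created_at);
  if (match === null) {
    throw new Error(`record ${record.id}: unparseable created_at "${record.created_at}"`);
  }
  return `${match[1]}/${record.id}.json`;
}

// One record per file, pretty-printed with a trailing newline so git diffs stay readable.
function recordFileContents(record: StoredRecord): string {
  return JSON.stringify(record, null, 2) + "\n";
}

const sampleRecord: StoredRecord = { id: 176, created_at: "2016-11-30T22:26:03+00:00" };
console.log(recordPath(sampleRecord)); // 2016/176.json
```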

@jasnell We used the format that's in the example record because it's what the CVSS calculator outputs when you use it, and there is no need to rebuild something like that.

More info on the specification here. https://www.first.org/cvss/specification-document
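Since the vector is the standard CVSS v3.0 string, `cvss_score` can be recomputed from `cvss_vector`, which would let a tool catch records where the two disagree. A sketch of the v3.0 base-score equations from the FIRST specification, covering base metrics only (temporal and environmental metrics are ignored, and v3.1 vectors are rejected):

```typescript
// CVSS v3.0 base metric weights, from the FIRST v3.0 specification.
const AV: Record<string, number> = { N: 0.85, A: 0.62, L: 0.55, P: 0.2 };
const AC: Record<string, number> = { L: 0.77, H: 0.44 };
const UI: Record<string, number> = { N: 0.85, R: 0.62 };
const CIA: Record<string, number> = { H: 0.56, L: 0.22, N: 0 };
// Privileges Required depends on Scope: U = unchanged, C = changed.
const PR: Record<"U" | "C", Record<string, number>> = {
  U: { N: 0.85, L: 0.62, H: 0.27 },
  C: { N: 0.85, L: 0.68, H: 0.5 },
};

// Parse "CVSS:3.0/AV:N/..." into a metric -> value map.
function parseVector(vector: string): Map<string, string> {
  const [prefix, ...parts] = vector.split("/");
  if (prefix !== "CVSS:3.0") throw new Error(`unsupported vector prefix: ${prefix}`);
  const metrics = new Map<string, string>();
  for (const part of parts) {
    const [key, value] = part.split(":");
    if (key === undefined || value === undefined) throw new Error(`malformed metric: ${part}`);
    metrics.set(key, value);
  }
  return metrics;
}

// Look up a metric's weight, failing loudly on a missing or unknown value.
function weight(table: Record<string, number>, metrics: Map<string, string>, key: string): number {
  const value = metrics.get(key);
  const w = value === undefined ? undefined : table[value];
  if (w === undefined) throw new Error(`missing or invalid ${key}`);
  return w;
}

// v3.0 "round up": the smallest one-decimal number >= x.
function roundUp(x: number): number {
  return Math.ceil(x * 10) / 10;
}

function baseScore(vector: string): number {
  const m = parseVector(vector);
  const scope = m.get("S");
  if (scope !== "U" && scope !== "C") throw new Error("missing or invalid S");
  const iss = 1 - (1 - weight(CIA, m, "C")) * (1 - weight(CIA, m, "I")) * (1 - weight(CIA, m, "A"));
  const impact = scope === "U"
    ? 6.42 * iss
    : 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15);
  const exploitability = 8.22 * weight(AV, m, "AV") * weight(AC, m, "AC") *
    weight(PR[scope], m, "PR") * weight(UI, m, "UI");
  if (impact <= 0) return 0;
  return scope === "U"
    ? roundUp(Math.min(impact + exploitability, 10))
    : roundUp(Math.min(1.08 * (impact + exploitability), 10));
}

console.log(baseScore("CVSS:3.0/AV:N/AC:H/PR:L/UI:R/S:U/C:H/I:H/A:H")); // 7.1
```

For the sample record's vector this gives 7.1, matching its `cvss_score`.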

cves look like this

  "cves": [
    "CVE-2013-7379"
  ],
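A draft of what a machine-checkable record spec could look like, using the field names from the sample record (including the proposed `coordinator`). The checks are assumptions to refine: CVE IDs are taken to be `CVE-`, a four-digit year and four or more digits, and vectors are expected to be CVSS v3.0. The input is assumed to already have the right shape; checking raw, untyped JSON would take more code.

```typescript
// Draft shape of one advisory, following the field names in the sample record.
interface Advisory {
  id: number;
  created_at: string; // ISO 8601 timestamps
  updated_at: string;
  publish_date: string;
  title: string;
  author: string; // the finder of the bug (may be renamed)
  coordinator: string; // proposed: who coordinated the disclosure
  module_name: string;
  cves: string[];
  vulnerable_versions: string; // semver range
  patched_versions: string; // semver range
  slug: string;
  overview: string; // markdown
  recommendation: string; // markdown
  references: string; // markdown
  cvss_vector: string;
  cvss_score: number;
}

// Assumed CVE ID syntax: "CVE-", a four-digit year, then four or more digits.
const CVE_ID = /^CVE-\d{4}-\d{4,}$/;

// Returns a list of problems; an empty list means the record passed these checks.
function validateAdvisory(a: Advisory): string[] {
  const problems: string[] = [];
  for (const cve of a.cves) {
    if (!CVE_ID.test(cve)) problems.push(`malformed CVE id: ${cve}`);
  }
  if (!a.cvss_vector.startsWith("CVSS:3.0/")) problems.push("cvss_vector is not a CVSS v3.0 vector");
  if (!(a.cvss_score >= 0 && a.cvss_score <= 10)) problems.push("cvss_score is outside 0-10");
  for (const field of ["created_at", "updated_at", "publish_date"] as const) {
    if (Number.isNaN(Date.parse(a[field]))) problems.push(`${field} is not a parseable date`);
  }
  return problems;
}

// The sample record from this thread.
const sampleAdvisory: Advisory = {
  id: 176,
  created_at: "2016-11-30T22:26:03+00:00",
  updated_at: "2017-01-01T01:43:16+00:00",
  publish_date: "2017-01-01T01:43:16+00:00",
  title: "Downloads Resources over HTTP",
  author: "Adam Baldwin",
  coordinator: "^Lift Security",
  module_name: "webrtc-native",
  cves: [],
  vulnerable_versions: "<=99.999.99999",
  patched_versions: "<0.0.0",
  slug: "webrtc-native_downloads-resources-over-http",
  overview: "markdown string",
  recommendation: "markdown string",
  references: "markdown string",
  cvss_vector: "CVSS:3.0/AV:N/AC:H/PR:L/UI:R/S:U/C:H/I:H/A:H",
  cvss_score: 7.1,
};

console.log(validateAdvisory(sampleAdvisory));
```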

joshgav commented 7 years ago

@mcollina

we do not want to compete in any way with the providers. As such, we won't provide an API.

My understanding was the opposite, that we would provide an open, documented API for tools vendors to access. For example, greenkeeper or npm might programmatically access that API when installing or checking modules.

The conversation in this thread is about data storage, and exposing this data through an API is only loosely coupled to that, so I'll start another issue to discuss.

sam-github commented 7 years ago

@evilpacket how much of the nsp vuln DB is node vulns, as opposed to non-node (probably browser) vulnerabilities for packages installable from npmjs.org? Did you and @mikeal discuss how the node foundation would deal with non-node vulns?

sam-github commented 7 years ago

 "author": "Adam Baldwin",

Author of this report? Or original reporter of vuln?

 "publish_date": "2017-01-01T01:43:16+00:00",

Date the vuln was publicized, right? As opposed to when the vulnerable code was published.

 "vulnerable_versions": "<=99.999.99999",

Do you need a separate "entry" if multiple non-contiguous versions are vulnerable? E.g., if 1.2.3 and ">=2.4.6 <=2.5.2" are vulnerable, how do you express that?

 "patched_versions": "<0.0.0",

What is a patched version? If a patch is applied, then it is a new version. Is this the last vulnerable version? Or the (set) of versions in which patches were first published? So if the versions I listed above were patched in the 1.x and 2.x lines, the patched_versions would be "^1.2.4" and "^2.5.3"?

I agree, a reporter(coordinator?) field would be useful.

evilpacket commented 7 years ago

@sam-github it's all node. Every package is for something that's in npm.

author (should be renamed to something like finder I think, as it's the finder of the bug)

Should add another field for coordinator imho

publish_date: date it was published

Might want to add additional dates such as reported on, published, updated

patched_versions / vulnerable_versions: you just get really nasty semver statements like this: >=2.5.0 <= 3.0.0 || >=3.1.0

I hope this helps.
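To illustrate how `||` handles non-contiguous ranges like `>=2.5.0 <= 3.0.0 || >=3.1.0` (and the 1.x/2.x case in the question), here is a deliberately minimal matcher: plain `x.y.z` versions, `<`, `<=`, `>`, `>=` and `=` comparators, a space meaning AND and `||` meaning OR. It does not handle prerelease tags, `^`, `~` or x-ranges; real tooling should use `satisfies` from the `semver` package on npm.

```typescript
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (m === null) throw new Error(`unsupported version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Does `version` satisfy a single comparator such as ">=2.5.0" or "1.2.3"?
function testComparator(version: Version, comparator: string): boolean {
  const m = /^(<=|>=|<|>|=)?(.+)$/.exec(comparator);
  const op = m?.[1];
  const target = m?.[2];
  if (target === undefined) throw new Error(`bad comparator: "${comparator}"`);
  const c = compare(version, parseVersion(target));
  switch (op) {
    case "<": return c < 0;
    case "<=": return c <= 0;
    case ">": return c > 0;
    case ">=": return c >= 0;
    default: return c === 0; // "=" or no operator means an exact match
  }
}

// OR across "||"-separated sets; AND across space-separated comparators in a set.
function satisfies(version: string, range: string): boolean {
  const v = parseVersion(version);
  return range.split("||").some((set) =>
    set
      .trim()
      .replace(/(<=|>=|<|>|=)\s+/g, "$1") // accept "<= 3.0.0" as written above
      .split(/\s+/)
      .every((comparator) => testComparator(v, comparator)),
  );
}

console.log(satisfies("3.0.5", ">=2.5.0 <= 3.0.0 || >=3.1.0")); // false
console.log(satisfies("1.2.4", "1.2.3 || >=2.4.6 <=2.5.2")); // false
```

Read this way, the sample record's "<=99.999.99999" and "<0.0.0" appear to be sentinels for "every version is vulnerable" and "no version is patched".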

cjihrig commented 7 years ago

@evilpacket is there any update on transferring the data to the Node Foundation?

evilpacket commented 7 years ago

Nope, waiting on paperwork from the foundation atm for official things, afaik. We can probably make progress in the meantime, though.

On Wed, Feb 8, 2017 at 6:23 AM, Colin Ihrig notifications@github.com wrote:


sam-github commented 7 years ago

@evilpacket

it's all node. Every package is for something that's in npm.

There are packages in npm that are for browser javascript, not node. This vuln, https://nodesecurity.io/advisories/15, is for https://github.com/rails/jquery-ujs; I don't see any connection to node.

Am I misunderstanding?

EDIT: and jquery itself, https://www.npmjs.com/package/jquery, would be an example of an npm package that the node foundation would not accept vulnerability reports for, or would it?

sam-github commented 7 years ago

@evilpacket https://github.com/nodejs/security-wg/issues/16#issuecomment-274178450, some questions on the meaning of the fields

What do we need to do, specifically, to prepare for this?

I can think of:

1. decide how to store the data
2. document the meaning of the fields
3. decide what the node foundation vulnerability collection will be called
4. implement a reporting/management process for the vulnerabilities

(4) deserves an issue of its own, most likely.

Anything else?

vdeturckheim commented 7 years ago

@sam-github my understanding was that the storing of the data would go to Github.

evilpacket commented 7 years ago

@sam-github I guess I misspoke, as I consider anything with an npm module related to node, since it can be used anywhere. /shrug. The original project's intent was focused on anything that's crammed into npm, so that's the scope.

Couple thoughts on your other questions / todo items.

decide what the node foundation vulnerability collection will be called

It's focused on ecosystem security, so I would be inclined to go in that direction if I had to rename things.

implement a reporting/management process for the vulnerabilities

There is a standard for vulnerability coordination and disclosure, ISO/IEC 29147, that we can probably reference for any process bits. It's available for free now and isn't a horribly boring read.

sam-github commented 7 years ago

@evilpacket great, thanks for the link

evilpacket commented 7 years ago

This is a better link; the previous one was to an article: http://standards.iso.org/ittf/PubliclyAvailableStandards/c045170_ISO_IEC_29147_2014.zip

On Tue, Feb 21, 2017 at 9:48 AM, Sam Roberts notifications@github.com wrote:


drifkin commented 7 years ago

@sam-github the link you posted looks like it's from a private repo

sam-github commented 7 years ago

Ah, so it is, sorry. Removed it; it's not useful here ATM.

sam-github commented 7 years ago

@evilpacket Paperwork is done, I believe; what's the next step? How big is the JSON, and any suggestions on how you can upload it? GitHub attachment? Commit it into the nodejs/security-wg repo? ...?

sam-github commented 7 years ago

Data dump from @evilpacket

reports.line-delimited-json.zip

Next up: decide what to do with it
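Judging by the file name, the dump has one JSON record per line. A minimal loader sketch that keeps going past malformed lines and reports their line numbers; reading the unzipped file from disk is left out so it doesn't depend on a particular runtime.

```typescript
interface ParseResult {
  records: unknown[];
  errors: string[];
}

// Parse line-delimited JSON: one JSON value per line, blank lines ignored.
function parseLineDelimitedJson(text: string): ParseResult {
  const records: unknown[] = [];
  const errors: string[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return;
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      // `index` is zero-based; report one-based line numbers.
      errors.push(`line ${index + 1}: ${err instanceof Error ? err.message : String(err)}`);
    }
  });
  return { records, errors };
}

// Made-up input: one valid record, a blank line, and one malformed line.
const dump = '{"id": 176, "module_name": "webrtc-native"}\n\n{not valid json}\n';
const parsed = parseLineDelimitedJson(dump);
console.log(parsed.records.length, parsed.errors.length); // 1 1
```

Collecting errors instead of throwing makes it easy to see how many records need fixing before they are split into individual files.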

sam-github commented 7 years ago

Note: @evilpacket suggests we coordinate with @nstarke on this because he (Adam) is way too busy.

@nstarke create an issue asking to join the WG if you would like to.

sam-github commented 7 years ago

https://github.com/nodejs/security-wg/pull/26 is an example of what we could do with the data and what it could look like if committed directly into git, though hosting it as a sub-directory of this git repository is probably not what we want to do! ;-)

sam-github commented 6 years ago

We got a data dump, now just working on process to maintain it.
