unavailable packages are listed as 0 vulnerabilities

JosiahParry commented 4 years ago

When running the audit with oysteR::audit_deps() the message will indicate how many packages were found in the Sonatype database as below.

audit = oysteR::audit_deps()
#> ℹ Calling installed.packages(), this may take time
#> 
#> ── Calling sonatype API: https://www.sonatype.com/ ──
#> 
#> → Using Sonatype tokens
#> ℹ Calling API: batch 1 of 3
#> ℹ Calling API: batch 2 of 3
#> ℹ Calling API: batch 3 of 3
#> 
#> ── Vulnerability overview ──
#> 
#> ℹ 361 packages were scanned
#> ℹ 315 packages were in the Sonatype database
#> ℹ 0 packages contains known vulnerabilities
#> ℹ A total of 0 known vulnerabilities were identified

Note that there is a discrepancy of 46 packages. These missing packages are only indicated by the missing description field. Moreover, they still have fields populated and, most concerningly (I made that word up), indicate that there are 0 vulnerabilities when in reality they could not be tested. The fact that a package is not available to be tested should be reported.

I'm unsure of the best solution here, but perhaps a logical field indicating if the package is available would be best?
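A minimal sketch of what such a logical field could look like. The data frame below is a hypothetical stand-in for the audit result, not real output: the column names (description, no_of_vulnerabilities), the version numbers, and the new in_index field are all assumptions for illustration.

```r
# Hypothetical stand-in for the result of oysteR::audit_deps();
# column names and values are assumed for illustration only.
audit <- data.frame(
  package = c("ggplot2", "googlesheets4", "made_up_pkg"),
  version = c("3.3.2", "0.2.0", "0.0.1"),
  description = c("Create Elegant Data Visualisations", NA, NA),
  no_of_vulnerabilities = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Treat a missing description as "not found in the package index".
# in_index is a hypothetical field name, not part of oysteR.
audit$in_index <- !is.na(audit$description)

# Packages that report 0 vulnerabilities but were not found in the index
not_indexed <- audit$package[!audit$in_index]
print(not_indexed)
```

With a field like this, a zero in no_of_vulnerabilities could be read alongside in_index rather than on its own.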

csgillespie commented 4 years ago

Good point.

What about something like:

361 packages were scanned
315 packages were in the Sonatype database
No known vulnerabilities were found
Note: 46 packages were not in the database, so could not be scanned
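
A rough sketch of how that summary could be assembled, using base R only. The helper audit_summary() and its arguments are hypothetical and not part of oysteR; the wording simply follows the proposal above.

```r
# Hypothetical helper, not part of oysteR: builds the proposed summary lines.
audit_summary <- function(n_scanned, n_in_db, n_vulnerable) {
  lines <- c(
    sprintf("%d packages were scanned", n_scanned),
    sprintf("%d packages were in the Sonatype database", n_in_db),
    if (n_vulnerable == 0) {
      "No known vulnerabilities were found"
    } else {
      sprintf("%d packages contain known vulnerabilities", n_vulnerable)
    }
  )
  # Only add the note when some packages were not found in the database.
  n_missing <- n_scanned - n_in_db
  if (n_missing > 0) {
    lines <- c(
      lines,
      sprintf(
        "Note: %d packages were not in the database, so could not be scanned",
        n_missing
      )
    )
  }
  lines
}

# Counts taken from the audit output earlier in this thread
writeLines(audit_summary(361, 315, 0))
```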

JosiahParry commented 4 years ago

That should suffice! I do think it would be nice to have a way to check which packages are available to be scanned. The missing description made me think there was a bug, rather than that the package was unavailable.

For example googlesheets4 returned an NA for description. I checked the description with packageDescription("googlesheets4") and voila it was there.

I think my takeaway is that it's just generally not clear which of the packages are not in the database.

DarthHater commented 4 years ago

I'm not entirely sure this is possible at the current time; the OSS Index API doesn't really have an indicator of whether something is valid or invalid in its response. @ken-duck @brittanybelle do you know?

brittanybelle commented 4 years ago

> I'm not entirely sure this is possible at the current time; the OSS Index API doesn't really have an indicator of whether something is valid or invalid in its response. @ken-duck @brittanybelle do you know?

@DarthHater is correct, at the moment the OSS Index API response does not distinguish whether a package is recognized by our database or not. Usually the absence of a package description is a pretty good indicator that we have not yet ingested the package into our DB. However, it's probably not clear that our package index and our vulnerability records are stored separately. In other words, it's totally possible that we might not have the package in our index (e.g. no description field is available), BUT we still might have a vulnerability to report for the given purl. So the reason we don't report "package unknown" or something like that is that we might still have vulnerabilities on file for that package, even if we haven't ingested the package description. Hope that makes sense :)

To summarize - even if a package is not in our package index, we still always check our vulnerability records for every purl, and report on all vulnerabilities that we know about. So in that sense, if there are no vulnerabilities returned for a given purl, then you can be sure that means there are 0 vulns on file for that purl. (It does NOT mean the package wasn't scanned!)
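
One way to picture the distinction between the package index and the vulnerability records is a per-package status label. This is only a sketch: the package names, counts, column names, and the helper vuln_status() are all hypothetical, and a missing description is used as a proxy for "not in the package index", as described above.

```r
# Illustrative data only; names, counts and column names are made up.
audit <- data.frame(
  package = c("pkg_a", "pkg_b", "pkg_c"),
  description = c("An indexed package", NA, NA),
  no_of_vulnerabilities = c(0L, 0L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: vulnerability records are checked for every purl,
# so a count above zero is reported even when the description is missing.
vuln_status <- function(description, n_vulns) {
  ifelse(
    n_vulns > 0,
    "vulnerabilities on file",
    ifelse(
      is.na(description),
      "no vulnerabilities on file, package not in index",
      "no vulnerabilities on file"
    )
  )
}

audit$status <- vuln_status(audit$description, audit$no_of_vulnerabilities)
print(audit[, c("package", "status")])
```

Here pkg_c mirrors the case described above: no description in the index, yet vulnerabilities are still reported for its purl.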

(It's always possible we might be missing vulnerabilities, too -- in case you know of any R package vulnerabilities that are not reflected in OSS Index scans, then you are welcome to report any omissions or corrections. I should mention that we're going through a bit of an internal transition period, so at the moment the advisories are only looked at by one person, but soon we'll be able to process these submissions a lot faster!)

I'm not sure when the "X packages were in the Sonatype database" line was added to the scan output, but it's misleading to imply that some packages have not been scanned. As I explained above, OSS Index will always scan every purl that is sent to it, and we will always report on known vulns for the purls we receive. So perhaps we should adjust the scan summary output to make that more clear :)

csgillespie commented 4 years ago

@brittanybelle I suspect I was the guilty culprit for adding in the line "X packages were in the Sonatype database". I came to this conclusion as there were a number of packages I knew weren't in your database (internal to JR).

I suppose it comes down to how we handle the R package: made_up_pkg. Is it better to imply that it may have been scanned, or to be more cautious and say it hasn't?

> but it's misleading to imply that some packages have not been scanned.

True, but is it more misleading to imply they have? ;) I suspect just a change of wording is necessary.

JosiahParry commented 4 years ago

To @brittanybelle's point, django is a perfect example of a package having vulnerabilities but no description: django <- oysteR:::call_oss_index(list("pkg:pypi/django@3.0.1"), TRUE).
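
For reference, the purl strings sent to OSS Index follow the pattern pkg:type/name@version. A small sketch of building them in base R; make_purl() is a hypothetical helper, not an oysteR function, and the googlesheets4 version is only an example value.

```r
# Hypothetical helper, not part of oysteR: builds a package URL (purl).
# "cran" is assumed as the default type for R packages.
make_purl <- function(name, version, type = "cran") {
  sprintf("pkg:%s/%s@%s", type, name, version)
}

# An R package (illustrative version number)
make_purl("googlesheets4", "0.2.0")

# The django purl used in the call above
make_purl("django", "3.0.1", type = "pypi")
```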

There is only one reported CVE in the history of R that I am aware of (https://www.cvedetails.com/cve/CVE-2016-8714/, https://talosintelligence.com/vulnerability_reports/TALOS-2016-0227). This is not reflected in the searches for grDevices. I'll report it. But this is more of an issue with the R community not looking for them, meaning that a search across all of CRAN for all versions of R will likely result in 0 vulnerabilities found, even if they exist and are latent.

Point being, I think it is important to mention if a package has not been scanned. I think it makes more sense to err on the side of caution.

sonatype-nexus-community / oysteR

unavailable packages are listed as 0 vulnerabilities #20