oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0
1.57k stars 308 forks source link

Analyzer skipping Yarn2+ dependencies #8058

Open thirteenthstep opened 9 months ago

thirteenthstep commented 9 months ago

I'm currently working on a project in which the analyzer result will miss more than half of our dependencies. The problem can be easily reproduced with an almost empty project and a single dependency:

{
  "name": "ort-test",
  "packageManager": "yarn@4.0.2",
  "dependencies": {
    "@react-native-community/netinfo": "11.2.1"
  }
} 

Executing ort --info analyze -f JSON -o ./analyze-ort-test -i ./ort-test will result in 0 packages found.

This Problem occures when:

  1. Yarn Berry (2+) is used
  2. The dependency in question itself has peer dependencies

The dependency used by this example can be swapped with any other package with peer dependencies, e.g. react-native (depending on react).

When investigating the issue, it is likely caused by Yarn using a virtual file layer for packages with peer dependencies. If we have a look at the result of yarn info -A -R --manifest --json (the command used by ort to list deps) our dependency is listed using a "virtual" locator:

{
  "value": "@react-native-community/netinfo@npm:11.2.1",
  "children": {
    "Instances": 1,
    "Version": "11.2.1",
    "Manifest": {
      "License": "MIT",
      "Homepage": "https://github.com/react-native-netinfo/react-native-netinfo#readme"
    }
  }
}
{
  "value": "ort-test@workspace:.",
  "children": {
    "Version": "0.0.0",
    "Manifest": {
      "License": null,
      "Homepage": null
    },
    "Dependencies": [
      {
        "descriptor": "@react-native-community/netinfo@virtual:7ef2ec639f3470d2191275e2f3fd8391c01d875228d71bf82ba9ce9a90858464fb499034540949cf9bef8723b3f2638ad6037a3f99ed4516ef466bf8a24771a3#npm:11.2.1",
        "locator": "@react-native-community/netinfo@virtual:7ef2ec639f3470d2191275e2f3fd8391c01d875228d71bf82ba9ce9a90858464fb499034540949cf9bef8723b3f2638ad6037a3f99ed4516ef466bf8a24771a3#npm:11.2.1"
      }
    ]
  }
}

Unfortunately all dependencies with "virtual" locators will be skipped by ort during dependency aggregation at Yarn2.kt

when {
    // Prevent @virtual dependencies because they are internal to Yarn.
    // See https://yarnpkg.com/features/protocols
    "virtual" in locatorType -> null 

I do not fully understand the code comment regarding the "internal" part, but if we prevent the skipping, e.g. by replacing the "virtual" part, the result seems to be correct:

Index: plugins/package-managers/node/src/main/kotlin/Yarn2.kt
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/plugins/package-managers/node/src/main/kotlin/Yarn2.kt b/plugins/package-managers/node/src/main/kotlin/Yarn2.kt
--- a/plugins/package-managers/node/src/main/kotlin/Yarn2.kt    (revision 979847bbb5a4558a7f8cbe2a7c5256600da913cb)
+++ b/plugins/package-managers/node/src/main/kotlin/Yarn2.kt    (date 1702601223304)
@@ -591,7 +591,7 @@
      */
     private fun processDependencies(dependencies: JsonNode): List<Identifier> =
         dependencies.mapNotNull { dependency ->
-            val locator = dependency["locator"].textValue()
+            val locator = dependency["locator"].textValue().replace(Regex("(.+)@virtual:(.+)#(.+)"),"$1@$3")
             val locatorMatcher = EXTRACT_FROM_LOCATOR_PATTERN.matchEntire(locator)
             if (locatorMatcher == null) {
                 issues += createAndLogIssue(

As I do not know the intention behind the skipping I do not feel capable of suggesting a PR in this case, but I would be glad about any information regarding the logic behind it and potential side effects if we continue using this workaround?

sschuberth commented 9 months ago

Thanks for the report @thirteenthstep. The "virtual" in locatorType -> null line was introduced in 3c02a8b38c2195c39fd02df576f63cb8f46dd8ed by @nnobelis, maybe he recalls?

In any case, it seems that you just provided the proof that "virtual" locators are not "internal to Yarn" only.

sschuberth commented 9 months ago

Also, I wonder whether simply dropping the when clause and just keeping the else code would solve your issue, @thirteenthstep?

thirteenthstep commented 9 months ago

Also, I wonder whether simply dropping the when clause and just keeping the else code would solve your issue, @thirteenthstep?

That should be the correct approach. Adjustments regarding the locator pattern matching would then be necessary to work with "virtual" locator strings.

I just tried changing "(.+)@(\\w+):(.+)" to "(.+)@(\\w+).*:(.+)". This should capture the correct values, but for some reason this solution still results in round about 10% less dependencies then the previous workaround.

sschuberth commented 9 months ago

Also, I wonder whether simply dropping the when clause and just keeping the else code would solve your issue, @thirteenthstep?

That should be the correct approach.

Would you be willing to contribute a fix based on that, including any other needed changes plus ideally a test case, since you're in the best situation currently to reproduce the issue?

thirteenthstep commented 8 months ago

@sschuberth I'm not sure if it's actually possible to handle all variations. A closer look at those missing dependencies shows that there are many possible patterns for those "locator" entries, like typescript@patch:typescript@npm%3A5.0.2#optional!builtin<compat/typescript>::version=5.0.2&hash=b5f058 or @some-scope/some-package@virtual:29710f211c97df3f40f9aeae5805aee7882fc1494a5d32550cc951f29c81c0f54881d323519532f0d351bd92c1d19c52ff1ecddba1ffd0dfb608cc5779d3b644#npm:4.17.4::__archiveUrl=https%3A%2F%2Fsome.repo.url-4.17.4.tgz. Modifying the existing regex-approach appears to be quite challenging and I have no idea if there are even more possible variations/patterns. I wonder if it would be possible to take these information from yarn.lockinstead?

nnobelis commented 8 months ago

@sschuberth @thirteenthstep

Please don't take the skipping of "virtual" I put as golden: I have low Yarn expertise and at the time, it seems skipping made the most sense. I am fine with changing the logic.

thirteenthstep commented 8 months ago

Update: I could not come up with a solution to make it work for every scenario. Eventually we had to switch to https://github.com/CycloneDX/cdxgen to analyze yarn4 dependencies.