oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0
1.61k stars 312 forks source link

Analyzer skipping Yarn2+ dependencies #8058

Open thirteenthstep opened 11 months ago

thirteenthstep commented 11 months ago

I'm currently working on a project in which the analyzer result will miss more than half of our dependencies. The problem can be easily reproduced with an almost empty project and a single dependency:

{
  "name": "ort-test",
  "packageManager": "yarn@4.0.2",
  "dependencies": {
    "@react-native-community/netinfo": "11.2.1"
  }
} 

Executing ort --info analyze -f JSON -o ./analyze-ort-test -i ./ort-test will result in 0 packages found.

This Problem occures when:

  1. Yarn Berry (2+) is used
  2. The dependency in question itself has peer dependencies

The dependency used by this example can be swapped with any other package with peer dependencies, e.g. react-native (depending on react).

When investigating the issue, it is likely caused by Yarn using a virtual file layer for packages with peer dependencies. If we have a look at the result of yarn info -A -R --manifest --json (the command used by ort to list deps) our dependency is listed using a "virtual" locator:

{
  "value": "@react-native-community/netinfo@npm:11.2.1",
  "children": {
    "Instances": 1,
    "Version": "11.2.1",
    "Manifest": {
      "License": "MIT",
      "Homepage": "https://github.com/react-native-netinfo/react-native-netinfo#readme"
    }
  }
}
{
  "value": "ort-test@workspace:.",
  "children": {
    "Version": "0.0.0",
    "Manifest": {
      "License": null,
      "Homepage": null
    },
    "Dependencies": [
      {
        "descriptor": "@react-native-community/netinfo@virtual:7ef2ec639f3470d2191275e2f3fd8391c01d875228d71bf82ba9ce9a90858464fb499034540949cf9bef8723b3f2638ad6037a3f99ed4516ef466bf8a24771a3#npm:11.2.1",
        "locator": "@react-native-community/netinfo@virtual:7ef2ec639f3470d2191275e2f3fd8391c01d875228d71bf82ba9ce9a90858464fb499034540949cf9bef8723b3f2638ad6037a3f99ed4516ef466bf8a24771a3#npm:11.2.1"
      }
    ]
  }
}

Unfortunately all dependencies with "virtual" locators will be skipped by ort during dependency aggregation at Yarn2.kt

when {
    // Prevent @virtual dependencies because they are internal to Yarn.
    // See https://yarnpkg.com/features/protocols
    "virtual" in locatorType -> null 

I do not fully understand the code comment regarding the "internal" part, but if we prevent the skipping, e.g. by replacing the "virtual" part, the result seems to be correct:

Index: plugins/package-managers/node/src/main/kotlin/Yarn2.kt
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/plugins/package-managers/node/src/main/kotlin/Yarn2.kt b/plugins/package-managers/node/src/main/kotlin/Yarn2.kt
--- a/plugins/package-managers/node/src/main/kotlin/Yarn2.kt    (revision 979847bbb5a4558a7f8cbe2a7c5256600da913cb)
+++ b/plugins/package-managers/node/src/main/kotlin/Yarn2.kt    (date 1702601223304)
@@ -591,7 +591,7 @@
      */
     private fun processDependencies(dependencies: JsonNode): List<Identifier> =
         dependencies.mapNotNull { dependency ->
-            val locator = dependency["locator"].textValue()
+            val locator = dependency["locator"].textValue().replace(Regex("(.+)@virtual:(.+)#(.+)"),"$1@$3")
             val locatorMatcher = EXTRACT_FROM_LOCATOR_PATTERN.matchEntire(locator)
             if (locatorMatcher == null) {
                 issues += createAndLogIssue(

As I do not know the intention behind the skipping I do not feel capable of suggesting a PR in this case, but I would be glad about any information regarding the logic behind it and potential side effects if we continue using this workaround?

sschuberth commented 11 months ago

Thanks for the report @thirteenthstep. The "virtual" in locatorType -> null line was introduced in 3c02a8b38c2195c39fd02df576f63cb8f46dd8ed by @nnobelis, maybe he recalls?

In any case, it seems that you just provided the proof that "virtual" locators are not "internal to Yarn" only.

sschuberth commented 11 months ago

Also, I wonder whether simply dropping the when clause and just keeping the else code would solve your issue, @thirteenthstep?

thirteenthstep commented 11 months ago

Also, I wonder whether simply dropping the when clause and just keeping the else code would solve your issue, @thirteenthstep?

That should be the correct approach. Adjustments regarding the locator pattern matching would then be necessary to work with "virtual" locator strings.

I just tried changing "(.+)@(\\w+):(.+)" to "(.+)@(\\w+).*:(.+)". This should capture the correct values, but for some reason this solution still results in round about 10% less dependencies then the previous workaround.

sschuberth commented 11 months ago

Also, I wonder whether simply dropping the when clause and just keeping the else code would solve your issue, @thirteenthstep?

That should be the correct approach.

Would you be willing to contribute a fix based on that, including any other needed changes plus ideally a test case, since you're in the best situation currently to reproduce the issue?

thirteenthstep commented 11 months ago

@sschuberth I'm not sure if it's actually possible to handle all variations. A closer look at those missing dependencies shows that there are many possible patterns for those "locator" entries, like typescript@patch:typescript@npm%3A5.0.2#optional!builtin<compat/typescript>::version=5.0.2&hash=b5f058 or @some-scope/some-package@virtual:29710f211c97df3f40f9aeae5805aee7882fc1494a5d32550cc951f29c81c0f54881d323519532f0d351bd92c1d19c52ff1ecddba1ffd0dfb608cc5779d3b644#npm:4.17.4::__archiveUrl=https%3A%2F%2Fsome.repo.url-4.17.4.tgz. Modifying the existing regex-approach appears to be quite challenging and I have no idea if there are even more possible variations/patterns. I wonder if it would be possible to take these information from yarn.lockinstead?

nnobelis commented 10 months ago

@sschuberth @thirteenthstep

Please don't take the skipping of "virtual" I put as golden: I have low Yarn expertise and at the time, it seems skipping made the most sense. I am fine with changing the logic.

thirteenthstep commented 10 months ago

Update: I could not come up with a solution to make it work for every scenario. Eventually we had to switch to https://github.com/CycloneDX/cdxgen to analyze yarn4 dependencies.

sschuberth commented 1 week ago

@fviernau do you think this is still an issue with your latest refactorings in mind?

fviernau commented 1 week ago

@fviernau do you think this is still an issue with your latest refactorings in mind?

The latest refactorings have touched only Npm, Pnpm and Yarn but not Yarn2. Yarn2 has already been a separate implementation. However, I'm also planning to clean-up the Yarn2 implementation and remove some code redundancies. Probably it also can make sense to introduce data classes for the parsing. Anyhow, once I arrive at working on the Yarn2 code, I would put this ticket on the list of things to consider.