Open thirteenthstep opened 11 months ago
Thanks for the report @thirteenthstep. The "virtual" in locatorType -> null
line was introduced in 3c02a8b38c2195c39fd02df576f63cb8f46dd8ed by @nnobelis, maybe he recalls?
In any case, it seems that you just provided the proof that "virtual" locators are not "internal to Yarn" only.
Also, I wonder whether simply dropping the when
clause and just keeping the else
code would solve your issue, @thirteenthstep?
Also, I wonder whether simply dropping the
when
clause and just keeping theelse
code would solve your issue, @thirteenthstep?
That should be the correct approach. Adjustments regarding the locator pattern matching would then be necessary to work with "virtual" locator strings.
I just tried changing "(.+)@(\\w+):(.+)"
to "(.+)@(\\w+).*:(.+)"
. This should capture the correct values, but for some reason this solution still results in round about 10% less dependencies then the previous workaround.
Also, I wonder whether simply dropping the
when
clause and just keeping theelse
code would solve your issue, @thirteenthstep?That should be the correct approach.
Would you be willing to contribute a fix based on that, including any other needed changes plus ideally a test case, since you're in the best situation currently to reproduce the issue?
@sschuberth I'm not sure if it's actually possible to handle all variations. A closer look at those missing dependencies shows that there are many possible patterns for those "locator" entries, like
typescript@patch:typescript@npm%3A5.0.2#optional!builtin<compat/typescript>::version=5.0.2&hash=b5f058
or @some-scope/some-package@virtual:29710f211c97df3f40f9aeae5805aee7882fc1494a5d32550cc951f29c81c0f54881d323519532f0d351bd92c1d19c52ff1ecddba1ffd0dfb608cc5779d3b644#npm:4.17.4::__archiveUrl=https%3A%2F%2Fsome.repo.url-4.17.4.tgz
.
Modifying the existing regex-approach appears to be quite challenging and I have no idea if there are even more possible variations/patterns.
I wonder if it would be possible to take these information from yarn.lock
instead?
@sschuberth @thirteenthstep
Please don't take the skipping of "virtual" I put as golden: I have low Yarn expertise and at the time, it seems skipping made the most sense. I am fine with changing the logic.
Update: I could not come up with a solution to make it work for every scenario. Eventually we had to switch to https://github.com/CycloneDX/cdxgen to analyze yarn4 dependencies.
@fviernau do you think this is still an issue with your latest refactorings in mind?
@fviernau do you think this is still an issue with your latest refactorings in mind?
The latest refactorings have touched only Npm
, Pnpm
and Yarn
but not Yarn2
. Yarn2
has already been a separate implementation. However, I'm also planning to clean-up the Yarn2
implementation and remove some code redundancies. Probably it also can make sense to introduce data classes for the parsing. Anyhow, once I arrive at working on the Yarn2
code, I would put this ticket on the list of things to consider.
I'm currently working on a project in which the analyzer result will miss more than half of our dependencies. The problem can be easily reproduced with an almost empty project and a single dependency:
Executing
ort --info analyze -f JSON -o ./analyze-ort-test -i ./ort-test
will result in 0 packages found.This Problem occures when:
The dependency used by this example can be swapped with any other package with peer dependencies, e.g.
react-native
(depending on react).When investigating the issue, it is likely caused by Yarn using a virtual file layer for packages with peer dependencies. If we have a look at the result of
yarn info -A -R --manifest --json
(the command used by ort to list deps) our dependency is listed using a "virtual" locator:Unfortunately all dependencies with "virtual" locators will be skipped by ort during dependency aggregation at Yarn2.kt
I do not fully understand the code comment regarding the "internal" part, but if we prevent the skipping, e.g. by replacing the "virtual" part, the result seems to be correct:
As I do not know the intention behind the skipping I do not feel capable of suggesting a PR in this case, but I would be glad about any information regarding the logic behind it and potential side effects if we continue using this workaround?