yarnpkg / berry

📦🐈 Active development trunk for Yarn ⚒
https://yarnpkg.com
BSD 2-Clause "Simplified" License
7.43k stars 1.11k forks source link

Yarn berry does not properly handle tarballs with `./` as first component, unlike Yarn 1.x #4070

Open devversion opened 2 years ago

devversion commented 2 years ago

Self-service

Describe the bug

Yarn Berry/v3 does not properly deal with tarballs generated without yarn pack, that previously worked as expected with Yarn 1.x classic.

Assume we have a tar archive with entries like this:

./package.json

Yarn 1.x always strips the first component away. Similarly Yarn berry attempts this but fails due to some normalization logic applied to deal with its portable paths as it seems. The result is that before stripping the first component, Yarn berry accidentally normalizes the path to just package.json and therefore the entry is being omitted as it will be considered a "first component".

To reproduce

const fs = require('fs');
const path = require('path');
const childProcess = require('child_process');

// create a simple tar
const tarBallPath = await createSimplePackageTar();

const installPromise = packageJsonAndInstall({
  dependencies: {
    [`my-test-pkg`]: `file:${tarBallPath}`,
  }
});

await expect(installPromise).resolves.toBeTruthy();

//
// Helpers
//

async function createSimplePackageTar() {
    const tmpDir = path.resolve('/tmp/my-simple-package-tar-folder');
    const pkgJsonPath = path.join(tmpDir, 'package.json');
    const tarOutPath = path.resolve('./my-tar-pkg.tgz');

    await fs.promises.mkdir(tmpDir, {recursive: true});
    await fs.promises.writeFile(pkgJsonPath, JSON.stringify({
        name: 'test',
        version: '0.0.0',
    }, null, 2));

    // Creates a TAR with an entry like:
    // -> `./package.json`. 
    // Yarn pack would create:
    // -> `package/package.json`
    // interestingly though, Yarn 1.x did handle both tarballs
    // properly because it just stripped of the first components. Yarn
    // Berry attempts the same but tries to be a little smarter with its
    // virtual file system and accidentally normalizes the path entry first
    // causing the `./package.json` entry to end up being stripped to just
    // an empty string, causing it to be omitted!
    await childProcess.spawnSync('tar', ['-cf', tarOutPath, './package.json'], {
      cwd: tmpDir
    });
    // await yarn('pack', '--out', tarOutPath, {cwd: tmpDir});

    return tarOutPath;
}

Environment

System:
    OS: macOS 12.0.1
    CPU: (8) arm64 Apple M1
  Binaries:
    Node: 16.11.0 - /private/var/folders/z_/dlggq0hn4yvc3rdr7qt5vf9h0000gn/T/xfs-ff133f5a/node
    Yarn: 3.2.0-rc.13.git.20220204.hash-3443cfa - /private/var/folders/z_/dlggq0hn4yvc3rdr7qt5vf9h0000gn/T/xfs-ff133f5a/yarn
    npm: 8.0.0 - ~/bin/node-v16-11/bin/npm

Additional context

No response

yarnbot commented 2 years ago

This issue reproduces on master:

Error: expect(received).resolves.toBeTruthy()

Received promise rejected instead of resolved
Rejected to value: [Error: Command failed: /usr/bin/node /github/workspace/scripts/actions/../run-yarn.js install

➤ YN0000: ┌ Resolution step
::group::Resolution step
➤ YN0001: │ Error: my-test-pkg@file:/tmp/tmp-18a6mLA1rPZdRa/my-tar-pkg.tgz::locator=root-workspace-0b6124%40workspace%3A.: Manifest not found
    at Function.find (/github/workspace/packages/yarnpkg-core/sources/Manifest.ts:114:13)
    at /github/workspace/packages/plugin-file/sources/TarballFileResolver.ts:70:14
    at Object.releaseAfterUseAsync (/github/workspace/packages/yarnpkg-core/sources/miscUtils.ts:161:12)
    at TarballFileResolver.resolve (/github/workspace/packages/plugin-file/sources/TarballFileResolver.ts:69:22)
    at MultiResolver.resolve (/github/workspace/packages/yarnpkg-core/sources/MultiResolver.ts:57:12)
    at MultiResolver.resolve (/github/workspace/packages/yarnpkg-core/sources/MultiResolver.ts:57:12)
    at /github/workspace/packages/yarnpkg-core/sources/Project.ts:722:18
    at Object.prettifyAsyncErrors (/github/workspace/packages/yarnpkg-core/sources/miscUtils.ts:172:12)
    at startPackageResolution (/github/workspace/packages/yarnpkg-core/sources/Project.ts:721:29)
::endgroup::
➤ YN0000: └ Completed in 0s 528ms
➤ YN0000: Failed with errors in 0s 541ms
]
    at expect (/github/workspace/.yarn/cache/expect-npm-24.8.0-8c7640c562-0c0da74930.zip/node_modules/expect/build/index.js:138:15)
    at module.exports (evalmachine.<anonymous>:15:7)
    at async /github/workspace/.yarn/cache/@arcanis-sherlock-npm-2.0.3-558f52b79f-ccfa417b92.zip/node_modules/@arcanis/sherlock/lib/executeRepro.js:57:13
    at async executeInTempDirectory (/github/workspace/.yarn/cache/@arcanis-sherlock-npm-2.0.3-558f52b79f-ccfa417b92.zip/node_modules/@arcanis/sherlock/lib/executeRepro.js:18:16)
    at async executeRepro (/github/workspace/.yarn/cache/@arcanis-sherlock-npm-2.0.3-558f52b79f-ccfa417b92.zip/node_modules/@arcanis/sherlock/lib/executeRepro.js:25:12)
    at async ExecCommand.execute (/github/workspace/.yarn/cache/@arcanis-sherlock-npm-2.0.3-558f52b79f-ccfa417b92.zip/node_modules/@arcanis/sherlock/lib/commands/exec.js:26:38)
    at async ExecCommand.validateAndExecute (/github/workspace/.yarn/cache/clipanion-npm-2.0.0-rc.16-b9444aaf89-91cf93ba72.zip/node_modules/clipanion/lib/advanced/Command.js:161:26)
    at async Cli.run (/github/workspace/.yarn/cache/clipanion-npm-2.0.0-rc.16-b9444aaf89-91cf93ba72.zip/node_modules/clipanion/lib/advanced/Cli.js:74:24)
    at async Cli.runExit (/github/workspace/.yarn/cache/clipanion-npm-2.0.0-rc.16-b9444aaf89-91cf93ba72.zip/node_modules/clipanion/lib/advanced/Cli.js:83:28)
arcanis commented 2 years ago

Out of curiosity, what tool generates tarballs like this? It sounds a little invalid

devversion commented 2 years ago

@arcanis It's really just the standard GNU tar command (see example above) that will generate such archive members, depending on whether you pass it absolute filenames, or relative like this (which can happen if you use find together or something).

Generally, these are valid entry/archive member names for tar files, it's just that for tar component stripping, normalization which would collapse ./blub to blub is unexpected. Libraries like tar-fs does not do this by default either, so this seems like the ecosystem standard with respect to tarballs. Yarn 1.x did not break for such tarballs in that case, especially not with a rather confusing error message as in this issue (see above; it's not clear why this happens and took me quite some time to debug).

Really my case as a concrete example: We are using Bazel as build tool for Angular, and tarballs generated with Bazel through rules_pkg currently come with ./ as the first component, breaking our integration testing to prepare for Yarn berry/PnP. It's clear that we could somehow tweak the package setup to use something else as first component (like package/), but the mismatch for tarballs in Yarn 1.x and Yarn 2.x came as a surprise (with a hard-to-debug message)

arcanis commented 2 years ago

My problem with that is that paths inside tar archives are paths, so normalizing them shouldn't make a difference 🤔

I understand that it worked in Yarn Classic (and perhaps other package managers), but it sounds like a bug, and I'm not sure I want to follow this behaviour. I myself had to implement a few security-related fixes in Yarn Classic due to how tarball paths could lead to escapes during installs, and the more broken assumptions we have the more likely it is that we leave open an attack surface that can be exploited somehow.

Really my case as a concrete example: We are using Bazel as build tool for Angular, and tarballs generated with Bazel through rules_pkg currently come with ./ as the first component, breaking our integration testing to prepare for Yarn berry/PnP. It's clear that we could somehow tweak the package setup to use something else as first component (like package/), but the mismatch for tarballs in Yarn 1.x and Yarn 2.x came as a surprise (with a hard-to-debug message)

I'd prefer if rules_pkg was following a more "standard" tarball pattern. If you call tar manually to generate your tarballs, then it'd be probably for the best to follow the same layout as yarn pack would generate (although I'm curious - why not call yarn pack directly?).

devversion commented 2 years ago

My problem with that is that paths inside tar archives are paths, so normalizing them shouldn't make a difference 🤔

Hmm. Why aren't paths not normalized as part of the creation then? It surely makes a difference for the stripping mechanism. The normalization does not occur in other tar implementations (in creation neither in extraction). That seems like the convention across implementations (including the GNU tar implementation itself). I haven't found anything in the GNU tar spec that elaborates much on the ./<> case, neither it being necessarily talking about paths, but rather about arbitrary archive member names.

I do acknowledge the security aspect you mention, but that one can be easily solved by normalizing paths just for the jail mechanism, as a side note.

I'm not sure it's a bug for tars because you could pretty easily end up having such archive members, given my basic GNU tar example for Sherlock. The error from Yarn is correct in terms of not finding a manifest but that is still super confusing if you pass a valid tar archive that works in Yarn 1.x, NPM etc. - I personally don't see any strong reason to deviate there, especially with the --strip-components mechanism (working with ./) being well established in the ecosystem as it seems?

I'd prefer if rules_pkg was following a more "standard" tarball pattern.

Is it really a standard pattern to not have . as a leading component? I think that will be hard to answer (as said it's fairly straightforward to end up with ./<XX> via command line), but generally I agree that the "member convention" from yarn pack would be obviously better for Yarn. Invoking Yarn from Bazel has the consequence of increasing the build graph a little more than invoking just the system tar/ or just rules_pkg (although that is not a good argument; conceptually the deviation in Yarn berry seems more surprising; that's why I submitted the issue)

devversion commented 2 years ago

Thanks for taking the time to look into this issue though, regardless of the outcome!

arcanis commented 2 years ago

Lost a prior answer 😅

Invoking Yarn from Bazel has the consequence of increasing the build graph a little more than invoking just the system tar/ or just rules_pkg (although that is not a good argument; conceptually the deviation in Yarn berry seems more surprising; that's why I submitted the issue)

Yes, but yarn pack (and to a lesser extend similar commands from other package managers, like npm pack) does more than just call tar. It also processes the .gitignore / files field, it replaces the main / module / exports fields according to the definition in publishConfig, it validates the consistency of the project†, etc. And of course it guarantees that the output can be installed when referenced via the file: protocol, making it the most portable thing to use as there's otherwise no formal spec on what a packed package should look like.

† It doesn't do that yet, but I'd like to implement it in the next major

The error from Yarn is correct in terms of not finding a manifest but that is still super confusing if you pass a valid tar archive that works in Yarn 1.x, NPM etc. - I personally don't see any strong reason to deviate there, especially with the --strip-components mechanism (working with ./) being well established in the ecosystem as it seems?

It's mostly just a matter of risk assessment. I've seen firsthand the kind of bugs and security risks that this type of "accidentally working" unplanned behaviours can cause in older package managers, and for modern Yarn releases I tend to reevaluate them on their own to see if they're truly necessary. Sometimes they are, sometimes they can be fixed/changed easily enough that dropping them is perhaps a good idea to keep Yarn stable. I have a feeling it's the case here 🤔

devversion commented 2 years ago

Yes, but yarn pack (and to a lesser extend similar commands from other package managers, like npm pack) does more than just call tar. It also processes the .gitignore / files field, it replaces the main / module / exports fields according to the definition in publishConfig, it validates the consistency of the project†, etc. And of course it guarantees that the output can be installed when referenced via the file: protocol, making it the most portable thing to use as there's otherwise no formal spec on what a packed package should look like.

That makes sense. I'd argue that all of these additional things are not strictly necessary depending on the scenario, but generally I'd agree that it definitely makes sense to prefer yarn pack. It's also fair to say that only the output from yarn pack is compatible with Yarn's implementation of the file:*.tgz protocol.

I don't believe there would be any security impact if the security checks would remain conceptually the same, but just the member extraction would match with npm:tar, npm:tar-fs, gnu:tar etc. Definitely do-able, but as said, it's also fair to just say that yarn pack is the accepted here. I'm good with that! It's just a little unfortunate to see this subtle difference in Yarn 1.x/v2 and with other package managers (we will have integration testing for)

Generally I think the best, given your points, would be to document the requirements of file: a little more, and also potentially have a warning for such archive members (because the error message itself is insufficient IMO). What do you think?