Open Daniel15 opened 7 years ago
A few arguments from an internal discussion:
That said, we can't deny the speed improvement when unpacking non-compressed tars, so there may be a reason to consider this feature.
+1
shrinkpack has become a huge part of our development workflow. When packages are upgraded and the build is "shrinkpacked", individual tar files are created for only the new packages, and the outdated versions are automatically dropped. That's because the names of the resulting .tar files are a function of the package versions. Here's a short snapshot of what a node_shrinkwrap directory would look like:
You can follow the git history on this directory to figure out which dependencies were upgraded and when, e.g. react-native-animatable in this example...
...with quick and easy access to the backup:
With shrinkpack, the diffs in GitHub closely reflect the commit message and the actual changes being made. Committing and pushing the result of a new shrinkpack is a better experience, IMO, than doing the same after a yarn pack, because as mentioned, changes are handled at the package version level rather than the repository version level. So you're only pushing up individual .tar files, which is fast, especially if you're using Git LFS, and you don't need to touch your package.json version number at all.
@joncursi, we have an offline mirror feature that does what you want: https://yarnpkg.com/blog/2016/11/24/offline-mirror. The only thing missing is cleanup, which we don't do on purpose, because the tarball storage can be shared by multiple projects.
@bestander very cool, thank you for sharing that blog post. I didn't catch this feature when reading the CLI docs. It would be a lovely addition to https://yarnpkg.com/en/docs/cli/config
I use shrinkpack locally in each project, rather than globally across multiple projects. I would like to do the same with yarn, which would require old tar files to be removed when packages are upgraded. I only care about maintaining the latest working version of each package; if I need to dig up an older package version, it's always there in the git history. But I don't need or want to store it directly in the mirror forever.
My use case is to implement the mirror less for offline purposes and more for maintaining a concise set of package backups in case packages are suddenly unpublished from npm. Risk control. As far as I know, that was largely the intent behind shrinkpack in the first place.
Is there a smarter way to automate package removal from the mirror when a new package version is added? Perhaps a config option in .yarnrc to specify this (feature request)? ATM it seems I have to manually do...
yarn add package@new-version && rm -rf yarn_mirror/package@old-version
Also, the same issue presents itself when removing a package from use in the repo entirely...
yarn remove package && rm -rf yarn_mirror/package-*
@joncursi, this is a bit off-topic for this issue; better to come up with an RFC discussion of what is needed.
As for the cleanup, it can be a ten-line JS/bash script that you run alongside yarn until we implement it.
This issue is specifically for switching from compressed (.tar.gz) to uncompressed (.tar) tarballs; anything else should be discussed in a separate task 😄
From an implementation standpoint, what sort of risks and level of effort would you foresee simply by making this a flag that you can pass to the CLI? Shrinkpack is written so that uncompressed tarballs are the default, but you can opt into compressed packages with a flag. What would the impact be for simply implementing the inverse behavior (opt-in to uncompressed with a flag)?
It seems like this would address the issue of potentially unpleasant changes for those already using the offline mirror to commit modules locally, while allowing the uncompressed behavior for those who don't mind aliasing a couple of yarn commands.
Edit: Even more simply, the flag could just be defined in .yarnrc.
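For illustration, that could look like the following (`yarn-offline-mirror` is Yarn's existing setting; `yarn-offline-mirror-uncompressed` is a hypothetical name for the proposed opt-in, not an existing option):

```
yarn-offline-mirror "./npm-packages-offline-cache"
yarn-offline-mirror-uncompressed true
```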
This is actually the main thing preventing us from switching to yarn, as it already admirably solves the determinism issue and the offline mirror feature (thanks for the link, btw!) takes care of the rest. However, it leaves us with the undesirable (from our perspective) situation of committing binary packages. In our experience, Git does very well with simple tar, as most updated packages are recognized as renamed with tiny deltas, and the compression does all the rest. Thus, the actual bandwidth used is dramatically lower.
Yarn puts the same tarballs that it downloads from the registry into the offline mirror folder. To allow non-compressed tarballs you would need to unzip them first and then zip them again. Also, the tarballs have versions in their file names, so git won't be able to track version updates as small diffs.
You wouldn't need to unzip then zip again, you'd simply need to decompress the tarball. The inner .tar can stay the same, it'll just not be compressed.
Not sure about Git, but Mercurial tracks copied files, so it could track new versions of dependencies as copies of old ones if they're similar enough.
Thanks, Daniel, good to know.
Although someone needs to show that this advanced mercurial/git tracking would happen on a real example before we consider this change, right?
Hi @bestander, we use git with Bitbucket & npm + shrinkwrap on some projects. Here is what it looks like when the minor version of the tar changes:
Here are sample tar files for the package from the screenshot that was tracked as renamed: tars.zip
Thanks
Although someone needs to show that this advanced mercurial/git tracking would happen on a real example before we consider this change, right?
I've been meaning to test it out, I just haven't had time to do so.
Hey there! It's been a while, and since you're busy, I thought I'd make this as painless as possible.
Check out this shrinkpack tar proof of concept
This seems like a reasonable idea after all.
So how would it work?
git/hg mv
Results:
A. Potential CPU wins, because step 2 will be skipped when installing from the offline mirror.
B. Space wins, if tarball contents are similar at step 4.
C. Checking in unzipped tarballs has a negative impact on repo size.
D. Step 4 seems a bit complex, with all sorts of edge cases.
So if A + B > C + D, then why not? A, B, and C can be measured, although D is subjective.
Bumpity bump! I can work on this if you guys want?
@bfricka, of course, give it a try. We would need to see a few real-life examples of the impact this feature provides, though.
Something to consider as a future enhancement, post-launch
Some people may want to store tarballs of all their dependencies in their source control repository, for example if they want a fully repeatable/reproducible build that does not depend on npm's servers. Storing compressed tarballs in Git or Mercurial is generally bad news. Every update to a package results in a new copy of the entire file in the repo, which can make the repo very large. Every time you clone the repo, the full history is transferred, including every previous version of every package, so even deleting the binary files has a lasting effect until you rewrite history to remove them.
Instead, we should try storing uncompressed tarballs (i.e. .tar files). Since the tar files are mostly plain text, in theory Git/Mercurial should be able to more easily diff changes to the files if a new version of a module is added while an old version is removed, and just store the delta rather than storing an entirely new blob.

Related: This was implemented in Shrinkpack: https://github.com/JamieMason/shrinkpack/issues/40 and https://github.com/JamieMason/shrinkpack/commit/7b2f341408be4f0415714ec57534debfdaaa3fbf#comments. According to the comments on the commit, this actually sped up npm install when shrinkpack implemented it, as npm no longer needed to decompress the archive every time. This makes sense, since you're removing the overhead of gzip from the installation time.