shadowspawn / nvh

Easily install Node.js versions
MIT License

XZ compression support #8

Closed DeeDeeG closed 4 years ago

DeeDeeG commented 5 years ago

Hi again.

I recently made the pull request to add an xz compression option to n. I noticed NVH doesn't have this option at the moment.

I wondered about the "proper" way to do it, since the way it's implemented in n right now is a bit obscure. As such, I decided to open an issue rather than jump straight to a pull request this time.

The feature could be added similarly to how it was done over at tj/n:

But I also thought about the following:

Overall I figured this would be a low-pressure way to work out the "right way" to do xz compression in this repo. As a fork, there is a bit more freedom. And I suppose whatever solution is decided on here can probably be ported back to tj/n relatively easily if desired.

If you have thoughts on the preferred way to do this, I would be interested in working on and/or collaborating on an implementation.

Best regards.

DeeDeeG commented 5 years ago

I made some example implementations:

Side note: I found working with the command-line arguments section of the script pretty easy, so I'm thankful it is set up so neatly to begin with!

shadowspawn commented 5 years ago

Hi @DeeDeeG

Whether to make xz compression the default (keeping gz as a fallback)

Making xz the default is my preferred method, if it is robust enough and not too complicated!

Looking at other managers:

I do not want all the complicated version testing that nvm does; I'm thinking just the >4 test. Older downloads can always use gzip; more recent versions can use xz if available.
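As a sketch, that minimal gate could look something like this (the helper name is illustrative, not nvh's actual code, and the exact cutoff is debated later in the thread):

```bash
# Illustrative sketch only: gate xz tarballs on the node major version.
version_supports_xz() {
  local major="${1#v}"    # accept "v8.9.0" or "8.9.0"
  major="${major%%.*}"    # keep only the major version number
  [[ "${major}" -ge 4 ]]
}
```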

DeeDeeG commented 5 years ago

Thanks for the reply.

Here is an implementation with those checks:


If both of these pass, xz use is allowed. I also added NVH_USE_XZ="true" at the top, in the setup phase, which seemed like the best way to tell NVH to use xz by default.
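A minimal sketch of that shape, with hypothetical helper and variable names standing in for "those checks" (the actual branch differs):

```bash
# Sketch only: default on in the setup phase...
NVH_USE_XZ="true"

# ...then later, when choosing which tarball to fetch (names are illustrative):
if [[ -n "${NVH_USE_XZ}" ]] && can_use_xz && version_supports_xz "${version}"; then
  extension="tar.xz"
else
  extension="tar.gz"
fi
```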


My thoughts regarding this implementation:

DeeDeeG commented 5 years ago

Update about macOS:

On recent macOS, the xz program (as such) is not present on a fresh install of the system. But oddly, their gunzip utility can extract xz archives (see this StackExchange discussion), even though gunzip on other platforms cannot. (Refer to man gunzip on a Linux machine and a macOS machine to see the differences; macOS has several other non-gzip algorithms rolled in, whereas Linux does not.)

In any case...

curl [URL] | tar -Jx will work, seemingly irrespective of whether the -J flag is used.

(Oddly, and mildly off-topic/a bit of a sidebar: downloading the xz archive and running tar -Jx [path/to/archive.xz] doesn't work. But a generic tar -xf [path/to/archive.xz] does work.)

I don't think macOS tar properly knows what to make of the -J flag, or how to recognize what an xz archive is locally on disk... but it can extract one if told "this is the archive I want to extract" with the -f flag, or if piped in via stdin.

I don't 100% understand why it all behaves that way, but the behavior has been mapped out. This requires a bit of extra handling, so I will think about it and look for a good solution for the "supports xz" test. I am thinking of blanket whitelisting macOS as supporting xz, but this is probably not totally accurate? Not sure how far back macOS tar can decompress xz.
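To summarize the observed macOS behaviors as concrete commands (version and URL are placeholders):

```bash
# Works on macOS: piping into tar, with or without -J.
curl -sSL "https://nodejs.org/dist/vX.Y.Z/node-vX.Y.Z-darwin-x64.tar.xz" | tar -Jx

# Works on macOS: -f triggers format auto-detection on the local file.
tar -xf node-vX.Y.Z-darwin-x64.tar.xz

# Reported NOT to work on macOS: -J against a local path, without -f.
tar -Jx node-vX.Y.Z-darwin-x64.tar.xz
```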

Edit: Came back to add a link/source

shadowspawn commented 5 years ago

I had noticed the Mac setup when we were looking at adding the environment variable to n (https://github.com/tj/n/pull/571#issuecomment-505679227) and didn't work out the details. I have xz installed via homebrew as a dependency of something else.

You can see why I was happier initially to add xz as an env var than on by default! 🙂

I did a bit of research and found that tar mentions libarchive, and that libarchive lists support for xz from 2009. I have not tied that back to the tar shipped on Mac and Linux et al, so I can't draw any conclusions yet.

https://github.com/libarchive/libarchive/wiki/ReleaseNotes#libarchive-270

DeeDeeG commented 5 years ago

Short summary of this comment: It looks like we can enable xz on macOS 10.7 and above. The macOS version can be checked on the command line with a utility: sw_vers -productVersion


Found some info about OS X and libarchive versions. According to this: https://opensource.apple.com/

OS X 10.6.x shipped with libarchive 2.6.2. And OS X 10.7.x shipped with libarchive 2.8.3.

(2.8.3 is still used in Mojave, albeit with various minor-looking, Darwin-specific patches accumulated over the years, if I understand correctly. For example, see libarchive.plist from macOS Mojave 10.14.5 for details.)

So without being able to test a bunch of old mac computers, I'd speculate the correct cutoff is OSX 10.7 and above can use xz, 10.6 and below can't.


Implementation note: It looks like the user's macOS/OSX version can be read on the command-line in a script-friendly way: https://www.cyberciti.biz/faq/mac-osx-find-tell-operating-system-version-from-bash-prompt/

Like so: sw_vers -productVersion (sample output: 10.11.1)


I also poked around in the libarchive source history, and found references to unxz as an xz decompression provider in some cases, and to the bsdtar command (separate from GNU tar). I also found that the -J flag is documented via tar --help, despite not being in the man page (man tar).

On a mac running El Capitan, I found the following:

shadowspawn commented 5 years ago

Good digging, thanks.

I did some digging tonight and came across a few interesting links. I have not formed conclusions yet, but I am starting to wonder whether the full "properly" is too hard! Interesting that the Node project is talking about reducing the number of options.

DeeDeeG commented 5 years ago

Mini update to the update about macOS:

As this StackOverflow answer rightly points out, the xz/lzma feature of bsdtar is configurable at compile time. The feature is present back to macOS 10.7, but it is configured off until macOS 10.9.

So I'd say move the cutoff version to macOS 10.9 and above. (Wish I had an array of old macs to test this on!)
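A sketch of that macOS gate, assuming a hypothetical helper name and the 10.x version scheme current at the time:

```bash
# Hypothetical helper (not nvh's actual code): allow xz on macOS 10.9+.
can_use_xz_darwin() {
  local product_version minor
  product_version="$(sw_vers -productVersion)"   # e.g. "10.11.1"
  [[ "${product_version}" = 10.* ]] || return 1  # assumes the 10.x scheme
  minor="${product_version#10.}"
  minor="${minor%%.*}"
  [[ "${minor}" -ge 9 ]]
}
```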


I can test a bunch of old Linux releases, so once the more theoretical/research parts are done, I plan to do that (test a bunch of Linux distros against the draft PR) and confirm this works.

I don't feel that working this out for SunOS or AIX is worth it, or feasible, since as far as I can see, I can't test either. Although... I just did a quick search, and you can actually download SmartOS, which is apparently what Node really means by SunOS. So I might give that a shot.

Regarding AIX: There are no xz tarballs for AIX (e.g. this is the only AIX tarball for node v12.9.1: node-v12.9.1-aix-ppc64.tar.gz), so xz is off the table for AIX. But we can tell that the arch for AIX should always be "ppc64". That could be addressed in a separate PR.


the Node project is talking about reducing the number of options.

We can always do is_ok "${xz-url}" to make sure it exists. I actually have a WIP implementation of this, but it's rather messy. Waiting on more research before tidying it up. (Will post it as a draft if requested, though. I don't mind posting early/often for review.)

I have not formed conclusions yet, but starting to wonder whether full "properly" is too hard!

That's what I presumed when I began, but I just figured I'd give it a good try and see how far I get. I'm happy with progress so far.

DeeDeeG commented 5 years ago

Here are the work-in-progress implementations:

DeeDeeG commented 5 years ago

Short summary: Linux looks good with these changes.

(I know that's a broad claim, but I did test quite a few distros. All Linux distros that I tested use GNU tar. GNU tar uses xz from the PATH by default (unless configured to do otherwise; no distro appears to have done so). The test added to nvh for this issue, which similarly looks for xz on the PATH, had a 1:1 correlation with the given Linux distro's [GNU] tar successfully using xz to extract tarballs.)

(SmartOS is a different story, since it's not Linux, and has its own (unique?) tar.)

(As mentioned in previous comments, macOS uses bsdtar and acts a bit different compared to Linux, so I would like to test on old macOS.)


I have been able to test the script as updated for this issue. I tested it on quite a variety of Linux distros, including some older ones to check how far back xz support goes.

Here are my results:

| Operating System | nvh status | nvh status http | tar version on iso | latest tar in repos | xz installed on iso | xz available in repos | curl installed on iso | wget installed on iso |
|---|---|---|---|---|---|---|---|---|
| Debian 5.0.10 Lenny | https error ** | no rsync | GNU tar 1.20 | GNU tar 1.20 | no | no | no | yes |
| Debian 6.0.10 Squeeze | https error ** | good (gzip/xz) | GNU tar 1.23 | GNU tar 1.23 | yes | yes ("xz-utils") | no | yes |
| Ubuntu 9.10 Karmic Koala | https error ** | good (gzip) | GNU tar 1.22 | GNU tar 1.22 | no | yes ("xz-utils") * | no | yes |
| Ubuntu 10.04 Lucid Lynx | https error ** | good (gzip/xz) | GNU tar 1.22 | GNU tar 1.22 | no | yes ("xz-utils") | no | yes |
| Ubuntu 12.04.5 Precise Pangolin | good (gzip/xz) | http not needed | GNU tar 1.26 | GNU tar 1.26 | yes | yes ("xz-utils") | no | yes |
| Ubuntu 14.04.6 Trusty Tahr | good (gzip/xz) | http not needed | GNU tar 1.27.1 | GNU tar 1.27.1 | yes | yes ("xz-utils") | yes | yes |
| Fedora 11 | good (gzip) | http not needed | GNU tar 1.22 | GNU tar 1.22 | no | no | yes | no |
| Fedora 12 | good (gzip/xz) | http not needed | GNU tar 1.22 | GNU tar 1.22 | yes | yes ("xz") | yes | no |
| Red Hat Enterprise Linux 7.7 | no rsync | http not needed | GNU tar 1.26 | no main repo | yes | no main repos | yes | no |
| OpenSUSE 11.3 LiveCD | good (gzip/xz) | http not needed | GNU tar 1.23 | GNU tar 1.23 | yes | yes ("xz") | yes | yes |
| OpenSUSE Leap 15.1 | good (gzip/xz) | http not needed | GNU tar 1.30 | GNU tar 1.30 | yes | yes ("xz") | yes | yes |
| Arch Linux 2019 09 01 iso | good (gzip/xz) | http not needed | GNU tar 1.32 | GNU tar 1.32 | yes | yes ("xz") | yes | yes |
| Manjaro Architect 18.04 Stable | good (gzip/xz) | http not needed | GNU tar 1.32 | GNU tar 1.32 | yes | yes ("xz") | yes | yes |
| Gentoo 2019 09 01 iso | good (gzip/xz) | http not needed | GNU tar 1.32 | GNU tar 1.32 | yes | yes ("xz-utils") | yes | yes |
| Triton SmartOS | https/tar fail | good/workaround | ?????? | ????????????? | yes | presumably | yes | yes |

Legend:

- "nvh status" means how well the script worked on the live image, or if there was no live image, immediately after installing.
- "nvh status http" means how well nvh worked after applying workarounds for broken https. (Workarounds: use `nvh` with `--insecure`; OR edit `nvh` to have only http URLs; OR use `curl` with `--insecure` or `wget` with `--no-check-certificate`.)

Statuses:

- "https error" means tools like curl and wget fail to fetch resources from nodejs.org, due to being unable to validate the website's ssl certificates. This is not nvh's fault, and implies a system that would be unable to properly use curl/wget for https transfers. As a workaround, to download nvh from raw.githubusercontent.com (for testing), curl can be used with `--insecure`, wget can be used with `--no-check-certificate`; `nvh` itself can be run with `--insecure`.
- "http not needed" means that the script worked without using `--insecure`, and without editing the script to use "http" in URLs.
- "good (gzip)" means that the script successfully uses/falls back to gzip.
- "good (xz)" means that the script successfully uses xz if xz support is installed.
- "good (gzip/xz)" means that the script successfully uses gzip and xz, depending on which compression schemes are installed and whether an xz tarball is available for the requested node version and the user's platform. (This means the script works fully as intended.)

Notes on OSes:

- \* In Ubuntu Karmic Koala, the "xz-utils" package conflicts with the already-installed "liblzma" package, apparently depended on by "dpkg". Unlikely to be installed for fear of breaking dpkg? But xz-utils is available in the repos, and can work, if one is brave enough to uninstall conflicting packages (and break their system??).
- \** In Ubuntu Karmic Koala & Lucid Lynx, wget fails to download files over https. In Karmic, curl also fails to download files over https. In Lucid, curl, which must be manually installed from the repositories (old-releases.ubuntu.com), succeeds at downloading files over https.
- \*** In Ubuntu Precise Pangolin, wget fails to authenticate e.g. GitHub https URLs, but succeeds at authenticating nodejs.org https URLs. For the purposes of the script, this is not a problem (we only need to access nodejs.org anyway).
- RHEL Server 7.7's "minimal" installation does not include rsync, so installing node with nvh fails at the last step (the last step is to rsync from the cache folder to the "installed" folders). RHEL 7.7 works with gzip or xz up to that last (rsync) step. There are no package repositories enabled by default, and no obvious public package repos anywhere (that I could find) for RHEL, so I gave up trying to install rsync. The tj/n "develop" branch works fine with xz or gzip. Had to enable networking in RHEL manually after install, otherwise the system was unable to reach the outside internet. (RHEL Server 7.7 does not have a command-line interface accessible on the install iso; it has no "LiveCD" environment.)
- Debian Lenny works fine with gzip, but only after manually installing rsync. xz support is not available on Lenny.
- SmartOS has its own tar. It either doesn't extract files that are piped in, or I can't figure out the syntax to do so. When nvh is allowed to download the tarball, and then have tar extract it as a local file, it works. See the [man page for tar](https://smartos.org/man/1/tar) at smartos.org. Also, SmartOS has problems with https connections via wget/curl. Can use http as a workaround.


None of the operating systems/distros tested were adversely affected by the changes for this issue. (i.e. no OS had less ability to successfully download node with nvh after the changes for this issue, compared to the script as it is on the develop branch.)

Some distros failed to download node with nvh, but this was due to various things unrelated to the changes proposed for this issue. (These issues could be worked around with varying degrees of difficulty.)

For example: RHEL 7 was a bit wonky because it had no rsync installed (unrelated to xz/this issue).

SmartOS (what Node means when they host tarballs for "sunos") was another case entirely. (Technically not Linux, though.) It rolled its own unique tar that I haven't seen anywhere else, which doesn't really work with piped-in tarballs (or else I couldn't figure out the correct syntax). But SmartOS does work with a minimal patch: https://github.com/DeeDeeG/nvh/compare/develop...DeeDeeG:smartos

(Apparently SmartOS isn't intended to be used directly as a general-purpose OS; rather, you are supposed to install VMs and containers on it. But the tarballs for SmartOS are up at nodejs.org/dist, so I thought I'd test anyway.)

Would still want to test this on older macOS, in an ideal world. But I am not exactly sure if I will get the opportunity. I know you can make bootable USB installers for old macOS, so maybe I will try it in Terminal on the bootable installer. (Don't want to actually re-install macOS multiple times.)

Misc note: xz support was added to GNU tar in version 1.22, back in March 2009. Here's the changelog entry.


[Edited 8 September to correct/clarify results and explanatory notes for Karmic Koala.]

DeeDeeG commented 5 years ago

I was able to test on OS X. Here are my results:

| Operating System | nvh status | nvh status http | tar version in fresh install | latest tar in repos | liblzma.dylib present on fresh install | `xz` present in fresh install | xz available in repos | curl present on fresh install | wget present on fresh install |
|---|---|---|---|---|---|---|---|---|---|
| OS X Snow Leopard (10.6) | good (gzip) | http not needed | bsdtar 2.6.2 | no apple repos | no | no | N/A (but yes w/ brew) | yes | no |
| OS X Snow Leopard (10.6.8) | good (gzip) | http not needed | bsdtar 2.6.2 | no apple repos | no | no | N/A (but yes w/ brew) | yes | no |
| OS X Mountain Lion (10.8.5) | good (gzip) | http not needed | bsdtar 2.8.3 | no apple repos | no | no | N/A (but yes w/ brew) | yes | no |
| OS X Mavericks (10.9.5) | good (gzip/xz) | http not needed | bsdtar 2.8.3 | no apple repos | yes | no | N/A (but yes w/ brew) | yes | no |
| OS X El Capitan (10.11.6) | good (gzip/xz) | http not needed | bsdtar 2.8.3 | no apple repos | yes | no | N/A (but yes w/ brew) | yes | no |
| macOS Mojave (10.14.5) | good (gzip/xz) | http not needed | bsdtar 2.8.3 | no apple repos | yes | no | N/A (but yes w/ brew) | yes | no |

Legend:

- "liblzma.dylib" refers to the file "/usr/lib/liblzma.dylib" (a symlink to "/usr/lib/liblzma.5.dylib") being present on macOS. This presumably derives from the same "[XZ Utils](https://tukaani.org/xz/)" project as the "xz" executable does on other platforms.

Notes:

- All testing (for versions less than OS X 10.11 "El Capitan") was done in VirtualBox, on a Mac host running OS X El Capitan.

OS Notes:

- Mac OS X Snow Leopard was installed from an old installer CD on VirtualBox. Had to modify the script to download only x64 images, since my install of OS X ended up being 32-bit. After that, nvh fundamentally worked fine. (Irrespective of 32/64-bit, there was no xz support in this OS's `tar`, so only gzip worked.)
- Technically, Snow Leopard 10.6.8 was an upgrade over a fresh install of 10.6, via the "Software Update" utility.


I found that Mavericks (OS X 10.9.x) can extract xz tarballs piped into tar.

So I believe the platform checks in place for macOS are correct.


[Edited 8 September to clarify that testing was done in VirtualBox, and that nvh was tested after installing to the hard disk, rather than testing "on the iso", or in the Terminal available on the installer image.] [Updated again to add whether /usr/lib/liblzma.dylib was present on the system.] [Edited 12 September to add OS X Mountain Lion]

DeeDeeG commented 5 years ago

I have confidence in these platform checks, but I admit there is increased complexity in trying to check the platform for xz support. It is a low-impact problem if it's wrong and off-by-default, but it becomes a big deal if it's wrong and on-by-default. Other version managers seem to have decided it was worth it.

I do think it works pretty well, but would like this to have a way to turn it off: (a) just so people have control, and (b) for the (seemingly unlikely) scenario where the platform checks are wrong. I feel that the environment variable, a command-line switch, or both would be adequate end-user control, and I have already made both work in drafts posted to this issue.

DeeDeeG commented 5 years ago

This comment is mostly about whether to do checks that help us when the user has installed a non-platform-default tar... which it would be pretty easy to decide NOT to do. But if we did want to do that, here's the relevant info. (This also discusses the limits of my macOS platform check.)

For what it's worth: Right now I only anticipate the platform checks being wrong if the user has installed a non-platform-default tar that has no xz support.* I don't know why someone would do that, but... you never know? The other scenario is if Apple breaks my macOS version check, e.g. if they increment the major version ("10") of macOS, or release/switch to a new platform other than macOS, or stop shipping the `sw_vers` utility...

\* (We could check for this by making sure Linux users' `tar` is GNU tar 1.22+, and that macOS users' `tar` is bsdtar 2.8.3 or above from libarchive... Unsure if we can check whether macOS users have bsdtar linked against liblzma, but we could check if liblzma is present at `/usr/lib/liblzma.dylib`...) Again, I am really unclear why someone would go through the trouble of installing a non-platform-standard `tar`, and it feels like a lot of extra gymnastics to go out of our way to check for that. But it would be doable. I can picture what the implementation looks like and could draft it up without much difficulty.

Review of prior art... Do they check which tar is being used?

- NVS decided to check if the local `tar` supports extracting xz: [here](https://github.com/jasongin/nvs/blob/5e8a8ccc8b7b4c8b4c3e1afad7b710afc8755068/nvs.sh#L52-L59) and [here](https://github.com/jasongin/nvs/blob/5e8a8ccc8b7b4c8b4c3e1afad7b710afc8755068/nvs.sh#L165-L189) in nvs.sh. (But their check is probably wrong (false positive) for macOS 10.7-10.8; I would need an install disc for OS X Lion and Mountain Lion to confirm.)
- NVM mostly doesn't: see [here](https://github.com/nvm-sh/nvm/blob/0b5bb5ccd875d82e470568b6465d546346e37778/nvm.sh#L3572-L3604). (I note that they only platform-check by whether `xz` is installed, which isn't used by tar on macOS, and is unrelated to successfully extracting xz with macOS's default `tar`, which is bsdtar.) They do check if the platform is AIX, and if so they use `gtar` instead of `tar`: [here](https://github.com/nvm-sh/nvm/blob/0b5bb5ccd875d82e470568b6465d546346e37778/nvm.sh#L1743-L1745). Without being able to test on AIX, I can't dispute nor confirm whether this is correct. I do not attempt to address AIX in my implementations, since I can't test any of it.

shadowspawn commented 5 years ago

Some great research, and I love the Linux and macOS tables of tests.

Good work tracking down the release note for GNU tar adding xz support. Likewise, digging into the nvs and nvm checks.

shadowspawn commented 5 years ago

Not sure if you discovered it, but there is support for trying things across multiple docker containers (currently a hard-coded list in the script).

```
$ pwd
/Users/john/Documents/Sandpits/nvh/test
$ bin/for-each-container nvh lsr lts
archlinux
v10.16.3

centos
v10.16.3

centos-old
v10.16.3

ubuntu-curl
v10.16.3

ubuntu-old-curl
v10.16.3

ubuntu-wget
v10.16.3

ubuntu-old-wget
v10.16.3

Darwin
v10.16.3
```

shadowspawn commented 5 years ago

The https errors are presumably due to old certificates in the docker containers. The best fix is to update the certificates, like here:

or as you mentioned, configure curl or wget or nvh to be insecure:

nvh --insecure install lts

or change the protocol to http when defining NVH_NODE_MIRROR (I came across this approach when researching proxy support):

NVH_NODE_MIRROR=http://nodejs.org/dist nvh doctor
shadowspawn commented 5 years ago

Is xz-wip-2 still the best branch to comment on implementation?

DeeDeeG commented 5 years ago

Indeed, xz-wip-2 is the best.

I would also point out that running the xz-wip-debug-messy branch will have it narrate what it "thinks" it is doing, which I have found useful, so I have actually been testing with that; end results should be identical between the two branches. [Edit to add: I did test the xz-wip-2 branch on some, but not all, distros/OSes, just to be sure. It worked the same as debug-messy in those cases.] I made xz-wip-2 by "compiling" down the checks in debug-messy into non-redundant checks and discarding the echoes. Again, they should pass/reject a system for xz with identical results (given identical inputs).

Last note: I did not have xz on by default while testing, just to make sure I was awake and testing both with and without xz on all platforms.

DeeDeeG commented 5 years ago

On second thought: If you have interest in a command-line option for xz, such as nvh --use-xz, those are available here, and are just about trivial to implement: https://github.com/DeeDeeG/nvh/compare/xz-environment-variable...DeeDeeG:xz-commandline-argument

They are not in xz-wip-2.
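For illustration, the flag handling could sit in the argument-parsing loop roughly like this (a sketch only; `--no-use-xz` is a hypothetical spelling for the off switch, not necessarily the branch's):

```bash
# Sketch: handle the xz flags while looping over the arguments.
while [[ $# -ne 0 ]]; do
  case "$1" in
    --use-xz) NVH_USE_XZ="true" ;;
    --no-use-xz) NVH_USE_XZ="" ;;
    # ... other flags handled here ...
  esac
  shift
done
```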


Decided questions:

Other Questions:

Research question:

DeeDeeG commented 5 years ago

Not sure if you discovered it, but there is support for trying things across multiple docker containers

I noticed this after doing much of the testing with LiveCDs in VirtualBox. It looks really convenient. (Probably a good candidate for CI such as Travis CI, if you wanted to run it on all Pull Requests, but I would equally understand leaving it to be run manually. Personal preferences and all.)

I had tried doing run-all-tests, but it didn't work for me at first with docker, so I carried on testing on Linux LiveCDs in VirtualBox.

(I realize now my docker setup requires running with sudo, e.g. sudo docker-compose, so that's why it wasn't working. I can edit the test runner scripts to use sudo docker-compose on my machine, and then it works.)

For the tests themselves, I am happy to report they all passed on xz-wip-2 equivalently to master branch.


Some tests didn't work for me outside of docker, so I wrote up the details of that below (in an expandable section of this comment).

1) The script sets `NVH_PREFIX` as follows:

   `export NVH_PREFIX="${TMPDIR}/nvh/test/run-which"`

   Whereas `$TMPDIR` is turning out to be undefined or empty. Outside of docker, I get "permission denied" errors for making a `/nvh` directory (sudo required). Understandable, since it (inadvertently) tries to make a directory directly under root (`/`). (All docker commands run as root, so this is fine under docker. It just probably installs things to `/nvh`.)

   ```
   ✗ setupAll for run/which/exec # (2 installs)
     (from function `setup' in test file tests/run-which.bats, line 13)
     `nvh install --preserve 4.9.1' failed
   installing : node/v4.9.1
   mkdir : /nvh/test/run-which/nvh/versions/node/v4.9.1
   mkdir: cannot create directory ‘/nvh’: Permission denied
   Error: sudo required (or change ownership, or define NVH_PREFIX)
   ```

2) `TMPDIR` is undefined or empty. I don't think `TMPDIR` is defined anywhere. When I do `git grep -i TMPDIR` in the root dir of this repo, I just get the one line from `test/tests/run-which.bats`. Perhaps it should be `TMP_PREFIX_DIR`, which is used in some of the other test files?

3) Need to run `setup_tmp_prefix` so `TMP_PREFIX_DIR` is defined (and a corresponding temporary directory is created). When I ran `git grep TMP`, it appeared the variable used in other tests is `TMP_PREFIX_DIR`, rather than `TMPDIR`. This appears to be set in those tests by calling `setup_tmp_prefix`, which seems to be from `test/tests/shared-functions.bash`. So I added that to the `run-which.bats` test.

4) Install steps were being (conditionally) skipped:

   ```
   if [[ "${BATS_TEST_NUMBER}" -eq 1 ]] ; then
     # Using --preserve to speed install, as only care about the cached versions.
     nvh install --preserve 4.9.1
     nvh install --preserve lts
   fi
   ```

   Apparently, and perhaps because this is not the first test file being run, these installs are being conditionally skipped. I commented out the `if` and `fi` to make these installs unconditional.


I came up with this patch to make things work outside of docker (Ubuntu Linux in my case): https://github.com/DeeDeeG/nvh/compare/develop...DeeDeeG:test-patch-for-linux-outside-of-docker Unclear if this defeats some optimization for use in docker, but it doesn't break the tests in docker, at least.

DeeDeeG commented 5 years ago

Extra nerdy research for Linux that I refrained from posting thus far (in favor of real-world tests, but this is how I knew which OS versions to test to get interesting results):

DeeDeeG commented 5 years ago

For ease of review, I prepared some updated implementations. They differ in how many ways the end-user can control which compression algorithm is used.

(Edit to add: Commits are separated neatly, so any of these features can easily be included in a final PR. Neither of these two extremes necessarily needs to be the final form of this feature.)

DeeDeeG commented 5 years ago

I was able to test on OS X Mountain Lion (10.8.5). I can confirm it does not support extracting xz tarballs. (This has been added to the results chart above.) That supports a macOS cutoff of 10.9 or above, as is present in my existing implementations for this issue.


For thoroughness' sake: Mountain Lion does not support extracting xz-compressed tarballs locally, via tar xf [tarball].tar.xz, nor with gunzip, in contrast with later releases that generally do support xz decompression in these ways. There is no sneaky or hidden xz/lzma support as far as I could tell.

Further details: OS X Mountain Lion does not have /usr/lib/liblzma.dylib, nor /usr/lib/liblzma.5.dylib. This means that the presence of liblzma.dylib has a 1:1 correlation with successfully extracting xz tarballs with system-native tar on macOS. That could be a more low-level, down-to-the-features test than testing against the macOS version. But it gates the same versions of macOS (assuming no-one uses a non-platform-native tar?), so both approaches are basically equivalent. (Furthermore, xz support is a compile-time feature of bsdtar, so if users were to download liblzma.dylib and plunk it in /usr/lib, system-native tar on those versions of macOS still would not be able to extract xz tarballs.) The only way to extract xz with tar on old macOS appears to be to install a newer tar, plus either liblzma or xz as appropriate.


I suppose we could test for this:

([tar is GNU tar 1.22+] && [system has `xz` on the path]) || ([tar is bsdtar 2.7+] && [the file /usr/lib/liblzma.dylib exists])
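A sketch of that compound test in bash (the helper name is illustrative; the numeric version comparisons from the pseudocode are omitted to keep the sketch short):

```bash
# Hypothetical sketch: approve xz when tar looks like GNU tar with xz on
# the PATH, or like bsdtar with liblzma present. The version-number checks
# (GNU tar 1.22+, bsdtar 2.7+) are left out for brevity.
tar_supports_xz() {
  local tar_banner
  tar_banner="$(tar --version 2>/dev/null | head -n 1)"
  case "${tar_banner}" in
    *"GNU tar"*) command -v xz >/dev/null 2>&1 ;;
    *bsdtar*)    [[ -e /usr/lib/liblzma.dylib ]] ;;
    *)           return 1 ;;
  esac
}
```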

But again, I suspect this will gate the same systems, and it should be roughly equivalent. This might make it easier to support platforms other than Linux/macOS/SmartOS/AIX, but the tarballs are only prebuilt for those platforms anyway. So IMO being more obvious that we are targeting certain distros/releases of Linux and macOS makes the code more readable and perhaps more to-the-point. I like a principle of code-what-you-mean, so that the code is more readable, and to discourage expansive scope creep of the given project. Just my personal thoughts on this.

And FWIW, neatly version-testing tar would take several more lines in can_use_xz(), as far as I can picture it.


Edit: here's an implementation with tar version checks instead of OS checks: https://github.com/DeeDeeG/nvh/compare/develop...DeeDeeG:xz-with-tar-version-check

Not really OS-neutral for macOS, since I don't believe other platforms ship libraries as .dylibs. The GNU tar check is, in theory, OS-neutral and generally platform-neutral. But again, GNU tar tends to mean Linux, and Linux tends to mean GNU tar, so...

I do note that the homebrew repos offer both GNU tar and xz, so someone could go out of their way to get GNU tar and xz, and make that the default system tar, but that is about the only scenario where this would help.

shadowspawn commented 5 years ago

(I have been busy on another project. I'll try and take a look at this soon. Thanks for all the investigations!)

shadowspawn commented 5 years ago

TMPDIR. Oops, that is a bug!

shadowspawn commented 5 years ago

Implementation feedback

Yes, also leaning towards auto-detection for xz rather than just manual. Yes, also prefer OS check over tar check.

I think an env override is worthwhile. I don't feel a command line switch is high value, but the implementation is tidy, and more findable when people hit problems, so I'm ok with keeping that too.

I do not want a short flag of -xz. Short flags are technically a single character, and in standard command line processing -xz expands to -x -z, like when you do ls -al.

NVH_USE_XZ is currently true/blank/undefined. Rather than blank I would prefer "0" or "false" (pick one, not suggesting we support both), and then forcing on would be defined-and-non-zero or defined-and-non-false respectively. I do not have a feel for whether 0 or false is more common. I have a slight preference for false because it is more readable, but NO_COLOR and CLICOLOR both use zero, and false is actually a string and not a boolean as such. Perhaps 0 is better? Edit: leaning towards "0" for forcing off rather than "false". (Partly because I was originally confused when working with the n code, as I assumed false was a command or a built-in rather than just a string!)

Rather than testing for the xz url using is_ok, I would prefer to test the target node version to avoid the network call. I feel a bit guilty for suggesting performance over robustness, but want to minimise the network calls where reasonable.

Ask questions if any of these seem dubious or unclear; I might have misunderstood. :-)

DeeDeeG commented 5 years ago

Responding to feedback, then will work on an implementation:

Yes, also leaning towards auto-detection for xz rather than just manual. Yes, also prefer OS check over tar check.

I think an env override is worthwhile. I don't feel a command line switch is high value but the implementation is tidy, and more findable when hit problems, so ok with keeping that too.

I do not want short flag of -xz. [ . . . ]

:heavy_check_mark: Agreed on these points. Thank you.

NVH_USE_XZ is currently true/blank/undefined. Rather than blank I would prefer "0" or "false" [ . . . ]

Okay. This is certainly doable, if a little less compact/neat in the setup phase. Will try a proof-of-concept for this in my next iteration. I wrote up a bunch below on why the alternatives are more complicated than parameter expansion, but this is all theory, and it's probably not a huge deal in practice. (I think we need to use if AND test, to check truthiness and non-null, respectively.)

Lots of thoughts about truthiness/non-null in bash, and what to do about it in nvh:

Aside (some background info) about `test` and `[`: `test`, aka `[`, is a little weird. (It is an actual command, not a bash built-in. See `man [` and `man test`, which are aliases for each other and share a man entry.) Perhaps un-intuitively, the command `[ -z "string" ]` (or `test -z "string"`) only tests for zero-*length* of a string. It does not actually test for logical zero, nor logical truthiness/falsiness in general. Why? Because `test` never checks truthiness in the expected way; it treats all input as text strings. The following are all non-null (non-empty strings), so `test` evaluates them to true:

- `[ true ]`
- `[ false ]`
- `[ 0 ]`
- `[ null ]`

The only thing `test` evaluates to "false" is the empty argument: `[ ]`

Other than parameter expansion without the colon, e.g. `"${var-true}"`, I don't know of a way to check (with a single command/builtin) for three states of a single variable (unset, set-null, set-nonempty) in bash. We can chain `if` and `test`, though. We can do `if "${var}"` to evaluate truthiness/falsiness. So we could do `if ! "${var}" && [ -n "${var}" ]` to check logical falsiness (evaluates `0` as false) and non-zero length (distinguishes between null and the string "0").

Aside: why recognizing three states is important. For this variable, we do need to preserve the three states:

- "set and something falsy (currently null)" is requesting xz off.
- "set and true-ish" (currently just any non-blank string) is requesting it on (pending platform checks).
- "unset" means we register no preference from the user.

IMO having three states is important, in case someone modifies the script to have xz off by default. The environment variable and/or command-line switch should still both do what they say on the tin. (If this is not important, we can simplify, by setting xz on by default, and no-op for requests to "turn on xz".)

I find the parameter expansion that handles "set-null/set-true/unset" is the most compact way to present these three options to the later parts of the script. This is also backward-compatible with the existing implementation in `n`. The `n` implementation is currently like this:

- empty or unset is requesting xz off. (However, xz is off by default, so the brief "set xz" code paths basically no-op in this case, leaving the default gzip implementation to do its thing. We can back-port "0 means force off" to n if you like.)
- "set and true-ish" (just checks that `N_USE_XZ` is non-blank) is requesting/forcing it on.

So yeah, IMO as long as n and nvh eventually match, I am on board. I will see how tidy I can make an "unset/0/true" branching logic, rather than "unset/set-null/set-true".


Rather than testing for xz url using is_ok, I would prefer to test target node version to avoid the network call. I feel a bit guilty for suggesting performance over robustness, but want to minimize the network calls where reasonable.

This is a subjective one and I leave it as your call in the end. Cost-benefitting this out, from my perspective:

I would propose this as a potential solution:

This proposal doesn't sacrifice any robustness, and moves the dual network request onto a rarer code path. (Still easy to test, e.g. by attempting to download an older version of node, such as 0.8.)

Related, but a mild tangent: If version checking rather than using is_ok for the xz tarball URL, I would like to version check for version "3" and up, since io.js had xz on both Linux and macOS by that version. (io.js forked node.js and immediately incremented the version to 1.0, then node.js took back over and immediately incremented the version to 4.0, so the version scheme and development of features are relatively continuous and unidirectional, despite the forking.)

DeeDeeG commented 5 years ago

Short summary: For determining boolean truth from a variable's value, bash can only distinguish between "unset, set-but-null, and set-and-not-null". Beyond that, we have the exit status of commands, or string comparison. That's about it. Handling "0" or "false" as environment variable values would depend on string comparison.


I am shocked to find there is no obvious way of evaluating boolean "truth" in a traditional programming-language-analogous way in bash. Only whether a variable is unset, set but empty/null (zero-length), or set and non-empty (non-zero-length). And the only way to test for an unset variable involves parameter expansion of the form: ${}.
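For instance, only the `+` and `:+` expansions distinguish the three states (plain bash, nothing nvh-specific):

```bash
unset NVH_USE_XZ
echo "[${NVH_USE_XZ+set}] [${NVH_USE_XZ:+nonnull}]"   # prints: [] []
NVH_USE_XZ=""
echo "[${NVH_USE_XZ+set}] [${NVH_USE_XZ:+nonnull}]"   # prints: [set] []
NVH_USE_XZ="true"
echo "[${NVH_USE_XZ+set}] [${NVH_USE_XZ:+nonnull}]"   # prints: [set] [nonnull]
```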

I was specifically mistaken about using if to evaluate truth values; I had been setting NVH_USE_XZ to "true" or "false" or "0", leading to this: if "${NVH_USE_XZ}" (which expands NVH_USE_XZ, runs its value as a command (!!!), and checks the exit status).

This is clearly bad, since we will run any command contained in the value of NVH_USE_XZ from the environment. Basically, arbitrary code execution.

See this for the full extent of the shenanigans that is "true" and "false" in bash: https://superuser.com/questions/1400335/how-does-bash-test-false
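A contrived illustration of the hazard (plain bash; whatever command name is in the variable gets looked up and run):

```bash
NVH_USE_XZ="true"
if "${NVH_USE_XZ}"; then echo "ran the 'true' builtin"; fi

NVH_USE_XZ="uptime"   # any command name from the environment would run here
if "${NVH_USE_XZ}"; then echo "...and 'uptime' just executed"; fi
```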


I have opted instead to do a case construction in my first-round-of-feedback, proof-of-concept implementation.

```bash
# Use xz compression by default for tarball downloads. (If NVH_USE_XZ is unset, set it to "true".)
# Based on this: https://stackoverflow.com/a/13864829
[ -z "${NVH_USE_XZ+x}" ] && NVH_USE_XZ="true"

# Sanitize "false", "0" and "" to the empty string. Sanitize other values to "true".
case "${NVH_USE_XZ}" in
  "0" | "false" | "")
    NVH_USE_XZ=""
    ;;
  *)
    NVH_USE_XZ="true"
    ;;
esac
```

This is the only reasonable way to handle "false" and "0" in my opinion, since otherwise we have to do string comparisons for those values throughout the script. It makes sense to normalize to something, and normalizing all "false" values to the empty string means we can do [ -n "${NVH_USE_XZ}" ], rather than full-fat string comparisons.

(The other option I can think of (see initial implementation) is to "not specially handle" false and 0, and treat them just as non-empty strings, i.e. normalize them to true. Then we would have to advise users that setting NVH_USE_XZ to be non-empty means "use xz", setting it empty means "don't use xz." and if they don't set NVH_USE_XZ, we just do the default.)

DeeDeeG commented 5 years ago

I made some updated implementations after considering the first round of feedback. They differ on how to update the "does an xz tarball exist?" check. (Various additional decisions were made to respond to feedback, as laid out in the two preceding comments.)

Open to more feedback, especially based on these or the previous pr-candidate[. . .] branches.

shadowspawn commented 5 years ago

https://github.com/shadowspawn/nvh/issues/8#issuecomment-532351953

I went through similar boolean learning (and surprise), and decided to normalise to "true"/"false" for easy reading. Full-fat string comparisons. 😄

Have a look at USE_COLOR and install_writeable and GET_OPTIONS_SHOW_PROGRESS

I'll have a look at the updated implementations.

DeeDeeG commented 5 years ago

Example implementation with true/false for NVH_USE_XZ: https://github.com/DeeDeeG/nvh/compare/develop...DeeDeeG:xz-pr-candidate-with-feedback-true-false


I'm inclined to go with "true/false" for this feature at the moment, if only to match the rest of the script's internals, but I suppose "1/0" would also be okay. (The internals are a code style/developer consideration, but what we expect users to set their environment variables to is also a UI/UX consideration.)

Might not need the case for normalizing input now, since we can tell users to set the variable to "true/false" (or "1/0"), and other values would simply not be supported. (This would be different from the way it is in tj/n now, though, where "null" is implied to be meaningful. Maybe the existing implementation there should be deprecated/replaced with one that matches what is decided on here? I doubt if anyone has started using it there yet?)


Aside about performance of "full-fat string comparisons":

As for "full-fat string comparisons"... For some reason it's been drilled into my head that "string comparisons are bad" because they're supposedly "very slow." I tried changing `NVH_USE_XZ` to be "true/false" or "1/0" and it didn't seem slower to me than "empty/not-empty", really. Maybe it's slower if you do it thousands of times in a loop, but this isn't thousands of times in a loop. ~~And I haven't even tested that.~~ Just tested it: 10,000 full string comparisons (`[ "${var}" = "true" ]`) and 10,000 string-non-null checks (`[ -n "${var}" ]`) both take about 200 milliseconds, give or take 20 milliseconds. Integer comparison, checking the variable against "0", seemed consistently a bit slower??? But even so, the performance impact is quite negligible.

Conclusion: It seems like the people who said string comparisons are slow were wrong, at least for bash. And I suspect the advice is outdated, as CPUs have gone from MHz to GHz... a few more instructions doesn't mean a lot. (And it's not like we're compiling down to assembly/machine code, where these optimizations are more likely to manifest, I guess? Not a big compiled-language person at the moment.)

shadowspawn commented 5 years ago

I had something like this in mind. Note: this code fragment does not pass the target version to can_use_xz, as I wrote it while looking at one of your branches without a version check!

```bash
# User may set NVH_USE_XZ to 0 to disable, or set to anything else to enable, or leave undefined for automatic.
# Normalise to true/false.
if [[ "${NVH_USE_XZ}" = "0" ]]; then
  NVH_USE_XZ="false"
elif [[ -n "${NVH_USE_XZ+defined}" ]]; then
  NVH_USE_XZ="true"
elif can_use_xz; then
  NVH_USE_XZ="true"
else
  NVH_USE_XZ="false"
fi

# May be overridden by command line flags.
```
shadowspawn commented 5 years ago

Style: I am using [[ x ]] rather than [ x ] or test x, per:

https://google.github.io/styleguide/shell.xml?showone=Test,_%5B_and_%5B%5B#Test,_%5B_and_%5B%5B

shadowspawn commented 5 years ago

I think you have all the pieces I would want, scattered among the branches! Should I try and sort out the pieces so you can put together a PR?

DeeDeeG commented 5 years ago

Sure, that sounds good. It's getting into the smaller details now where I think a PR makes sense.

DeeDeeG commented 5 years ago

Re: this comment (which suggests using a single if-else[...] statement to handle NVH_USE_XZ, with "xz off, xz on, and auto xz" as the three code paths)... Looks pretty reasonable.


Unclear if the "xz on" code path should "force" xz on, i.e. skip the minimum node version check (or xz URL is_ok check). (As opposed to the "Auto" code path, which could check more rigorously that xz works before settling on using xz.)

Pros:

Con:

Related: If we don't distinguish between "forced on" and "auto on" later in the script, the if-else statement could potentially be simplified a bit. (But I suppose it is a little less fork-friendly to remove the distinction, since that would mean it's a tiny bit more work to turn xz off by default in a fork. (Is fork-friendliness a priority? Do people fork nvh/n often?))

DeeDeeG commented 5 years ago

Re: https://github.com/shadowspawn/nvh/issues/8#issuecomment-533034644 again: I just noticed we can't use the function can_use_xz() in the setup phase, at least without refactoring, because the setup phase precedes all the function definitions.
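The ordering problem in miniature (plain bash; bash resolves function names at call time, so a setup-phase call precedes the definition):

```bash
#!/usr/bin/env bash
# Fails: the setup phase runs before the function exists.
if can_use_xz; then NVH_USE_XZ="true"; fi   # error: can_use_xz: command not found

# ...much later, where the script defines its functions
# (the body here is an illustrative stand-in, not nvh's actual check)...
can_use_xz() { command -v xz >/dev/null 2>&1; }
```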

Working on putting together a PR.

shadowspawn commented 5 years ago

Unclear if the "xz on" code path should "force" xz on, i.e. skip the minimum node version check

Interesting question, and I have been thinking about that. I started out thinking that command line should override all checks, but have changed my mind.

I think it is reasonable that the version check is independent of the auto check. The auto check is looking at the system support for xz, and can be overridden by the user due to preference, or to incorrect or incomplete detection. The version check is a limitation in the availability of that archive format. (While in theory there could be mirrors with different combinations of archive formats, that is only worth revisiting if an actual use case turns up.)

shadowspawn commented 5 years ago

Makes nvh easier to use with forks/mirrors of node with custom version schemes?

Not a consideration at this time.

On a related note, there was some support in n for using it with other "products" by defining environment variables. I never found an actual use of this, and removed the support in nvh early on; it is now gone from n too.

shadowspawn commented 5 years ago

(Is fork-friendliness a priority? Do people fork nvh/n often?)

Simple answer is no. I don't see reports from people having trouble integrating with forks, including after the recent major changes.

Focus on making the code simple and robust for our use case. 😄

DeeDeeG commented 5 years ago

On a related note, there was some support in n for using it with other "products" by defining environment variables. I never found an actual use of this, and removed the support in nvh early on; it is now gone from n too.

In other words, trying to get io.js doesn't really work anymore? (because the "product name" would be "iojs" rather than "node" or something like that?)

I am trying setting iojs.org as the custom mirror at the moment, but it fails with this: Error: download preflight failed for v2.5.0 (https://iojs.org/dist/v2.5.0/node-v2.5.0-linux-x64.tar.gz)

whereas the proper url would have v[num]/iojs-v[num]-[os]-[arch].[ext]
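Side by side, the URL nvh constructs versus the pattern iojs.org actually serves (per the error message and pattern above):

```bash
# constructed by nvh:  https://iojs.org/dist/v2.5.0/node-v2.5.0-linux-x64.tar.gz
# actually served:     https://iojs.org/dist/v2.5.0/iojs-v2.5.0-linux-x64.tar.gz
```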

shadowspawn commented 5 years ago

A musing, rather than a suggestion: if the internal and external formats for a variable are different, then it may be appropriate to use a different name. For example, if externally NVH_USE_XZ is undefined/0/defined and internally we want a variable which is true/false, then it might be cleaner to use a different variable name.

It is convenient to reuse when supplying a default and doing minor normalising, like with NVH_NODE_MIRROR.

shadowspawn commented 5 years ago

In other words, trying to get io.js doesn't really work anymore? (because the "product name" would be "iojs" rather than "node" or something like that?)

It was more a general comment than about io in particular. There used to be explicit support for io, with variations on a number of the routines, separate from the generic "product" support. The io support got removed too, as an extra complication that had become obsolete.

shadowspawn commented 4 years ago

Preference for xz downloads, if detected, released in nvh 9.0.0

Thank you for your contributions.