Closed srl295 closed 3 years ago
okey dokey I'm not sure where all the semicolons went, and it's too noisy, but it seems to work.
Set env var FULL_ICU_PREFER_NPM=1
to stick with the current behavior
force push to fix the logic for little endian folks
hey @srl295 is there a reason for this switch? Mostly asking out of curiosity 😄
hey @srl295 is there a reason for this switch? Mostly asking out of curiosity 😄
Great question, the current process using NPM requires someone, me, to do an upload process every time ICU releases.
But starting in ICU 67, That data file is already a part of the normal release files that are stored on GitHub by icu. 
Even though the official binaries now build full-icu by default it's still possible to build non-default builds where being able to install full-icu would be useful. For example we have had queries about the full-icu npm not working with RHEL builds of Node.js, which build with small-icu and provide a separate rpm for full-icu data (i.e. the answer in that case is to install the additional rpm but if this PR was merged installing full-icu data via npm would be an option again.
FWIW I was experimenting with GitHub Actions and UBI containers/RHEL Node.js: https://github.com/richardlau/full-icu-npm/pull/1/ -- You can see an example failure with RHEL Node.js 14 (full ICU data for ICU 69 not found): https://github.com/richardlau/full-icu-npm/pull/1/checks?check_run_id=3340466514.
@richardlau
full-icu npm not working with RHEL builds of Node.js
yes, because I neglected to do the ICU v69 publication to npm so far. And v70 is almost out. You can cross ref those issues here.
I'll add your mentioned containers to my #66
Ouch! And I named the files wrongly on the ICU side. Filed https://unicode-org.atlassian.net/browse/ICU-21764 to track, but i can work around in the code.
If the ICU components were their own modules (which I think might already be the case) the entire process of creating a GitHub Release -> Publishing to npm can be audited end-to-end with actions and happy to walk you through setting it up.
My biggest concern with switching to GitHub is that Releases themselves are mutable and afaict the shasums of assets will not be checked which increases the risk for a supply chain / MITM attack.
If the ICU components were their own modules (which I think might already be the case) the entire process of creating a GitHub Release -> Publishing to npm can be audited end-to-end with actions and happy to walk you through setting it up.
I'm not quite sure what you mean…
When ICU publishes, for example, v69.1 its process outputs: https://github.com/unicode-org/icu/releases/tag/release-69-1 including SHASUMS and signed files. (PTAL) The keys are here https://github.com/unicode-org/icu/blob/main/KEYS
If you mean that ICU's repo could have a github action to publish to npm, yes, theoretically, however ICU had declined to keep doing this. (part of why this module transferred here) Publishing the data file, which is more broadly used, was a sort of compromise for that.
My biggest concern with switching to GitHub is that Releases themselves are mutable and afaict the shasums of assets will not be checked which increases the risk for a supply chain / MITM attack.
So previously the full-icu loader fetched symbolic tag icu4c-data@69l
on NPM, which is itself mutable. And funny thing in 2021 NPM and GitHub are kind of related :) I hear the concern, in some sense this reduces the surface because there's one fewer less-maintained resource (the icu4c npm resouces) in the mix. Maybe I should say, this provides far fewer moving parts than before, and they are better understood (the ICU release process) vs obscure https://github.com/nodejs/icu4c-data-npm
I suppose the KEYS file could be a part of full-icu and it could check the GPG signature of the ICU file (the KEYS don't change very often at all and could be baked in to full-icu, I don't know how expensive/platform friendly such a GPG check is).
@MylesBorins the other reason for doing this is from the npm (and yarn) side, #38 - doing an npm install of a dynamic dependency has been frowned on from the beginning and for good reason.
@MylesBorins full-icu could have a baked in set of hashes for the end data files, that would require someone to rev full-icu every time a new ICU comes out (abt every 9 months). Again, that adds maint.
I would say I'm trying to work myself out of a job, but that kind of took care of itself last year 8-D
in https://github.com/yarnpkg/berry/issues/873#issuecomment-591346241 @arcanis said:
… running yarn add while an install is already in progress is completely invalid as it would likely lead to dependency tree corruptions
I'm going to take that as a vote against using npm|yarn add
any other comments from @nodejs/npm ? like to make sure i'm fixing a real problem, and that the fix is better than the problem, and that i've well documented why a change is being made.
post install considered harmful by yarn: https://yarnpkg.com/advanced/lifecycle-scripts#a-note-about-postinstall
@srl295 I guess I'm not entirely understanding the process of what is happening with these changes, and how it is not dynamic.
If this is an incremental improvement and makes things more reliable it probably makes sense to move forward with it and perhaps explore ways to improve it moving forward. Happy to find time for a quick call if you want to talk through it, feel free to dm me to find time.
I don't have much context on this particular package or change, but as a general tip I got feedback in the past that serving assets directly from GitHub was problematic for users behind certain firewalls (particularly China), cf https://github.com/yarnpkg/berry/issues/982.
any other comments from @nodejs/npm ? like to make sure i'm fixing a real problem, and that the fix is better than the problem, and that i've well documented why a change is being made.
I'm not sure what the problem here is tbh. Is it the postinstall approach or is it "where should we source this content?"
If the problem being solve here is "installing platform-specific assets" then this is the same problem that chromedriver
also solves using a manual download https://github.com/giggio/node-chromedriver/blob/main/install.js
The only fundamental difference is that they hook into the install
event. It does not appear to make a difference in either implementation which event would want to be used, postinstall
is simply guaranteed to run after the install
event and neither project has both types of scripts.
If the problem being solved here is "where do we source the assets from" then we do lose the baked in idempotency we get from the npm registry and also the checksum validation that an npm or yarn install would give. We also, as @arcanis pointed out, lose some availability from behind certain firewalls. Checksums are already baked into the npm registry:
$ npm view icu4c-data@67l dist
{
integrity: 'sha512-OIRiop+k1IVf4TBLEOj910duoO9NKwtJLwp++qWT6KT5gRziHNt+5gwhcGuTqRy++RTK2gLoAIbk8KYCNxW++g==',
shasum: '4c5264a9ec6e2d126b8d47b6faa618025ccbaaaf',
tarball: 'https://registry.npmjs.org/icu4c-data/-/icu4c-data-0.67.2.tgz',
and I would recommend using those since that's already what will happen when you use the old npm method. This way there is a "guarantee" that there is no difference between the results based on if you use npm install
or manually download. You'll source it from the same place.
any other comments from @nodejs/npm ? like to make sure i'm fixing a real problem, and that the fix is better than the problem, and that i've well documented why a change is being made.
I'm not sure what the problem here is tbh. Is it the postinstall approach or is it "where should we source this content?"
Both are problematic. #38 summarizes it: "you can't npm/yarn install (a random asset) from within a postinstall script". Both npm and yarn folks have confirmed this.
npm seemed like one possible place to store the data due to the caching and availability mentioned.
If the problem being solve here is "installing platform-specific assets" then this is the same problem that
chromedriver
also solves using a manual downloadhttps://github.com/giggio/node-chromedriver/blob/main/install.js
The only fundamental difference is that they hook into theinstall
event. It does not appear to make a difference in either implementation which event would want to be used,postinstall
is simply guaranteed to run after theinstall
event and neither project has both types of scripts.
It's "ICU version and endianness specific". So not quite platform specific. The ICU version changes once or twice a year.
The basic logic is:
!process.config.variables.icu_small
process.versions.icu.split('.')[0]
or process.config.variables.icu_ver_major
l
or b
? (e
for EBCDIC would be supported, but doesn't occur in practice) process.config.variables.icu_endianness
Putting it all together, then it attempts to run npm install icu4c-data@67l
We don't know a priori what ICU version will be used. When node was installed, the user can do configure --with-icu-source=something
and it could be any version.
If the problem being solved here is "where do we source the assets from" then we do lose the baked in idempotency we get from the npm registry and also the checksum validation that an npm or yarn install would give. We also, as @arcanis pointed out, lose some availability from behind certain firewalls. Checksums are already baked into the npm registry:
$ npm view icu4c-data@67l dist { integrity: 'sha512-OIRiop+k1IVf4TBLEOj910duoO9NKwtJLwp++qWT6KT5gRziHNt+5gwhcGuTqRy++RTK2gLoAIbk8KYCNxW++g==', shasum: '4c5264a9ec6e2d126b8d47b6faa618025ccbaaaf', tarball: 'https://registry.npmjs.org/icu4c-data/-/icu4c-data-0.67.2.tgz',
and I would recommend using those since that's already what will happen when you use the old npm method. This way there is a "guarantee" that there is no difference between the results based on if you use
npm install
or manually download. You'll source it from the same place.
There's no way to know what version is being used as mentioned. As I mentioned above, the files at https://github.com/unicode-org/icu/releases/tag/release-69-1 are GPG signed, so we could have some way to validate those.
cool… https://github.com/unicode-org/icu/commit/4a8b16056f157cebc0f42350e0f6b3b6360ae06c just barely made it into icu4c 70 rc … this means that workaround won't be needed
@wraithgar @MylesBorins think this is ready to merge… perhaps open new issues to track further work?
https://github.com/nodejs/full-icu-npm/pull/53/checks?check_run_id=3758743393#step:4:25
Verifies that install-from-npm stays working
Hi all,
I've done an early publish to npm
so folks can test this out:
npm i full-icu@1.4.0-0
( https://www.npmjs.com/package/full-icu/v/1.4.0-0 )
feat: Pull from GitHub instead of npm
Fixes: #36 Fixes: https://github.com/nodejs/full-icu-npm/issues/61 Fixes: https://github.com/nodejs/full-icu-npm/issues/38