Checksum, and multiple download locations, general download reliability.

willstott101 commented 4 years ago

In the name of reliability and security it should be possible to have an integrity checksum with each URL in the JSON file.

It should also be possible to list multiple sources for the data. For instance, github went down for a few hours today breaking our builds because of this package. Having alternative sources would lessen this problem.

Finally I think the primary source of the data that this package uses should be on npm anyway, in many cases people use the default json file. That could just be one large package on npm, allowing local enterprise npm caches to cache the data.

dpolivy commented 4 years ago

I generally agree with this, and actually have spent the past few days looking at alternate solutions here. For me, the issue is that I want the "full" set without having to specify an environment variable or package.json property (which doesn't work with some monorepo build systems). The individual CLDR datasets are already available as npm modules already:

https://github.com/unicode-cldr/cldr-json

The nice thing about this package is the 'glue' in index.js that makes loading the various pieces a bit easier. And, many of the globalize modules have peerDependencies on cldr-data, which means you have to use this module (though maybe the peerDependency requirement needs to be revisited).

Anyway, I don't have a particular ideal solution yet, but happy to chat more about building one.

willstott101 commented 4 years ago

Perhaps if this package looks for and globalises the separate npm packages and avoids downloading specifically those. Then a meta-package which depends on this and the data packages could be made. It feels a bit like a minimal-changes-hack but it's the first solution I see

rxaviers commented 4 years ago

I am open to revisit globalize packages peer dep on cldr-data

dpolivy commented 4 years ago

@rxaviers I've been looking at this and think I have a decent solution (at least for my own purposes, at the moment) which is based on the official Unicode CLDR npm modules. Is there any thought about taking the "glue" from cldr-data (e.g. index.js) and rewriting it to basically make it easier to load the individual CLDR modules based on what's installed? So, perhaps, creating a cldr-loader package which doesn't contain any data, but just makes it easier to work with the https://github.com/unicode-cldr/* packages. The app/user of the package would specify the desired cldr modules they need in their own package.json, along with cldr-loader, and then instead of doing a require('cldr-core/...') they would just use the cldr-loader package to make it easy (e.g. with entireMainFor(...), etc). I feel like something like this would be much preferred to the current approach, as it gives folks an easy way to pick and choose the data via package.json instead of URLs, and leaves all the heavy network transfers to tried-and-true mechanisms.

Then, the next step would be to adjust the globalize-* packages to not require cldr-data, and/or use the new cldr-loader helper instead. Unfortunately I don't know that a truly backwards compatible structure could be accomplished to make this more of a seamless transition.

Thoughts?

rxaviers commented 4 years ago

I liked it :+1:. In that line of thought:

cldr-loader could borrow most of cldr-data index.js code.
cldr-data could use cldr-loader (i.e., most of its could be moved)
Perhaps both packages could live in this same repo, using https://github.com/lerna/lerna to manage publishes

An unrelated change, but probably something we could do as well is either of:

Fix cldr-data downloads on Node>6 by doing this https://github.com/rxaviers/cldr-data-npm/issues/28
Or change cldr-data behavior from downloading CLDR data (using its custom downloader) to simply depend on the official CLDR ones (and use the full coverage) <-- this is a breaking change though

dpolivy commented 4 years ago

Personally, I'd vote for depending on the official packages, assuming folks can manage the breaking change. Depending on how they use cldr-data, the change could be very easy if they are only using the syntactic helpers (.all(), entireSupplemental(), etc...) -- that would be compatible as-is. If they are digging deeper and loading individual CLDR files via path (e.g. require('cldr-data/supplemental/likelySubtags.json')) then it would be a breaking change. I'm not sure how to version that appropriately with the existing cldr-data package, though, since it doesn't seem to totally fall into semver semantics.

Going with the official packages eliminates unnecessary dependencies and automatically fixes the other issues you mention.

I have a little time I could spend on this right now if we can agree on a direction. Can chat more in Slack, too?

rxaviers / cldr-data-npm

Checksum, and multiple download locations, general download reliability. #68