Closed Manishearth closed 1 year ago
cached_path
has quite a lot of dependencies for how we're using it, we could do this ourselves with just reqwest
.
The zip dep can be dropped if you're providing unzipped paths
We'd have to encapsulate this on the API somehow, i.e. expose a with_cldr_dir
and with_cldr_zip
which would be cfg
-ed. This isn't great ergonomically. Is the zip crate something that's hard to vendor/not generally available?
cached_path
has quite a lot of dependencies for how we're using it, we could do this ourselves with justreqwest
.
Yeah but I don't want to pull in reqwest
either.
A lot of build systems have a "no hitting the network" rule for build time stuff (both Firefox and Google3 do this).
We'd have to encapsulate this on the API somehow, i.e. expose a
with_cldr_dir
andwith_cldr_zip
which would becfg
-ed. This isn't great ergonomically. Is the zip crate something that's hard to vendor/not generally available?
Eh, I don't feel strongly about dropping zip
, this is just something that can be explored. We could also just detect if it's a zip or a directory with the current flag.
"generally available" isn't as much an issue as "do i want to have to deal with version bumps and stuff here", Google3 already has zip
vendored but if we can avoid the dep I'd like to
Yeah but I don't want to pull in reqwest either.
reqwest
is a fairly common crate and more likely to be vendored already, so I think it's still useful to move from cached_path
to reqwest
.
A lot of build systems have a "no hitting the network" rule for build time stuff (both Firefox and Google3 do this).
Well, we can build with reqwest
but not hit the network, or we can hit the network with just std::net
, so I think this argument is orthogonal.
We could also just detect if it's a zip or a directory with the current flag.
I don't like panic!("This feature needs to be enabled")
Every avoided dep is an additional feature for the matrix...
reqwest
is a fairly common crate and more likely to be vendored already, so I think it's still useful to move fromcached_path
toreqwest
.
Not in Google3. And it pulls in the entire hyper/http cinematic universe.
Actually most large hybrid projects I've seen don't use the Rust networking stack because they have their own and they want everything to go through one place.
I don't like
panic!("This feature needs to be enabled")
I think this is pretty constraining; I'm fine with that never happening in the default build but I think it's an important tool for the lean build.
TBH I'd also like to reintroduce the ability to disable experimental components.
Just want to push back on the reqwest
part of this conversation: we moved away from reqwest
because it didn't give a good user experience. cached_path
is great because it gives us progress bars, but unfortunately we disabled those last month because of a change that started doing downloads on threads, so the UI was getting screwed up. The other thing cached_path
gives us is an easy way to make sure the file is downloaded exactly once, refreshed when necessary, and saved in a safe cache directory. This is all logic we can write by hand, much of which we had in the reqwest
days. So while I'm not opposed to restoring reqwest
, we should think about where we went wrong in our judgement of cached_path
being a good successor. In other words, let's discuss and get alignment before making a PR changing how we download zip files.
Regarding this:
I don't like
panic!("This feature needs to be enabled")
As mentioned in the other thread, I see this as being more of "making an optional datagen flag be required". In this case, we're making --cldr-root
and --icuexport-root
be required. I think that's totally fine. We could even modify the clap
argument parsing to reflect this if we think it would improve DX.
That won't work for zip/no zip as those flags both accept zip files
We can have those flags always be capable of accepting a directory, and change the help text if zip
is disabled.
Discussed with @Manishearth @sffc @robertbastian:
zip
dependency is fairly light; we could make it a feature but the case is not as strongHere's the full list of new deps after we remove wasmer and cached-path, followed by the cargo tree
output
adler, autocfg, bincode, block-buffer, byteorder, bzip2, bzip2-sys, cc, cfg-if, chrono, cobs, cpufeatures, crc32fast, crlify, crossbeam-channel, crossbeam-deque, crossbeam-epoch, crossbeam-utils, crypto-common, databake, databake-derive, digest, elsa, erased-serde, flate2, generic-array, hash32, heapless, iana-time-zone, icu_codepointtrie_builder, icu_compactdecimal, icu_datagen, icu_provider_blob, icu_provider_fs, itertools, itoa, libc, libm, lock_api, log, matrixmultiply, memoffset, miniz_oxide, ndarray, num-complex, num-integer, num-traits, num_cpus, pkg-config, postcard, rawpointer, rayon, rayon-core, regex-syntax, rust-format, rustc_version, ryu, scopeguard, semver, serde-aux, serde-json-core, serde_json, sha2, spin, thiserror, thiserror-impl, time, toml, typenum, version_check, zip,
Some thoughts:
zip
is pulling in a lot of stuff. We might benefit from disabling features.rayon
is also doing a lot.clap
is doing a lotsha2
is doing a lot, which is kinda annoying since it's just a hash function. We could use some builtin hash function or just copy an impl of a simpler oneI definitely want to keep clap
, for rayon
and zip
we probably could remove them but I'm not dead set on it since they're realtively common.
sha2
is only used for fingerprinting, we could use a lighter hash. We could also move the fingerprinting feature out of datagen into testdatagen.
Yep
It looks like the zip
features control which shapes of ZIP file we support parsing. Since we control the generation of the zip files we need to parse, we could figure out which features we need for those specific zip files and disable the rest
zip is also a transitive dep for cached_path
, which we don't use but cannot be disabled.
Once #3146 lands I'll probably make a combined PR removing the mandatory cached-path and wasmer deps from datagen, and some other low hanging fruit if I find it.
So far we've not been careful about datagen deps. However, to be able to include datagen in a vendored tree, I'd like to have options for minimizing the dependencies.
A couple things that can be optimized:
list_to_ucptrie
binary (useful if you already have ICU4C): #3122