Closed bheylin closed 11 months ago
build.rs
files can be seen as suspicious. A build.rs file executing on my local machine may be outside of the risks I'm willing to take
Maybe playing devil's advocate (I'm not familiar with xtask) but how is that different from running the task from a risk perspective?
build.rs
files can be seen as suspicious. A build.rs file executing on my local machine may be outside of the risks I'm willing to takeMaybe playing devil's advocate (I'm not familiar with xtask) but how is that different from running the task from a risk perspective?
Fair question. From a risk perspective...
build.rs
is run automatically at least once on every machine that downloads the phonenumber
package. xtask
is run manually by one of the project maintainers when the assets/PhoneNumberMetadata.xml
is updated. The bincode generated by the xtask
would then be checked in to source control. As an added bonus the commit record shows when the assets/PhoneNumberMetadata.xml
was updated and when the bincode
was updated. I think I disagree with your security model. If you mistrust a rather simple and straightforward build.rs
file supplied by the rust-phonenumber
developers, then I would also mistrust a checked in binary file that is manually updated (although verifiable), and I would also mistrust anything that remotely touched proc-macros.
I really don't think the benefit in terms of your security model and build time outweighs the mental overhead required for us to maintain an xtask
package and to run the xtask
whenever the xml
changes. If I need to weigh "manually running and maintaining a separate package for the sake of loading a file" against "38 lines of build.rs", I think I know which one wins.
That said, I agree with the sentiment that a bincode-formatted file that is used for nearly all users is not optimal, which you report in #63. I'll comment on that tomorrow!
Hi @rubdos
I get that the build.rs
reduces the maintenance work, and I take what you say about this seriously.
I appreciate the work you've put into developing this library and my aim is not to trivialize your time
or make your life harder :smile_cat:
My issue with the build.rs
is more that it's work that's needlessly happening on my machine,
and to do that work, it pulls in dependencies that are not critical to the domain problem, parsing a phone-number.
We can talk back and forth about security, but that's more a symptom of the issue. My aim here is to only reference the critical dependencies in the published lib.
This ties in with issue #64 too. As depending on the outcome of that conversation, maybe we can shift the dependencies needed for creating the database into another package. We certainly don't require quick_xml
to parse a phonenumber once the database exists. :smile_cat:
In short, I would like to split the existing lib code into two parts;
Let's withhold talking about the implementation details of the data processing task
and instead imagine it as a task much like dependabot
.
That's automatically triggered when the source xml
is out of date.
It makes a PR and generates a bincode
file for you to review and merge.
In that case I think your maintenance burden is not drastically increased.
rust-phonenumber
package can be stripped down to it's simplest form.What do you think?
My issue with the
build.rs
is more that it's work that's needlessly happening on my machine, and to do that work, it pulls in dependencies that are not critical to the domain problem, parsing a phone-number.
Can't that be fixed by outputting cargo:rerun-if-changed=assets/PhoneNumberMetadata.xml
and splitting out the meta-data loader to be only part of the build stage?
My issue with the
build.rs
is more that it's work that's needlessly happening on my machine, and to do that work, it pulls in dependencies that are not critical to the domain problem, parsing a phone-number.Can't that be fixed by outputting
cargo:rerun-if-changed=assets/PhoneNumberMetadata.xml
and splitting out the meta-data loader to be only part of the build stage?
From what I understand about build.rs
, it will always be run once wherever it's being built. Using cargo:rerun-if-changed
will trigger if to be run again if the given file changes.
I just noticed the assets/upgrade.sh
script. I presume one of the maintainers is running this every few months.
Is this not the perfect place to trigger the generation of the database?
My issue with the
build.rs
is more that it's work that's needlessly happening on my machine, and to do that work, it pulls in dependencies that are not critical to the domain problem, parsing a phone-number.
I still find your argumentation ill-posed. Compilation of the library itself is also work that's needlessly happening on your machine. Very many libraries have build.rs
scripts.
I still feel like we have an x-y problem here. Do you take issue with the build time that the library poses? Because that, in my eyes, would be a rather good argument to look closely at how we can further optimize the build process. If you indeed take issue with the build time, then I would also like to see numbers and a benchmark, such that we can track the build script performance over time.
As an aside: after #63, I imagine we get rid of the bincode
outputted file altogether. Maybe the database can even be put behind a feature flag.
We certainly don't require
quick_xml
to parse a phonenumber once the database exists. 😸
I did not know we still depended on quick_xml
at runtime, that's indeed something to look at. That too could become a feature flag.
Can't that be fixed by outputting
cargo:rerun-if-changed=assets/PhoneNumberMetadata.xml
and splitting out the meta-data loader to be only part of the build stage?
That fixes reruns, but those shouldn't really happen once the crate has been build as a dependency for your own project anyway. It would probably be a good thing to implement either way, because it speeds up the development of rust-phonenumber
by reducing the number of build.rs
runs.
Is this not the perfect place to trigger the generation of the database?
It would be the correct place to trigger it. But still:
My suggested approach to clean things up:
bincode
file is the most optimal way to load the database (it probably is not), remove the bincode
dependency altogether, especially at runtime.Database
structure, make sure it can be loadable in const
time and at compile time, without the need for quick_xml
.Database
loading (and thus the runtime quick_xml
dependency) is used by the community (it probably is, let's not make any assumptions), and consider putting this behind a feature flagcargo:rerun-if-changed
to the build.rs
(#64)My issue with the
build.rs
is more that it's work that's needlessly happening on my machine, and to do that work, it pulls in dependencies that are not critical to the domain problem, parsing a phone-number.I still find your argumentation ill-posed. Compilation of the library itself is also work that's needlessly happening on your machine. Very many libraries have
build.rs
scripts.I still feel like we have an x-y problem here. Do you take issue with the build time that the library poses? Because that, in my eyes, would be a rather good argument to look closely at how we can further optimize the build process. If you indeed take issue with the build time, then I would also like to see numbers and a benchmark, such that we can track the build script performance over time.
My problem is the amount of dependencies being brought into a project I maintain. Not build times per se, but of course more dependencies mean longer build times.
We review our dependencies periodically, and the rust-phonenumber
came up because it actually caused an issue in a script I use to upgrade deps. This is the only dependency we have the uses metadata in it's semver. But that's another story :laughing:
And my script was at fault there.
As an aside: after #63, I imagine we get rid of the
bincode
outputted file altogether. Maybe the database can even be put behind a feature flag.
Yea I find the Database
as part of the public API to be a strange use case. Neither the C++ or Java impl have this feature from what I can see.
I think it's a good idea to at least feature gate it, and consider deprecating it and seeing if you get feedback on usage in the wild.
We certainly don't require
quick_xml
to parse a phonenumber once the database exists. 😸I did not know we still depended on
quick_xml
at runtime, that's indeed something to look at. That too could become a feature flag.
:+1: much appreciated!
Is this not the perfect place to trigger the generation of the database?
It would be the correct place to trigger it. But still:
* Checking in binary files needlessly into version control feels wrong. * We create a security issue because checking in binary files into version control obfuscates the source tree for easy verification. This could be partially remedied by having CI check the consistency between the XML and the bincode file.
Yes I agree that checking in a binary is opaque. The C++ impl uses protobufs from what I see, which has the sames downsides.
I think the most transparent solution is generating Rust code. You could even move the runtime processing of the Vec<Metadata>
into a Database
into build time. This would presumably solve #26.
As the default database is simply embedded into the binary in it's final form.
My suggested approach to clean things up:
* [ ] Evaluate whether loading a `bincode` file is the most optimal way to load the database (it probably is not), remove the `bincode` dependency altogether, especially at runtime.
:+1:
* [ ] We find a nice way to fix [Make database static #63](https://github.com/whisperfish/rust-phonenumber/issues/63): refactor the `Database` structure, make sure it can be loadable in `const` time and at compile time, without the need for `quick_xml`.
:+1: If you remove the Database
from the API you can simplify the Database even further, by not requiring lazy_static
or std::sync::Once
. And I think the Database
may not even need Arcs
or Mutexes
if restructured a little.
But that's a different issue as such.
* [ ] Evaluate whether `Database` loading (and thus the runtime `quick_xml` dependency) is used by the community (it probably is, let's not make any assumptions), and consider putting this behind a feature flag
:+1: Like I said above, I would go so far as to deprecate it and gather feedback on resistance to that. If the database loads quickly then I don't see a use case for providing a custom database.
* [ ] Add `cargo:rerun-if-changed` to the `build.rs`
:+1:
Cool, I think we reached an agreement about basically all points then. I'm in favour of getting the dependency tree smaller, but it should really not go at the cost of a maintenance burden.
I suggest I close this specific issue, and I create issues for the tasks we discussed. Would you agree with that?
Cool, I think we reached an agreement about basically all points then. I'm in favour of getting the dependency tree smaller, but it should really not go at the cost of a maintenance burden.
I suggest I close this specific issue, and I create issues for the tasks we discussed. Would you agree with that?
Yep let's close this issue, the issues you outlined are more focused.
Thanks for your openness :wink:
Closing in favour of #64 and #63
The
database.bin
is currently generated in thebuild.rs
. This is unnecessary and has a few downsides.database.bin
only needs to be generated when theassets/PhoneNumberMetadata.xml
is updated.build.rs
files can be seen as suspicious. A build.rs file executing on my local machine may be outside of the risks I'm willing to takebuild.rs
are required to build the project.I propose that the generation of
database.bin
is moved into an xtask.This means restructuring the project into a workspace where one of the members is the
xtask
package. But the benefits are:xtask
is independent from the published packageassets/PhoneNumberMetadata.xml
is updated.build.rs
anymore