thought-machine / please

High-performance extensible build system for reproducible multi-language builds.
https://please.build
Apache License 2.0

REAPI: Filegroups redownload outputs of other targets #2886

Open Tatskaari opened 1 year ago

Tatskaari commented 1 year ago

When building locally, the following filegroup would be a no-op when we build it:

filegroup(
    name = "foo",
    srcs = [":bar"],
)

Please will build :bar, which will put all the outputs in the right place in plz-out, so there's nothing we need to do. This is not the case for remote execution, where we download the outputs of all our src targets regardless. This is not only suboptimal, but can cause issues:

If we run the above we will:
1) Download :bar and set the xattrs on the outputs to set the target hash to :bar
2) Download :foo and clobber that xattr with the target hash of :foo

So when we re-run Please, we will re-download :bar, as the target hash on its outputs is now set to that of :foo.
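
A toy model of the sequence above may make the clobbering easier to see. It is only a sketch, not Please's actual code: a dict stands in for the xattrs, and the hash strings, output path and function names are all made up for illustration.

```python
# Toy model: a dict stands in for the xattr that records which target hash
# produced each file in plz-out. All names and paths here are hypothetical.
xattrs = {}

TARGET_HASHES = {":bar": "hash-of-bar", ":foo": "hash-of-foo"}
# :foo is a filegroup over :bar, so both targets "own" the same output file.
OUTPUTS = {":bar": ["plz-out/gen/bar.txt"], ":foo": ["plz-out/gen/bar.txt"]}


def needs_download(target):
    """A target is up to date only if every output carries its own hash."""
    return any(xattrs.get(out) != TARGET_HASHES[target] for out in OUTPUTS[target])


def download(target):
    """Pretend to fetch the outputs, then stamp them with the target's hash."""
    for out in OUTPUTS[target]:
        xattrs[out] = TARGET_HASHES[target]


def build(targets):
    """Return the targets that had to be downloaded, in order."""
    downloaded = []
    for target in targets:
        if needs_download(target):
            download(target)
            downloaded.append(target)
    return downloaded


first_run = build([":bar", ":foo"])
second_run = build([":bar", ":foo"])
print(first_run, second_run)
```

In this model the second run still downloads :bar, because the shared output only ever carries the hash of whichever target stamped it last.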

This also causes issues when multiple Please instances are running, as two targets can download the same output. If we hold the hash for :foo, another Please might download :bar, clobbering our outputs, which can break our build. To avoid this, we've added a grungy bodge in #2885 to lock all the package-local sources of a filegroup.
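
The general idea of locking a filegroup's sources can be sketched as below. This is not the implementation in #2885: threads stand in for separate Please processes (which would need file locks rather than in-memory ones), and `lock_outputs`, `download` and the output path are hypothetical names.

```python
import threading
import time
from collections import defaultdict
from contextlib import ExitStack

# Hypothetical registry with one lock per output path.
_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def lock_outputs(paths):
    """Acquire the locks for all paths, in sorted order to avoid deadlock."""
    with ExitStack() as stack:
        for path in sorted(set(paths)):
            with _registry_guard:
                lock = _locks[path]
            stack.enter_context(lock)
        # Hand ownership of the held locks to the caller's `with` block.
        return stack.pop_all()


events = []


def download(target, outputs):
    """Pretend to download a target while holding the locks on its outputs."""
    with lock_outputs(outputs):
        events.append(("start", target))
        time.sleep(0.01)  # stand-in for the actual download
        events.append(("end", target))


threads = [
    threading.Thread(target=download, args=(":bar", ["plz-out/gen/bar.txt"])),
    threading.Thread(target=download, args=(":foo", ["plz-out/gen/bar.txt"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because both targets lock the same path, one download always finishes before the other starts, so neither can clobber the file while the other is mid-write.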

peterebden commented 1 year ago

Filegroups don't work very well in rex, full stop. Ideally they would be done completely locally with zero RPCs; in practice we create a synthetic AR (ActionResult) and send it to the remote, which is undesirable for performance, and some configurations might prohibit the client from doing it.

Ideally we would revisit & fix this. I tried once but ended up getting in a mess with missing outputs and things.