pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.33k stars 636 forks source link

BSP: Skip re-writing outputs if they already exist #15379

Open stuhood opened 2 years ago

stuhood commented 2 years ago

Currently, the BSP target rules are marked @_uncacheable_rule, and write output Digests for each run.

The uncacheability means that each refresh of the project (without closing the BSP connection) will recreate the Digests. But even without being uncacheable, restarting the connection (likely because the IDE has restarted, and so Pants does as well), will currently cause the output Digest to be re-written from scratch. IntellIj apparently uses timestamps rather than fingerprinting to decide what to re-index, because this currently results in the project being re-indexing each time the BSP connection is established.

To fix this, we will likely want to use a trick (which is implemented-but-unused for capture_snapshots) where we optionally materialize a magic file next to the materialized directory which contains its digest. Workspace.write_digest would then opt-in to using this re-write skipping (since it is not bullet proof: a caller needs to know that noone is independently mutating the destination).

AlexTereshenkov commented 1 year ago

@stuhood I have a related proposal - to have workspace.write_digest(digest) extended with the overwrite boolean argument to indicate that if there's a file/directory in place, they should be overwritten. For instance, writing outdir/foo.txt into the outdir/ directory containing bar.txt, should result in having with outdir/ directory with only foo.txt. Currently, it would preserve the bar.txt file.

I know that goal rule can perform side effects, so one could just delete the directory first with pure Python command (e.g. os or pathlib), but it feels it would be tidier if this could be done straight from the Pants engine to avoid mixing the paradigms.

Would this ticket enable this behavior or it's orthogonal to what you propose?

stuhood commented 1 year ago

Would this ticket enable this behavior or it's orthogonal to what you propose?

It would be orthogonal. But yea: feel free to add that argument to write_digest. I'd suggest doing it in the Rust code here:

$ git diff -- src/rust/engine/src/externs/interface.rs
diff --git a/src/rust/engine/src/externs/interface.rs b/src/rust/engine/src/externs/interface.rs
index ea391252b6..fd2b9af66f 100644
--- a/src/rust/engine/src/externs/interface.rs
+++ b/src/rust/engine/src/externs/interface.rs
@@ -1628,6 +1628,8 @@ fn write_digest(
   py_session: &PySession,
   digest: &PyAny,
   path_prefix: String,
+  // pyo3 treats trailing `Option` arguments as optional in python function signatures.
+  clear_destination: Option<bool>,
 ) -> PyO3Result<()> {
   let core = &py_scheduler.0.core;
   core.executor.enter(|| {
@@ -1635,12 +1637,19 @@ fn write_digest(
     py_session.0.workunit_store().init_thread_state(None);

     let lifted_digest = nodes::lift_directory_digest(digest).map_err(PyValueError::new_err)?;
+    let clear_destination = clear_destination.unwrap_or(false);

     // Python will have already validated that path_prefix is a relative path.
     let mut destination = PathBuf::new();
     destination.push(core.build_root.clone());
     destination.push(path_prefix);

+    if clear_destination.unwrap_or(false) {
+      std::fs::remove_dir_all(destination).map_err(|e| {
+        PyException::new_err(format!("Failed to clear {}: {e}", destination.display()))
+      });
+    }
+
     block_in_place_and_wait(py, || async move {
       core
         .store()