salsa-rs / salsa

A generic framework for on-demand, incrementalized computation. Inspired by adapton, glimmer, and rustc's query system.
https://salsa-rs.netlify.app/
Apache License 2.0
2.13k stars 152 forks source link

Fix verification of tracked struct from high-durability query #550

Closed carljm closed 2 months ago

carljm commented 3 months ago

This adds a test for, and proposes one possible fix for, a bug we ran into with durabilities where we were hitting the "access struct from previous revision" assert.

The test models a situation where we have two File inputs (0, 1), where File(0) has LOW durability and File(1) has HIGH durability. We can query an index for each file, and a definitions from that index (just a sub-part of the index), and we can infer each file. The index and definitions queries depend only on the File they operate on, but the infer query has some other dependencies: infer(0) depends on infer(1), and infer(1) also depends directly on File(0). The dependency graph is shown below: light color represents low durability, black represents high durability, dark gray are structs and struct fields; I didn't get around to adding their durability to the graph yet.

(The graph is automatically generated using some hacky printlns I added to Salsa that create a mermaid.live graph; if I end up debugging another issue where it's useful, I'll probably clean this code up and try to make it generally reusable.)

Screenshot 2024-08-01 at 10 19 02 PM

The panic occurs because definitions(1) is high durability, and depends on index(1) which is also high durability. index(1) creates the tracked struct Definition(1), and infer(1) (which is low durability) depends on Definition.file(1).

After a change to file 0 (low durability), we only shallowly verify definitions(1) -- it doesn't need deep-verify because it is high durability. So that means we never verify index(1) at all (deeply or shallowly), which means we never mark Definition(1) validated. So when we deep-verify infer(1), we try to access its dependency Definition.file(1), and hit the panic because we are accessing a tracked struct that has never been re-validated or re-recreated in R2.

The proposed fix is that in maybe_changed_after for a tracked struct field, rather than checking that the struct has been created/validated at the current revision, check that it has been created/validated at least as recently as its durability last changed, and if that is older than the current revision, validate it. The logic here is that this tracked struct can't have changed unless some input at or higher than its durability has changed.

The other possible fix I considered (but haven't tried implementing yet) is that when we short-circuit a shallow verify due to durability, we need to not only mark our own outputs validated, but recursively find all our inputs that also verify due to durability, and validate all of their outputs as well. That fix seems more complex and expensive than what I have in this PR.

netlify[bot] commented 3 months ago

Deploy Preview for salsa-rs canceled.

Name Link
Latest commit 5c541dc94240c8f020ea8da3edb0bee0cdaa3fca
Latest deploy log https://app.netlify.com/sites/salsa-rs/deploys/66bcc55a4b54c600086ce361
codspeed-hq[bot] commented 3 months ago

CodSpeed Performance Report

Merging #550 will not alter performance

Comparing carljm:struct-panic-main (5c541dc) with master (08820ea)

Summary

✅ 8 untouched benchmarks

nikomatsakis commented 3 months ago

Will attempt to read this tomorrow-ish.

MichaReiser commented 3 months ago

I don't have a good repro yet but I just ran into the following error.

2024-08-06 13:52:36.553 [info]    0: red_knot_server::server::Server::run::{{closure}}
             at /home/micha/astral/ruff/crates/red_knot_server/src/server.rs:140:29
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/alloc/src/boxed.rs:2077:9
   2: std::panicking::rust_panic_with_hook
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:799:13
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:656:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/sys_common/backtrace.rs:171:18
   5: rust_begin_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:652:5
   6: core::panicking::panic_fmt
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:72:14
   7: salsa::tracked_struct::struct_map::StructMap<C>::get_from_map
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/tracked_struct/struct_map.rs:200:9
   8: salsa::tracked_struct::struct_map::StructMap<C>::get
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/tracked_struct/struct_map.rs:177:9
   9: salsa::tracked_struct::IngredientImpl<C>::lookup_struct
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/tracked_struct.rs:382:9
  10: red_knot_python_semantic::semantic_index::definition::_::<impl salsa::id::LookupId for red_knot_python_semantic::semantic_index::definition::Definition>::lookup_id
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/components/salsa-macro-rules/src/setup_tracked_struct.rs:144
2024-08-06 13:52:36.553 [info] :21
  11: <red_knot_python_semantic::types::infer::infer_definition_types::Configuration_ as salsa::function::Configuration>::id_to_input
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/components/salsa-macro-rules/src/setup_tracked_fn.rs:196:29
  12: salsa::function::execute::<impl salsa::function::IngredientImpl<C>>::execute::{{closure}}
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/execute.rs:46:58
  13: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panic/unwind_safe.rs:272:9
  14: std::panicking::try::do_call
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:559:40
  15: __rust_try
  16: std::panicking::try
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:523:19
  17: std::panic::catch_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panic.rs:149:14
  18: salsa::cycle::Cycle::catch
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/cycle.rs:42:15
  19: salsa::function::execute::<impl salsa::function::IngredientImpl<C>>::execute
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/execute.rs:46:27
  20: salsa::function::fetch::<impl salsa::function::IngredientImpl<C>>::fetch_cold
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/fetch.rs:104:14
  21: salsa::function::fetch::<impl salsa::function::IngredientImpl<C>>::compute_value::{{closure}}
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/fetch.rs:46:29
  22: core::option::Option<T>::or_else

Which may indicate that we need to validate all tracked structs?

carljm commented 3 months ago

Yeah, that traceback does suggest that there's still a path to accessing an un-validated struct :/

When you say you don't have a good repro, do you mean that you can't reproduce it consistently at all? That's going to make things difficult...

MichaReiser commented 3 months ago

@carljm I haven't spent time on finding a consistent repro. I ran into it when I switched between different files in the LSP

nikomatsakis commented 3 months ago

My thought is that we should just update to the most recent revision during the get operation. I think the only way to have an issue is to cheat and leak the data through thread local storage. We do need to update to prevent later writes from occurring in that scenario, however

On Wed, Aug 7, 2024, at 9:02 AM, Niko Matsakis wrote:

@.**** commented on this pull request.

OK, I get the overall bug. Let me think on this.

— Reply to this email directly, view it on GitHub https://github.com/salsa-rs/salsa/pull/550#pullrequestreview-2222698699, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABF4ZUZJEYIJAPF3RSKNITZQGZ6BAVCNFSM6AAAAABL35N3WOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDEMRSGY4TQNRZHE. You are receiving this because you are subscribed to this thread.Message ID: @.***>

carljm commented 3 months ago

Always updating on the get path makes sense to me and should simplify this PR. I'll update to that approach.

carljm commented 2 months ago

Updated to always validate on get. Some things I'm not sure about here:

1) This means there can be cases (as in the added test) where the same struct is validated on get, and then later re-validated via mark_validated_output in the same revision. This means validate can no longer assert that the struct isn't already validated. I'm not sure what the comment "Never update a struct twice in the same revision" was attempting to prevent, but it doesn't seem like setting data.created_at = current_revision when they are already equal is going to cause a problem?

2) I wondered whether validating on get might mean we no longer need mark_validated_output on tracked structs at all, but making it a no-op caused another test (deletion-cascade) to fail with the "access obsolete struct" panic.

3) This change means both get and update now need to take &Zalsa rather than just a current_revision, so that in get_from_map we can call zalsa.last_changed_revision with the durability, which we only have access to after we get the data out of the map.

carljm commented 2 months ago

Test failure looks like a compiler improvement in nightly catching an unmatchable pattern, not related to this PR.

davidbarsky commented 2 months ago

Test failure looks like a compiler improvement in nightly catching an unmatchable pattern, not related to this PR.

I've ignored it; it should be fixed soonish: https://github.com/rust-lang/rust/issues/129031. If you rebase, CI will pass.

nikomatsakis commented 2 months ago

I THINK this is fixed by #564

MichaReiser commented 2 months ago

I can confirm this is fixed (https://github.com/astral-sh/ruff/pull/13016/files).

Did we add a test for it?

nikomatsakis commented 2 months ago

@MichaReiser I don't think I added a specific test, I'm going to close this PR, but a PR with a test would be great