rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.33k stars 12.58k forks source link

Make untracked incr. comp. information inaccessible #90317

Open cjgillot opened 2 years ago

cjgillot commented 2 years ago

When iterating over a collection, be it a Vec, a HashMap or an IndexMap, the order of items influences the value of the resulting hash: [a, b] and [b, a] have different hashes.

Meanwhile, there is some information we do not want to track. This is the case of the value of LocalDefId. Inserting a function in a file will change other functions LocalDefId, but it will not change their DefPathHash.

My concern is about controlling this information flow. In order to do that ToStableHashKey trait replaces the iteration order of the collection (which for hash maps is based on the key and the allocated memory capacity and should be irrelevant to the compilation), by the order of a stable hash key (the DefPathHash when the key is LocalDefId). By sorting the vectors by stable key, we manage the information flow.

Using IndexMap, the iteration order is the insertion order. Normally, this insertion order should only depend on tracked information obtained by depending on another query. For instance, a HIR visitor will create a query dependency on hir_owner_nodes, which hashes the in-code declaration order of functions. However, and this is my concern, the order of LocalDefId is freely available without using a query and is purposely untracked.

In order to make IndexMaps safe for incr. comp., we need to ensure untracked information is inaccessible.

Known issues:

Originally posted by @cjgillot in https://github.com/rust-lang/rust/pull/90253#issuecomment-952203213

pierwill commented 2 years ago
  • [ ] Stop implementing PartialOrd and Ord on DefId...

Can these items be done one at a time?

cjgillot commented 2 years ago

Of course!

pierwill commented 2 years ago

Will give this a go and see what happens. @rustbot claim

pierwill commented 2 years ago

I started experimenting with LocalDefId. I wonder if the DefId work might end up being slightly less complex. 🤔

pierwill commented 2 years ago

I wonder if the DefId work might end up being slightly less complex. 🤔

Update: No. 😄

pierwill commented 2 years ago

Looks like removing Ord, PartialOrd for ExpnId and LocalExpnId might require changes to the macro rustc_index::newtype_index because of this line:

https://github.com/rust-lang/rust/blob/473eaa42e9365c47d129f72693b5d163a20cf369/compiler/rustc_index/src/vec.rs#L107

bjorn3 commented 2 years ago

Track accesses to command-line options; Track accesses to debugging command-line options.

Most of them are already "tracked" by discarding the entire incr comp cache if they change. The ones that aren't tracked are explicitly not tracked. See the UNTRACKED mentions in https://github.com/rust-lang/rust/blob/d22dd65835190278f315e06442614142653ec98f/compiler/rustc_session/src/options.rs A couple of them should be marked as TRACKED though I think.

pierwill commented 2 years ago

Here's where we'd be if we merge https://github.com/rust-lang/rust/pull/90408:

  • [ ] Stop implementing PartialOrd, Ord for HirId
  • [ ] Stop implementing PartialOrd, Ord for DefId
  • [x] Stop implementing PartialOrd, Ord for LocalDefId
  • [ ] Stop implementing Idx for LocalDefId
  • [ ] Stop implementing PartialOrd, Ord for ExpnId
  • [ ] Stop implementing PartialOrd, Ord for LocalExpnId
  • [ ] Stop implementing Idx for LocalExpnId
pierwill commented 2 years ago

@cjgillot It looks like rustc_span::hygiene::ExpnId does not implement Ord or PartialOrd. rustc_span::hygiene::LocalExpnId does implement ordering.

(So does rustc_middle::thir::ExprId. Is that related to this issue?)

Aaron1011 commented 2 years ago

I think it would be useful to enforce that UNTRACKED options are not accessed within a query (we could add a helper function to suppress the check, to allow for things like debug printing inside a query).

pierwill commented 2 years ago

Correction to list of known issues: LocalExpnId does implement Ord.

Known issues:

  • [ ] Stop implementing PartialOrd and Ord on DefId;
  • [ ] Stop implementing PartialOrd, Ord and Idx for LocalDefId;
    • [ ] Stop implementing PartialOrd, Ord and Idx for LocalExpnId;
  • [ ] Enforce that UNTRACKED options are not accessed within a query.
pierwill commented 2 years ago

I think it would be useful to enforce that UNTRACKED options are not accessed within a query

@Aaron1011 Did you have someplace specific in mind for where to add this check?

bjorn3 commented 2 years ago

The fields could be made private and instead a getter for each option could be exposed that run dep_graph.assert_ignored(). And maybe also add an _untracked variant of each getter for cases where accessing it inside a query is fine, like for debug dumps.

pierwill commented 2 years ago

The fields could be made private...

That is, fields of rustc_session::options::Options. For example: https://github.com/rust-lang/rust/blob/3b263ceb5cb89b6d53b5a03b47ec447c3a7f7765/compiler/rustc_session/src/options.rs#L165 https://github.com/rust-lang/rust/blob/3b263ceb5cb89b6d53b5a03b47ec447c3a7f7765/compiler/rustc_session/src/options.rs#L1176

cjgillot commented 2 years ago

@cjgillot It looks like rustc_span::hygiene::ExpnId does not implement Ord or PartialOrd.

All the better.

rustc_span::hygiene::LocalExpnId does implement ordering.

So we should remove it. I don't think it's required anywhere.

(So does rustc_middle::thir::ExprId. Is that related to this issue?)

There is no issue here. ExprId is local to a function body, reordering statements within a function body changes meaning, unlike reordering functions.

pierwill commented 2 years ago

Current status:

Items still under consideration:

tmiasko commented 2 years ago

The Ord for Span also seems problematic. Stable hash for span is based on file, line and column, so it is stable within a single file. At the same time Ord provides an order between different files, which is turn is based on the order files happen to be loaded into the source map.

For example going from pub mod a; pub mod b, to pub mod b; pub mod a, will result in the same hashes for spans from module a and module b but they will have a different order.

Ord for Span also compares SyntaxContext which doesn't seem to be stable either:

https://github.com/rust-lang/rust/blob/74fbbefea8d13683cca5eee62e4740706cb3144a/compiler/rustc_query_impl/src/on_disk_cache.rs#L69-L74

And the PartialOrd / Ord is derived and compares only the id:

https://github.com/rust-lang/rust/blob/74fbbefea8d13683cca5eee62e4740706cb3144a/compiler/rustc_span/src/hygiene.rs#L47-L48

pierwill commented 2 years ago

@rustbot label -E-help-wanted

pierwill commented 2 years ago

The Ord for Span also seems problematic. Stable hash for span is based on file, line and column, so it is stable within a single file. At the same time Ord provides an order between different files, which is turn is based on the order files happen to be loaded into the source map.

For example going from pub mod a; pub mod b, to pub mod b; pub mod a, will result in the same hashes for spans from module a and module b but they will have a different order.

Ord for Span also compares SyntaxContext which doesn't seem to be stable either.

Thanks @tmiasko! This is super interesting. My sense is that the ordering on Spans is used a lot, often as a kind of fallback. I wonder, though, if removing this ordering is even possible! Don't we need to order certain things like lints based on the order of their appearance in source code? Maybe we could use an implementation that orders Spans within a given file, but not between files?

@cjgillot: what do you think of Ord for Span?