Closed michaelwoerister closed 7 years ago
If a query A starts a sub-query
B
andB starts a sub-sub-query
C, which would invoke A again, then we would have to make sure that neither the results ofB
orC
are cached. But since we are still in the middle of computingB
andC
, we don't have any result to cache anyway.
Yes, but on the way back up to A
, both B
and C
get cached today and they saw the cycle.
@eddyb Oh, an Err(CycleError)
seems to get cached, yeah. We could just... not do that, right?
@michaelwoerister The actual error is uninteresting, but B
and C
may change based on it.
Looking at the code again, no Err(CycleError)
actually seems to be cached. The semantics of this are so subtle that I keep wanting to get rid of it. Imagine having to explain this to a potential contributor :(
@michaelwoerister AFAIK we'd regress in a few places if we removed cycle recovery. I'd rather have it be enforced correct-by-default and have a warning that if you use try_get
your queries may be triggered more than once, so it's a performance/side-effect hazard, but nothing worse.
Sorry I've been slow to weigh in on this thread. I've been thinking about cycles a lot. At this point, my opinion is that -- at least for now -- we should remove cycle recovery from the query system (that is, cycles should always be considered fatal to compilation -- they don't necessarily have to panic). Things that legitimately want to recover from cycles (which I think is basically just the trait system) should handle it themselves for now.
We don't actually use cycle recovery that much right now, and I think all of the cases can make do with some form of hard error (after a bit of refactoring). We can survey them quite quickly.
item_path
uses try_get
as a kind of "best effort" attempt to peek at the type information and include it in the item path only when it's available. This was a stupid idea on my part. It induces false cycles. We should just deterministically turn off the use of type information for item paths except for in trans, I think, and maybe also there. Or something. try_get
to handle cycles in the call graph. This was a clever idea but I think ultimately not the right way -- it makes things nondeterministic. Imagine that you have foo()
which calls bar()
which calls foo()
. If we start by asking for the optimized MIR for foo
, it can inline bar
, but bar
will not inline foo
. However, if we start by asking for the optimized MIR for bar
, you get the opposite result. I think you should get the same result whichever optimized MIR you ask for first.
sized_constraint
and needs_drop_raw
is using try_get
, but in this case a cycle always indicates that an error should have occurred elsewhere. In particular, it can occur with non-representable types like struct Foo { f: Foo }
. We can probably handle this not by recovering from the cycle but by using delay_span_bug
to report it, or something like that.Unless I missed something, those are all the uses of ::try_get
in the code-base, and hence (I think) all the uses of cycle recovery that I am aware of.
I was thinking of making trait selection into a query, but I've since thought better of it. I think it will be something "query-like" -- in particular, it will be a function that starts with just a tcx
and a key -- but it will want to manage its own caches and it will also need to manage its own dependencies (using anonymous nodes). There is a lot of stuff very particular to the trait system that needs to happen (e.g., for a cycle, we need to check all the traits involved in the cycle and check whether they are auto traits, etc), and I don't see that generalizing very well. Also, I think that ultimately trait matching will not want to be a "DFS" sort of procedure, but rather something more like the ObligationForest
, where we build up a dependency graph and kind of "iterate over it" until we are done.
Ultimately my view is that we should aim to keep the query system fairly simple, at least for now. Maybe eventually we will want to expand it into a full-on attribute grammar system that can handle all sorts of data propagation or whatever, but I think that the current setup -- where a query is basically just a memoized function call -- seems about right.
As a side-note regarding inlining: With https://github.com/rust-lang/rust/pull/43183, the trans-collector builds up a fairly accurate, global "call graph" between translation items. This could be useful for inlining too. It is post-monomorphization though, which is good for the quality of information it can provide, but which also means that it is a bit late in the pipeline.
I originally planned to describe the differences between the current query system and the future version that supports red/green dependency tracking but while thinking about it, there's not too much of a difference when it comes to cycle errors:
force
the evaluation of a query but that's no different from a regular query.A
starts a sub-queryB
andB
starts a sub-sub-queryC
, which would invokeA
again, then we would have to make sure that neither the results ofB
orC
are cached. But since we are still in the middle of computingB
andC
, we don't have any result to cache anyway.The main difference I see between the current query system and the red/green one is dependency tracking (and there the current system behaves a bit strangely, actually): When a query is started, a new
DepNode
is allocated and reads to thatDepNode
are registered as they occur. E.g., we let's say we have the following "query trace":The dependency graph would look exactly like that, cycle and all.
In the red/green system, a
DepNode
would only be allocated after the successful evaluation of a query. Consequently, we would not end up with the dep-graph shown above. But we still want to record dependencies to those sub-queries because their result influenced what1
evaluated to after recovering from the cycle. I propose to just merge all reads into the parent query when encountering a cycle error, effectively merging all reads into the root of the cycle. The dep-graph from above would then look like this:There are no
DepNodes
for5
and7
(they never completed) but we don't lose the information that6
has been read, which is important, because the result of6
might also have caused a trace without cycle.I still think that recovering from cycle errors is dubious but at least the situation should not get worse with the new system.
Comments? @eddyb @nikomatsakis @rust-lang/compiler