Use `AstGen` in analysis

SuperAuguste commented 2 years ago

AstGen generates the fancy ZIR untyped stage2 intermediate representation format that is a nice bit of padding between AST and analysis. We can't use AIR (the typed stage2 intermediate representation format that comes after ZIR) if the AST is invalid but we can use ZIR, so it's a really nice middle ground. AstGen is fast, accurate, and officially supported and it would simplify our analysis logic a bunch!

SuperAuguste commented 2 years ago

Note: AstGen is slightly less flexible than using AST, for better or for worse. Essentially, AST can help us analyze all sorts of nightmare scenarios, albeit rather inaccurately, whereas AstGen can complete only valid AST (per-block) but a lot more accurately.

For example, AST analysis can complete the following but AstGen will need to wait for you to resolve the uninitialized variable error first:

var amogus: BasedType;
amogus.

On the other hand, using AstGen means reducing zls' code footprint, properly following the (eventual) Zig spec, and matching the Zig compiler's feedback one-to-one. We get actual ast-check diagnostics, we can analyze all sorts of scenarios without a million massive switch statements, and more!

Thoughts on this?
Anybody want to explore attempting to "flexibilize" AstGen to be able to parse invalid Ast?

matu3ba commented 2 years ago

See https://rdambrosio016.github.io/rust/2020/09/18/pure-ast-based-linting-sucks.html#so-how-does-a-linter-work for another approach.

I do see 2 potential main problems in the design space of Zig parser:

1. Makes error recovery for a parser very constrained
1. Incremental reparsing is impossible

Recovery code and tests may be sufficient, but handling multiple faults requires more complex state handling.

Next steps

1. To decide if it is worth the complexity, the perf cost (mostly latency from different steps) and expected complexity for worst case supported use case need to be analyzed and compared to expected gain
1. How feasible is it to "parse 1 step and undo 1 step" ie via invalidation of AST fields (of the slice) and copying them over to the reduced form (like the steps to fill a mtree via offset array)?
1. What is the expected frequency and cost of source location access?
1. How fast is lookup via source location into the AST to modify it (user puts cursor somewhere to add/remove things)?
1. How do we prevent feature creep? Limit analysis, when {} and () are not matching?
1. Find good use case for analysis.

SuperAuguste commented 2 years ago

Really interesting article @matu3ba! Not sure if this is the right place to discuss this though! Perhaps opening a new issue would make sense?

matu3ba commented 2 years ago

Perhaps opening a new issue would make sense?

I'll make one once I have perf numbers up to AST (from my zig-reduce stuff), such that we are not only theory crafting. I hope to have them in 7 days (busy on job etc).

matklad commented 1 year ago

whereas AstGen can complete only valid AST (per-block) but a lot more accurately.

My gut feeling here is that the first-best solution is to teach AstGen to gracefully handle invalid AST. Handling partial input in the parser/AST is the actually hard bit, where you need some explicit code. Above that, it's straightforward to fix up all missing names/expressions/types etc with a special "unknown" value, and the code there "writes itself", as you just propagate the unknowns (eg, amogus. AST is translated to amogus.<unknown> zir which resolves to <unknown> field which has <unknown> type).

zigtools / zls

Use `AstGen` in analysis #551