zigtools / zls

A Zig language server supporting Zig developers with features like autocomplete and goto definition
MIT License
2.95k stars, 295 forks

Use `AstGen` in analysis #551

Open SuperAuguste opened 2 years ago

SuperAuguste commented 2 years ago

AstGen generates ZIR, the untyped stage2 intermediate representation, which is a nice bit of padding between AST and analysis. We can't use AIR (the typed stage2 intermediate representation that comes after ZIR) if the AST is invalid, but we can use ZIR, so it's a really nice middle ground. AstGen is fast, accurate, and officially supported, and it would simplify our analysis logic a bunch!

SuperAuguste commented 2 years ago

Note: AstGen is slightly less flexible than using AST, for better or for worse. Essentially, AST can help us analyze all sorts of nightmare scenarios, albeit rather inaccurately, whereas AstGen can complete only valid AST (per-block) but a lot more accurately.

For example, AST analysis can complete the following but AstGen will need to wait for you to resolve the uninitialized variable error first:

var amogus: BasedType;
amogus.

On the other hand, using AstGen means reducing zls' code footprint, properly following the (eventual) Zig spec, and matching the Zig compiler's feedback one-to-one. We get actual ast-check diagnostics, we can analyze all sorts of scenarios without a million massive switch statements, and more!

  1. Thoughts on this?
  2. Anybody want to explore attempting to "flexibilize" AstGen to be able to parse invalid AST?

matu3ba commented 2 years ago

See https://rdambrosio016.github.io/rust/2020/09/18/pure-ast-based-linting-sucks.html#so-how-does-a-linter-work for another approach.

I do see two potential main problems in the design space of the Zig parser:

Recovery code and tests may be sufficient, but handling multiple faults requires more complex state handling.

Next steps

SuperAuguste commented 2 years ago

Really interesting article @matu3ba! Not sure if this is the right place to discuss this though! Perhaps opening a new issue would make sense?

matu3ba commented 2 years ago

> Perhaps opening a new issue would make sense?

I'll make one once I have perf numbers up to AST (from my zig-reduce stuff), so that we are not only theorycrafting. I hope to have them in 7 days (busy with work, etc.).

matklad commented 1 year ago

> whereas AstGen can complete only valid AST (per-block) but a lot more accurately.

My gut feeling here is that the first-best solution is to teach AstGen to gracefully handle invalid AST. Handling partial input in the parser/AST is the actually hard bit, where you need some explicit code. Above that, it's straightforward to fix up all missing names/expressions/types etc. with a special "unknown" value, and the code there "writes itself", as you just propagate the unknowns (e.g., the `amogus.` AST is translated to `amogus.<unknown>` ZIR, which resolves to an `<unknown>` field, which has `<unknown>` type).
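
A minimal sketch of the "propagate the unknowns" idea, in Zig. Everything here is hypothetical and heavily simplified: `Type`, `Field`, and `fieldType` are illustration-only names, not real zls, AstGen, or ZIR APIs. The point is only that once a missing name is represented as an explicit unknown, every later step can pass it along rather than bail out with an error.

```zig
const std = @import("std");

// Hypothetical, simplified type model. Not a real zls or compiler data structure.
const Type = union(enum) {
    unknown, // stands in for anything that could not be resolved
    int,
    structure: []const Field,
};

const Field = struct {
    name: []const u8,
    ty: Type,
};

/// Resolves the type of `base.field_name`.
/// A missing field name (the user has only typed `amogus.`), an unknown base,
/// or a field that does not exist all yield `.unknown` instead of an error.
fn fieldType(base: Type, field_name: ?[]const u8) Type {
    const name = field_name orelse return .unknown;
    const fields = switch (base) {
        .structure => |f| f,
        else => return .unknown,
    };
    for (fields) |field| {
        if (std.mem.eql(u8, field.name, name)) return field.ty;
    }
    return .unknown;
}

test "unknowns propagate instead of failing" {
    const fields = [_]Field{.{ .name = "x", .ty = .int }};
    const based = Type{ .structure = &fields };

    // Complete input resolves normally.
    try std.testing.expect(fieldType(based, "x") == .int);

    // `amogus.` with no field name yet: the result is unknown, not an error.
    const partial = fieldType(based, null);
    try std.testing.expect(partial == .unknown);

    // Anything built on top of an unknown stays unknown.
    try std.testing.expect(fieldType(partial, "y") == .unknown);
}
```

With this shape, a completion provider could still list the fields of `based` when the field name is missing, since the base type resolved fine and only the field access itself is unknown.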