Index post-processed sources

robinp commented 7 years ago

Brought up by @mpickering. A few things to sort out for that question:

Does GHC provide convenient access to the post-processed ASTs, ideally with post-processed spans?
If not, are the post-processed sources accessible and can we do an extra compilation for them to get the spans?
How would we deduplicate definitions/references that are present in both the original and post-processed sources? (Sidenote: IIRC GhcAnalyser drops references that originated from generated code, but not sure if TH falls under that condition).
Where would we place post-processed code (this is a valid question for CPP too)? Kythe supports virtual roots, and generally we can emit whatever code fragments we want anywhere in the tree, but we would need to work out which reference/generates/... edges should be present.
Would we emit full postprocessed sources (more problematic duplication-wise), or do some smart thing to just put the TH-generated source fragments in virtual files?
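
To make the deduplication question a bit more concrete, here is a minimal sketch. It assumes that spans from the post-processed source can be mapped back to coordinates in the original file, which is exactly what the first two questions would need to establish. All types and names below are hypothetical and are not part of haskell-indexer or GhcAnalyser.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical types for illustration only (not the haskell-indexer API).
-- Derived Ord puts Original before PostProcessed.
data Origin = Original | PostProcessed
  deriving (Eq, Ord, Show)

data Span = Span
  { spanFile  :: FilePath
  , spanStart :: Int
  , spanEnd   :: Int
  } deriving (Eq, Ord, Show)

data Reference = Reference
  { refTarget :: String  -- name of the referenced entity
  , refSpan   :: Span    -- assumed to be mapped back to the original file
  , refOrigin :: Origin
  } deriving (Eq, Show)

-- Keep one reference per (target, span) pair, preferring the one that
-- came from the original source over the post-processed one.
dedupRefs :: [Reference] -> [Reference]
dedupRefs = Map.elems . Map.fromListWith pick . map keyed
  where
    keyed r = ((refTarget r, refSpan r), r)
    -- fromListWith passes the newer value first, the stored one second.
    pick new old
      | refOrigin new < refOrigin old = new
      | otherwise                     = old
```

References that only exist in the post-processed source (for example, names introduced by a TH splice) survive this step, so they would still need somewhere to live.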

+@creachadair: does for example the Kythe C++ indexer emit virtual fragments for un-CPP-d code? Do you have any takeaways from earlier attempts on this topic?

creachadair commented 7 years ago

> does for example the Kythe C++ indexer emit virtual fragments for un-CPP-d code? Do you have any takeaways from earlier attempts on this topic?

I'm not entirely sure I understand what you mean by "virtual fragments". But to your other questions: We don't currently store the fully-preprocessed versions of files. The indexer does hook some of the Clang preprocessor actions to capture (e.g.) macro definitions, #include lines, and so on, and to keep track of the state we need to disambiguate variant expansions of the same file under variations of #define settings, inclusion order, and so forth. But the only source text we capture is that of the original files.

From a captured compilation unit, you could of course set up clang and actually capture the CPP output, but we haven't done that so far. The only obvious justification would be to try to reify macro expansions, but that's problematic because macro expansion is not layered, so you can't practically keep track of nested expansions (which are common). Perhaps more pernicious than that, a C macro expansion isn't even required to emit a syntactically-complete form, so the relationship between the notional "visible" syntax of the file (where a macro expansion looks like a variable or a function call) and the underlying C AST is very fiddly.

We've talked about trying to do something more concrete with macros, but so far there hasn't been a productive UI query to work from.

robinp commented 7 years ago

Mind dump: the main use cases I can see for post-processed source indexing:

1) Compiler-generated splices. The compiler can auto-derive instance implementations, whose sources are not visible by default. But these instances (Show, Data, ...) are not very interesting.

2) TH-generated splices. The TemplateHaskell expansions could be put somewhere and cross-referenced.

3) CPP macro expansions. Like above, but for expanded C-preprocessed macros.

For both 2) and 3) what happens now is that the bindings are put inside the span of the TH/macro invocation in some unpredictable (and unclickable) manner.

If we could stash the expansions somewhere (even as individual fragments), maybe we could connect the invocation anchor with the fragment anchors in a way that makes sense for the UI navigation (no idea exactly how).
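
A rough sketch of what "stashing fragments and linking them" could look like as data, assuming each expansion gets its own virtual file. The path scheme, the record types and the `"expands-to"` edge label are all placeholders made up for illustration. In particular, the label is not a claim about which Kythe edge kind would be appropriate.

```haskell
-- Hypothetical data model; none of these names come from Kythe or
-- haskell-indexer.
data Anchor = Anchor
  { anchorPath  :: FilePath
  , anchorStart :: Int
  , anchorEnd   :: Int
  } deriving (Eq, Show)

data Edge = Edge
  { edgeSource :: Anchor
  , edgeKind   :: String  -- placeholder label, see note above
  , edgeTarget :: Anchor
  } deriving (Eq, Show)

data Fragment = Fragment
  { fragPath :: FilePath  -- virtual file holding the expansion text
  , fragText :: String
  , fragEdge :: Edge      -- link from the invocation to the fragment
  } deriving (Eq, Show)

-- Made-up path scheme: one virtual file per expansion, grouped under
-- the original file and the span of the invocation.
fragmentPath :: Anchor -> Int -> FilePath
fragmentPath inv n =
  "th-expansions/" ++ anchorPath inv ++ "/"
    ++ show (anchorStart inv) ++ "-" ++ show (anchorEnd inv)
    ++ "/" ++ show n ++ ".hs"

-- For each expansion text, create a virtual file and an edge from the
-- invocation anchor to an anchor covering the whole fragment.
linkExpansions :: Anchor -> [String] -> [Fragment]
linkExpansions inv = zipWith mk [0 :: Int ..]
  where
    mk n text =
      let path   = fragmentPath inv n
          target = Anchor path 0 (length text)
      in Fragment path text (Edge inv "expands-to" target)
```

Bindings inside a fragment would then get ordinary anchors within the virtual file instead of being crammed into the span of the invocation. What the UI should do with the invocation-to-fragment edge is still the open part.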

mpickering commented 7 years ago

For 2/3, some of the infrastructure I use for core-kythe would be useful, as you would need spans or pretty-printed output.
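
As a small illustration of the pretty-printed side only, the `template-haskell` library can render generated declarations as text with `pprint`. The sketch below runs a declaration quote in `IO` and renders the result. `renderExample` is a made-up name, and this says nothing about how to recover spans or how to get at the splices GHC produces during a real compilation.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH (pprint, runQ)

-- Build a couple of declarations with a TH quote and render them as
-- source text. The rendered text is the kind of thing that could be
-- stored as a fragment.
renderExample :: IO String
renderExample = do
  decs <- runQ [d| answer :: Int; answer = 42 |]
  pure (pprint decs)
```

Calling `renderExample >>= putStrLn` in GHCi prints the rendered declarations. The exact names in the output may carry generated suffixes, depending on how the quote binds them.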

google / haskell-indexer

Index post-processed sources #58