Closed jdeans289 closed 1 year ago
Actually it turns out that LibClang gives us a lot more than I thought it would for this! It turns out that I can process each individual template specialization separately, and DataTypes doesn't even need to have any awareness of templates :grin: . This is what the original ICG does.
This makes sense. Template instantiation is done by the compiler front end, before any code is generated. It's also way more complicated than what I want to handle. Of course LibClang does this for us; that's the entire point of using a library like this.
The main change to DataTypes is just to include template structures in parsing. Actually, we can probably just add "<>" to the characters that we look for.
We'll also need to think a bit about making sure users understand how template instantiation works. Dynamically allocating, through the memory manager, an instance of a template that wasn't instantiated at compile time won't work. I think the current Trick works this way too. It's not a limitation of Trick, it's a fundamental limitation of C++.
So it turns out LibClang does NOT in fact just let me traverse template instantiations as separate branches of the AST. I expected it to, because if you dump the AST using just clang (`clang -Xclang -ast-dump <filename>`) you can see the template instantiations as children of the original class template declaration. LibClang doesn't appear to support this at all. In fact, I found an unmerged pull request for LLVM from 2018 that adds exactly what I need to LibClang :cry: . This seems to be a known limitation - I knew going in that LibClang doesn't expose the entire AST, but I was hoping that I wouldn't run into those cases.
I have a few approaches to try to get around this:

- Do the template substitution myself (but how would that handle `std::vector<T>`? or a mixin inheritance type pattern `template <typename T> class A : public T {}`?). Templates in C++ are wild. This is what I was planning to do initially, but I got really excited when I thought I might not have to.
- Dump the AST with `clang -Xclang -ast-dump file.hpp > ast.txt`, and write my very own AST parser. No LibClang, LibTooling, or other bad library options. The only dependency would be on the stability of this AST dump format, which I would have to research. This is starting to seem like an ok option, ~although I think the biggest loss would be the ease of getting fully qualified types~ never mind, fully qualified types are there. The remaining loss is having to filter out the system headers (llvm why isn't there a command line flag for this????)

What I wrote about DataTypes still applies - DataTypes should not be aware of templates.
I am actually starting to seriously consider the option of just parsing the AST myself, since clang can output it as JSON. We could get rid of the dependency on LibClang, and just require some clang install, not any of the specialized clang devtools. Plus, someone else who wrote a library based on LibClang and is much smarter than me recommends this approach because of all the shortcomings of LibClang. I think I'll take a run at this tomorrow.
Before I forget - we can include comments and everything in the generated AST. Use this command:

`clang -Xclang -ast-dump=json -fparse-all-comments foo.hpp > ast.json`
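A minimal sketch of what walking that JSON could look like, to check that template instantiations really do show up as children of the class template. Python is only an assumption here (the thread doesn't settle on a language for the parser), and the field names (`kind`, `name`, `inner`, `TemplateArgument`, `qualType`) are what I understand clang's JSON dump to emit, so verify them against your clang version. `SAMPLE_AST` is a hand-written, heavily trimmed stand-in for a real `ast.json`.

```python
import json


def iter_nodes(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("inner", []):
        yield from iter_nodes(child)


def collect_specializations(ast):
    """Map each class template name to the type-argument lists of its instantiations."""
    found = {}
    for node in iter_nodes(ast):
        if node.get("kind") != "ClassTemplateDecl":
            continue
        specs = []
        for child in node.get("inner", []):
            if child.get("kind") != "ClassTemplateSpecializationDecl":
                continue
            # Only type arguments are handled; non-type arguments are ignored.
            args = [
                arg["type"]["qualType"]
                for arg in child.get("inner", [])
                if arg.get("kind") == "TemplateArgument" and "type" in arg
            ]
            specs.append(args)
        found[node.get("name", "<anonymous>")] = specs
    return found


# Hand-written sample, not real clang output. For a real run:
#   with open("ast.json") as f: ast = json.load(f)
SAMPLE_AST = json.loads("""
{"kind": "TranslationUnitDecl", "inner": [
  {"kind": "ClassTemplateDecl", "name": "Foo", "inner": [
    {"kind": "TemplateTypeParmDecl", "name": "T"},
    {"kind": "CXXRecordDecl", "name": "Foo"},
    {"kind": "ClassTemplateSpecializationDecl", "name": "Foo", "inner": [
      {"kind": "TemplateArgument", "type": {"qualType": "int"}}]},
    {"kind": "ClassTemplateSpecializationDecl", "name": "Foo", "inner": [
      {"kind": "TemplateArgument", "type": {"qualType": "double"}}]}
  ]}
]}
""")

specializations = collect_specializations(SAMPLE_AST)
```

Each specialization can then be handed to DataTypes as an ordinary class, which keeps DataTypes unaware of templates.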
Putting this on hold for now.
The AST parsing approach hit a really annoying blocker - all of the system headers are pulled into the AST at this stage of the compilation. The LibClang and LibTooling interfaces track which file each node comes from, and you can easily filter out branches of the AST that live in system headers using their provided utilities, but I can't figure out how to do that myself. It seems like there must be a way to do this. Clang does not provide a utility for filtering the dumped AST, but the dump does include the file that each declaration comes from. Even with this, I'm unsure how we would check whether or not a file is a system header.
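One possible heuristic, sketched in Python (an assumed language, not something decided in this thread): treat any file under a known system prefix as a system header and drop top-level declarations that come from one. Two assumptions to check against real output: as far as I can tell, the JSON dump only writes a `file` key when the file differs from the last one it printed, so the sketch tracks the most recent file in document order; and the prefixes in `SYSTEM_PREFIXES` are guesses that would need tuning per platform (for example the SDK path on Mac).

```python
# Hypothetical prefixes; adjust for the actual toolchain and platform.
SYSTEM_PREFIXES = ("/usr/include/", "/usr/lib/", "/Library/Developer/", "/Applications/Xcode.app/")


def last_file_mentioned(obj, current):
    """Walk obj in document order and return the most recent "file" value seen."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "includedFrom":
                # Names the including file, not the location of this node.
                continue
            if key == "file" and isinstance(value, str):
                current = value
            else:
                current = last_file_mentioned(value, current)
    elif isinstance(obj, list):
        for item in obj:
            current = last_file_mentioned(item, current)
    return current


def user_declarations(ast, prefixes=SYSTEM_PREFIXES):
    """Return top-level declarations whose location is not under a system prefix."""
    current = None
    kept = []
    for decl in ast.get("inner", []):
        loc = decl.get("loc", {})
        decl_file = last_file_mentioned(loc, current)
        # An empty loc is treated as an implicit/builtin declaration and skipped.
        if loc and decl_file is not None and not decl_file.startswith(prefixes):
            kept.append(decl)
        # Children can mention other files, so walk the whole declaration.
        current = last_file_mentioned(decl, current)
    return kept
```

The prefix check is the weak part. A more reliable list of system directories could probably be taken from the include search paths clang itself reports, but this sketch doesn't attempt that.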
There are other options here - you can dump a partial AST, so maybe I could dump just the subtree for each class template and work from that. This seems like the most promising approach.
I don't think doing the substitution myself is a reasonable solution. It would be really brittle and would probably end up actually regressing functionality from the current Trick.
Another option is to contribute the missing functionality to LibClang myself. I'm pretty excited about this option, but I don't think it's practical if we're actually trying to get this functionality into Trick. We would have to wait for my changes to be accepted, and then for that to be rolled into a release, and then for that release to roll out to the labs. LLVM version is a major dragging factor for some Trick users, so relying on the most recent version would be a big change. Also, part of the point of this whole project was to try to base ICG off of a stable dependency to avoid this exact problem.
Ran into more problems with LibClang. For some reason, whenever I include both `<vector>` and `<string>` it stops being able to process vectors????? Why??????
LibClang is also pretty finicky on Mac, and it's tough to give it the correct path to the SDK where system headers live. I can't tell if this is a LibClang problem or a me not knowing the right way to pass arguments problem.
Overall, I think that switching to parsing the AST myself is going to be the move here. Maybe I can still use LibClang just to find all of the top level class definitions, and then dump the AST filtered on each class and parse the internals myself. I think that should solve the problem of not being able to filter out the system headers, and overcome the limitations of LibClang.
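A rough sketch of the "dump the AST filtered on each class" step, again assuming Python for the tooling. It builds the clang command for one class name using `-ast-dump-filter`, and parses the output as a stream of JSON documents, since a filtered dump can produce one document per matching declaration rather than a single tree. `filtered_dump_command` and `parse_json_stream` are hypothetical helper names, and `-fsyntax-only` is my addition to the command used earlier in the thread.

```python
import json


def filtered_dump_command(header, class_name, clang="clang"):
    """Build the argument list for a JSON AST dump restricted to one name."""
    return [
        clang,
        "-fsyntax-only",
        "-Xclang", "-ast-dump=json",
        "-Xclang", f"-ast-dump-filter={class_name}",
        header,
    ]


def parse_json_stream(text):
    """Parse back-to-back JSON objects, skipping any text between them."""
    decoder = json.JSONDecoder()
    docs = []
    pos = text.find("{")
    while pos != -1:
        doc, end = decoder.raw_decode(text, pos)
        docs.append(doc)
        pos = text.find("{", end)
    return docs
```

To use it, pass the command to `subprocess.run(cmd, capture_output=True, text=True)` and feed the captured stdout to `parse_json_stream`. The filter matches on names, so a class whose name also appears in a system header could still pull in extra declarations.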
Decided to parse the AST on my own. Templates work now.
Need support for templated types.
I think we'll need to add another structure to the DataTypeInator to track class templates (Will they always be composite types? Maybe, depending on how the STL types are implemented). Make some new TemplatedClass representation to track these, with slots to fill in for however many template parameters there are. Whenever a class template is instantiated, it should be looked up in the template dictionary, and that entry should be used to add the instantiation to the TypeDictionary.
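To make the idea concrete, here is a small sketch of a template dictionary, written in Python purely for illustration (the real DataTypeInator is C++, and none of these names or methods come from it). `TemplatedClass` stores the parameter names and the member types spelled in terms of those parameters. Instantiating looks the template up, fills in the slots, and records the result in a stand-in type dictionary. Substitution here only handles a member whose type is exactly a parameter name, which is nowhere near enough for real C++.

```python
from dataclasses import dataclass


@dataclass
class TemplatedClass:
    name: str
    params: list   # template parameter names, e.g. ["T", "U"]
    members: dict  # member name -> type, possibly a parameter name

    def instantiate(self, args):
        """Return (type name, member types) with parameters replaced by args."""
        if len(args) != len(self.params):
            raise ValueError(f"{self.name} expects {len(self.params)} template arguments")
        binding = dict(zip(self.params, args))
        type_name = f"{self.name}<{', '.join(args)}>"
        # Exact-match substitution only: "T" is replaced, "T*" is not.
        fields = {member: binding.get(t, t) for member, t in self.members.items()}
        return type_name, fields


class TemplateDictionary:
    def __init__(self):
        self.templates = {}        # template name -> TemplatedClass
        self.type_dictionary = {}  # stand-in for the TypeDictionary

    def add_template(self, template):
        self.templates[template.name] = template

    def instantiate(self, name, args):
        """Look up a template, instantiate it, and register the concrete type."""
        type_name, fields = self.templates[name].instantiate(args)
        self.type_dictionary.setdefault(type_name, fields)
        return type_name
```

For example, registering `TemplatedClass("Pair", ["T", "U"], {"first": "T", "second": "U"})` and calling `instantiate("Pair", ["int", "double"])` adds a `Pair<int, double>` entry whose members are `int` and `double`.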
This will probably be hard. But it has to make sense. No more ad hoc stuff.