stuaxo closed this 4 years ago.
Yes, the `# coding:` comment makes this possible. The feature was meant for declaring custom source encodings, which also allows tricks such as translating Python keywords into a new language.
The interface for registering custom codecs exposes the tokenizer output (here), which can be consumed in its entirety to get the full source file.
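To make the codec hook concrete, here is a minimal standard-library sketch. It is not inlinec's code: the codec name `kwtranslate`, the keyword table and `transform()` are hypothetical, and the transformation is the keyword-translation example rather than C inlining. In this sketch the decode function receives the raw bytes of the whole file, decodes them, rewrites the text and returns the result.

```python
import codecs
import io
import tokenize

CODEC_NAME = "kwtranslate"  # hypothetical name used in a `# coding: kwtranslate` header

# Hypothetical keyword table: German-ish spellings -> real Python names.
TRANSLATIONS = {"wenn": "if", "sonst": "else", "drucke": "print"}


def transform(source: str) -> str:
    """Rewrite NAME tokens only, so string literals and comments stay untouched."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in TRANSLATIONS:
            tok = tok._replace(string=TRANSLATIONS[tok.string])
        tokens.append(tok)
    # Full 5-tuples keep the original positions, so spacing and indentation survive.
    return tokenize.untokenize(tokens)


def decode(data, errors="strict"):
    # Consume the whole input at once, then hand the full source to transform().
    text, consumed = codecs.utf_8_decode(data, errors, True)
    return transform(text), consumed


class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    # Buffer every chunk until the final one, since transform() needs the whole file.
    def _buffer_decode(self, data, errors, final):
        if not final:
            return "", 0
        return decode(data, errors)


def search(name):
    # Codec search function: only answers for our hypothetical codec name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(
            name=CODEC_NAME,
            encode=codecs.utf_8_encode,
            decode=decode,
            incrementaldecoder=IncrementalDecoder,
        )
    return None


codecs.register(search)
```

For a module whose header says `# coding: kwtranslate` to be decoded this way, the codec has to be registered before that module is compiled; projects built on this trick usually arrange that at interpreter startup, for example from a `.pth` file.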
Once I have the full source code, I run it through parso (here), a failure-tolerant parser meant for incremental parsing in e.g. editor integrations. It puts "error" nodes into the AST wherever the C functions are.
This is somewhat brittle, but it has worked well enough. Ideally, it would be replaced by a custom parser that finds the `@inlinec` decorators and maybe even switches to a C parser for the function body. However, that requires context-sensitive parsing, which is a little tricky. PRs welcome :-)
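The "switch to a C parser" idea can be sketched as a line scanner with two modes: Python mode skips lines until it sees the decorator, then C mode consumes lines until the braces of the C function balance. This is a sketch under loud assumptions: the C function is assumed to be written as the indented body under the decorated `def`, braces inside C strings or comments are not handled, and `find_inline_c` and `InlineBlock` are hypothetical names, not part of inlinec.

```python
import re
import textwrap
from dataclasses import dataclass
from typing import List

# Matches the Python header line that must directly follow the decorator.
DEF_RE = re.compile(r"\s*def\s+(\w+)\s*\(")


@dataclass
class InlineBlock:
    name: str      # Python-level function name
    start: int     # 0-based index of the decorator line
    end: int       # 0-based index one past the last line of C code
    c_source: str  # dedented C text of the body


def find_inline_c(source: str, decorator: str = "@inlinec") -> List[InlineBlock]:
    lines = source.splitlines()
    blocks: List[InlineBlock] = []
    i = 0
    while i < len(lines):
        # Python mode: skip lines until the decorator appears on its own line.
        if lines[i].strip() != decorator:
            i += 1
            continue
        header = lines[i + 1] if i + 1 < len(lines) else ""
        match = DEF_RE.match(header)
        if match is None:
            raise SyntaxError(f"line {i + 2}: expected 'def' after {decorator}")
        # C mode: take lines until the braces opened by the C function balance.
        depth, seen_brace, j = 0, False, i + 2
        while j < len(lines):
            depth += lines[j].count("{") - lines[j].count("}")
            seen_brace = seen_brace or "{" in lines[j]
            j += 1
            if seen_brace and depth == 0:
                break
        else:
            raise SyntaxError(f"unbalanced braces in C body of {match.group(1)!r}")
        c_source = textwrap.dedent("\n".join(lines[i + 2:j]))
        blocks.append(InlineBlock(match.group(1), i, j, c_source))
        # Back to Python mode right after the C body.
        i = j
    return blocks
```

Brace counting is only a stand-in for a real C parser, but it shows why the parse is context-sensitive: the same line is read by different rules depending on whether a decorator preceded it.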
Then I traverse the AST, find the nodes decorated with `@inlinec`, define ctypes wrappers for the function bodies, replace the body of each function with a call to its wrapper, and glue the wrapper imports to the top of the file.
Once that's done, I re-tokenize and return the new token stream, so the Python interpreter only ever sees the transformed source code.
You can imagine lots of things with this; I'll have to put it on my procrastination list and have a play.
One thing that could be handy is passing the language into the decorator, e.g. `@inline('c')`, `@inline('nim')`, etc., and making the languages pluggable somehow.
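One way the pluggable part could look, as a sketch under assumptions: since the decorator is handled at source-transform time, the transformer would read the language string out of `@inline('<language>')` and look up a backend in a registry. `LanguageBackend`, `register_backend`, `language_of` and `backend_for` are hypothetical names, and only a C backend is registered, with a plain `cc -shared -fPIC` build command.

```python
import ast
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class LanguageBackend:
    suffix: str                                     # extension for generated source files
    build_command: Callable[[str, str], List[str]]  # (source_path, library_path) -> argv


BACKENDS: Dict[str, LanguageBackend] = {}


def register_backend(language: str, backend: LanguageBackend) -> None:
    BACKENDS[language] = backend


def language_of(decorator: ast.expr) -> str:
    """Extract 'c' from the AST node of an `@inline('c')` decorator."""
    if (
        isinstance(decorator, ast.Call)
        and isinstance(decorator.func, ast.Name)
        and decorator.func.id == "inline"
        and len(decorator.args) == 1
        and isinstance(decorator.args[0], ast.Constant)
        and isinstance(decorator.args[0].value, str)
    ):
        return decorator.args[0].value
    raise ValueError("expected a decorator of the form @inline('<language>')")


def backend_for(decorator: ast.expr) -> LanguageBackend:
    language = language_of(decorator)
    try:
        return BACKENDS[language]
    except KeyError:
        raise ValueError(f"no backend registered for {language!r}") from None


# Only C is registered here; other languages would add their own entries.
register_backend(
    "c",
    LanguageBackend(".c", lambda src, out: ["cc", "-shared", "-fPIC", "-o", out, src]),
)
```

A Nim backend would be one more `register_backend("nim", ...)` call with its own suffix and build command; until then, `@inline('nim')` fails with a clear error instead of silently falling back to C.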
I wonder if coupling this with ccache could speed things up?
You could name the generated C source files after a quick hash of their content (xxHash if you care about speed); that way ccache would probably not try to recompile things it already has.
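A small sketch of the content-addressed naming idea. It uses the standard library's `hashlib.blake2b` in place of xxHash (which needs the third-party `xxhash` package); the `inlinec_<digest>.c` file name pattern and the exact `cc` flags are assumptions. Because identical C text always maps to the same path, repeated builds hand ccache identical inputs.

```python
import hashlib
from pathlib import Path
from typing import List


def generated_source_path(c_source: str, build_dir: Path) -> Path:
    # Same C text -> same digest -> same file name on every run.
    digest = hashlib.blake2b(c_source.encode("utf-8"), digest_size=8).hexdigest()
    return build_dir / f"inlinec_{digest}.c"


def write_source(c_source: str, build_dir: Path) -> Path:
    # Skip the write if the file exists; by construction it already holds this exact text.
    path = generated_source_path(c_source, build_dir)
    if not path.exists():
        build_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(c_source, encoding="utf-8")
    return path


def build_command(c_path: Path, use_ccache: bool = True) -> List[str]:
    # Build a shared library next to the source, optionally going through ccache.
    lib_path = c_path.with_suffix(".so")
    cmd = ["cc", "-shared", "-fPIC", "-o", str(lib_path), str(c_path)]
    return ["ccache", *cmd] if use_ccache else cmd
```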
Absolutely! I like your line of thinking here :)
I'm really interested in how this works with respect to custom codecs. Is it activated by the `coding` part of the header? Is it possible to link to some docs from the README here?