numba / numba-rvsdg

Numba compatible RVSDG (Regionalized Value State Dependence Graph) utilities.
https://numba-rvsdg.readthedocs.io/
BSD 2-Clause "Simplified" License
18 stars 7 forks source link

implement and test SCFG->AST conversion #127

Closed esc closed 5 months ago

esc commented 5 months ago

This implements the creation of Python source code from a potentially restructured SCFG.

The main entry-point is:

from numba_rvsdg import SCFG2AST

And the round-trip test for the entry-points shows how to use the API:

class TestEntryPoints(TestCase):

    def test_rondtrip(self):

        def function() -> int:
            x = 0
            for i in range(2):
                x += i
            return x, i

        scfg = AST2SCFG(function)
        scfg.restructure()
        ast_ = SCFG2AST(function, scfg)

        exec_locals = {}
        exec(ast.unparse(ast_), {}, exec_locals)
        transformed = exec_locals["transformed_function"]
        assert function() == transformed()

Special attention was paid to the testing of the transform. For all the previously defined testing functions to test the AST -> SCFG direction, the tests were augmented to also test the SCFG -> AST direction. To test a transformed function, we assert behavioral equivalence by running the original and the transformed through a set of given arguments and ensure they always produce the same result. Additionally we ensure that all lines of the test function are covered using a custom sys.monitoring setup. (As a result the package now needs at least 3.12 for testing). This ensures that that the set of arguments covers the original function and also that the transformation back to Python doesn't create any dead code.

Special thanks to @stuartarchibald for the stater patch for the custom sys.monitoring based tracer.

Overall the iteration over the SCFG is still somewhat unprincipled, however, the tests and overall coverage do seem to suggest the approach works. Now that we can transform Python programs and have solid tests too, more thought ought to be invested into storing the SCFG in a non-recursive data-structure and developing a more elegant API to traverse the graph and it's regions depending on the use-case.

Typing and annotations are a nbit haphazard, there are multiple issues in the existing classes so some parts of this code just choose to use Any and # type: ignore pragmas.

esc commented 5 months ago

@sklam one idea I had was to use https://peps.python.org/pep-0622/ instead of generating the if-cascade? It might make it a bit simpler. What do you think? Or is it better to code generate programs with the smallest subset of the available syntax?

esc commented 5 months ago

@sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?

sklam commented 5 months ago

@sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?

Maybe even with a prefix like __scfg_<name>__, so that it's easy to find what names are introduced.

esc commented 5 months ago

@sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?

Maybe even with a prefix like __scfg_<name>__, so that it's easy to find what names are introduced.

That's a good idea, it would make it easy to then ban any input program that uses variables that begin with __scfg_, just as a precaution.

esc commented 5 months ago

@sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?

Maybe even with a prefix like __scfg_<name>__, so that it's easy to find what names are introduced.

That's a good idea, it would make it easy to then ban any input program that uses variables that begin with __scfg_, just as a precaution.

Done in 40b9daabcaa42c8de150758ee404a6ee36ae026d

esc commented 5 months ago

@sklam thank you for the review!