numba / numba

NumPy aware dynamic Python compiler using LLVM
https://numba.pydata.org/
BSD 2-Clause "Simplified" License

Source/AST Frontend for Numba #9674

Open esc opened 1 month ago

esc commented 1 month ago

This issue collects information for and plans towards creating a Source/AST frontend for Numba.

Context: Numba targets the CPython bytecode rather than source code, since source may not be available under all circumstances. Unfortunately this causes a significant maintenance burden for the project, because the bytecode is not a stable interface designed to be targeted by third-party applications. The Numba project currently requires roughly three to six person-months every year to adapt to the new bytecode semantics introduced with each annual CPython minor version release. Naturally, it would be nice for Numba to have a Source/AST frontend instead, as this would significantly reduce the maintenance burden.
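To make the contrast concrete, here is a small illustration (not from the thread): the AST of a function is defined by the language grammar and is stable across CPython releases, while the bytecode is an implementation detail whose opcode sequence changes between minor versions.

```python
import ast
import dis

def add(x, y):
    return x + y

# The AST is defined by the grammar and stays stable across releases.
tree = ast.parse("def add(x, y):\n    return x + y\n")
func = tree.body[0]
print(type(func).__name__)              # FunctionDef
print([a.arg for a in func.args.args])  # ['x', 'y']

# The bytecode is a CPython implementation detail: this opcode list
# differs between e.g. Python 3.10, 3.11 and 3.12.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```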

esc commented 1 month ago

Summary of all no-source use-cases. (Comment will be edited).

esc commented 1 month ago

Links to useful utilities to consider. (Comment will be edited).

antocuni commented 1 month ago

Here is a possible way to augment exec in such a way that it remembers the source code:

import inspect
import linecache

def smart_exec(source, *args):
    if isinstance(source, str):
        # Give each source string a unique pseudo-filename.
        n = smart_exec.n
        smart_exec.n += 1
        filename = f'<smart_exec {n}>'
        lines = [(x + "\n") for x in source.splitlines()]
        # linecache entry format: (size, mtime, lines, fullname);
        # mtime=None marks an entry that is never re-checked on disk.
        linecache.cache[filename] = (len(source), None, lines, filename)
        obj = compile(source, filename, 'exec')
    else:
        obj = source
    return exec(obj, *args)
smart_exec.n = 0

src = """
def foo(x, y):
    return x + y
"""

d = {}
#exec(src, d)       # source code not available
smart_exec(src, d)  # source code available
foo = d['foo']

print(inspect.getsource(foo))

The trick is that both inspect and traceback rely on linecache to get the source code.
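As a quick check of that claim (an illustrative snippet, not from the thread), the same linecache registration also makes tracebacks from exec'd code show their source lines:

```python
import linecache
import traceback

src = "def boom():\n    raise ValueError('boom')\n"
filename = '<linecache demo>'
# Register the source under a pseudo-filename; mtime=None means the
# entry is never invalidated against the filesystem.
linecache.cache[filename] = (len(src), None, src.splitlines(True), filename)

ns = {}
exec(compile(src, filename, 'exec'), ns)
try:
    ns['boom']()
except ValueError:
    tb = traceback.format_exc()

# The formatted traceback contains the offending source line, pulled
# from linecache rather than from a file on disk.
print("raise ValueError" in tb)  # True
```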

It is used by the good old py lib, in particular for py.code.Source: https://github.com/pytest-dev/py/blob/master/py/_code/source.py#L193-L198

How to use it for Numba is an open question. I suspect that you can either:

  1. monkey-patch builtins.exec and hope for the best
  2. provide numba.exec and require people to use it in case they want to exec() code which contains @numba.jit functions

Summary of all no-source use-cases. (Comment will be edited). [...] pickle

I'm curious: how do you end up in a no-source code situation with pickle?

seibert commented 1 month ago

Numba functions can be pickled (cloudpickle, to be precise) for remote execution (most commonly, with Dask), and will be re-compiled on the target system in case it does not match the client.

@sklam will have to remind me if there's a way to avoid this situation by always using LLVM IR, and if we are sure we don't ever have to go back to bytecode for compilation. I've got some vague recollection of possible issues with embedded symbol addresses, but maybe we've fixed those so caching works better as well.

antocuni commented 1 month ago

Numba functions can be pickled (cloudpickle, to be precise) for remote execution (most commonly, with Dask), and will be re-compiled on the target system in case it does not match the client.

ok but if the pickle comes from pickling an already-compiled numba function, then we have control over what goes into it, and we can "just" make sure to include all the data necessary for recompilation (e.g., the source code itself or some form of IR).

seibert commented 1 month ago

Yeah, I think that will cover the most common case. I don't know if anyone is applying the Numba decorator to functions after unpickling on the destination. That seems unlikely, but the user base is big enough that I don't know. 😅

antocuni commented 1 month ago

I don't know if anyone is applying the Numba decorator to functions after unpickling on the destination.

I think that this case is also covered, because pickled functions are just stored as a module.funcname pair, so the function must exist on the other side anyway. Example:

import pickle
import pickletools

def foo(x, y):
    pass

s = pickle.dumps(foo)
pickletools.dis(s)

$ python /tmp/pickletest.py 
    0: \x80 PROTO      4
    2: \x95 FRAME      20
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'foo'
   27: \x94 MEMOIZE    (as 1)
   28: \x93 STACK_GLOBAL
   29: \x94 MEMOIZE    (as 2)
   30: .    STOP
highest protocol among opcodes = 4

sklam commented 1 month ago

I think that this case is also covered, because pickled functions are just stored as a module.funcname pair, so the function must exist on the other side anyway.

Dask uses cloudpickle, which supports loading from serialized bytecode if the host process determines the code object to be dynamic.
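For context (an illustrative snippet, not from the thread): plain pickle stores functions by reference, so a dynamically created function cannot be pickled at all. This is the case cloudpickle handles by serializing the code object itself:

```python
import pickle

# A function created at runtime has no importable module.attribute
# address, so reference-based pickling fails.
ns = {}
exec("def dyn(x):\n    return x + 1\n", ns)

err = None
try:
    pickle.dumps(ns['dyn'])
except Exception as e:
    err = e
# Plain pickle raises here; cloudpickle would instead serialize
# dyn.__code__ and reconstruct the function on the other side.
print("plain pickle failed:", type(err).__name__)
```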

sklam commented 1 month ago

@sklam will have to remind me if there's a way to avoid this situation by always using LLVM IR, and if we are sure we don't ever have to go back to bytecode for compilation. I've got some vague recollection of possible issues with embedded symbol addresses, but maybe we've fixed those so caching works better as well.

It is doable to just use LLVM IR or even machine code; it is the same problem as the disk cache. However, Numba currently only transfers the function bytecode along with a list of already-compiled signatures. Recompilation from bytecode occurs on the unpickling machine.

stuartarchibald commented 1 month ago

@sklam will have to remind me if there's a way to avoid this situation by always using LLVM IR, and if we are sure we don't ever have to go back to bytecode for compilation. I've got some vague recollection of possible issues with embedded symbol addresses, but maybe we've fixed those so caching works better as well.

It is doable to just use LLVM IR or even machine code; it is the same problem as the disk cache. However, Numba currently only transfers the function bytecode along with a list of already-compiled signatures. Recompilation from bytecode occurs on the unpickling machine.

I'm not sure that it is quite the same problem as a local disk cache, because a cluster might be heterogeneous in architecture? Recompilation from LLVM IR can only really occur if the LLVM IR does not contain architectural details and is only likely to achieve effective performance if it hasn't already been optimised towards the details of some hardware (e.g. vector width). PIXIE hits these issues just within the same ISA.