pfalcon / pycopy

Pycopy - a minimalist and memory-efficient Python dialect. Good for desktop, cloud, constrained systems, microcontrollers, and just everything.
MIT License

RFC: Introducing "import- vs run- time" semantics mode to Python #26

Open pfalcon opened 5 years ago

pfalcon commented 5 years ago

One of the biggest (performance) issues with Python is what I call overdynamicity: the fact that many symbols in a program are looked up at runtime by symbolic name. This includes global variables and functions, module variables and functions, and object attributes and methods. (Almost the only exception is local function variables, which are optimized and accessed by "address", or more specifically by offset in the function's stack frame.)
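
The difference between the two kinds of access can be observed in bytecode. The following is only a minimal sketch, assuming CPython and its `dis` module (Pycopy itself may not provide it); the function names are made up for illustration, and exact opcode names vary between CPython versions.

```python
import dis

counter = 0  # module-level (global) variable

def read_global():
    # `counter` is resolved by name in the module namespace on every call.
    return counter

def read_local(value):
    # `value` lives in a stack-frame slot and is accessed by index.
    return value

def opnames(func):
    """Return the bytecode instruction names of a function (CPython-specific)."""
    return [ins.opname for ins in dis.get_instructions(func)]

print(opnames(read_global))
print(opnames(read_local))
```

On CPython, the first listing contains a `LOAD_GLOBAL` instruction (lookup by name), while the second uses an instruction from the `LOAD_FAST` family (access by slot index).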

Such semantics allow overriding and customizing many aspects of the language, but at the same time lead to runtime inefficiency. However, the following are well-known facts:

  1. The majority of applications never override symbols in other modules.
  2. Of those which do, the majority do so once, at application startup (while "setting up the application environment").
  3. The remainder are quite specialized applications, either belonging to tooling (test runners, profilers, etc.) or applications which work around something instead of implementing/fixing it properly.

Formalized in terms of Python semantics, the following optimization approach can be proposed:

  1. During import time, a particular module can modify the runtime environment (including overriding symbols in other modules).
  2. However, at runtime, such modifications are not allowed.
  3. These rules apply recursively to all modules comprising a particular application. I.e., there is a clear "import-time" phase vs. a "run-time" phase of the application lifetime. Note that this rules out runtime imports (indeed, imports modify the runtime environment, but it should be settled by the time the run-time phase starts).
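
The two phases can be roughly emulated in today's Python. The following is only a sketch under stated assumptions, not the proposed implementation: `FrozenModule` and `start_runtime` are hypothetical names, and the guard only catches attribute assignment through the module object. Code inside a module that uses `global`, or anything writing to `module.__dict__` directly, bypasses it.

```python
import types

class FrozenModule(types.ModuleType):
    """Hypothetical module type that rejects rebinding once run time begins."""

    def __setattr__(self, name, value):
        raise RuntimeError(
            f"cannot set {self.__name__}.{name}: import-time phase is over")

    def __delattr__(self, name):
        raise RuntimeError(
            f"cannot delete {self.__name__}.{name}: import-time phase is over")

def start_runtime(*modules):
    """Switch the given modules to the frozen type (hypothetical helper)."""
    for module in modules:
        module.__class__ = FrozenModule

# "Import-time" phase: modifications are allowed.
demo = types.ModuleType("demo")
demo.value = 1
demo.value = 2  # overriding is still fine here

# "Run-time" phase: modifications are rejected.
start_runtime(demo)
try:
    demo.value = 3
except RuntimeError as exc:
    print(exc)
```

A real implementation would act on this knowledge in the compiler rather than raise errors at run time; the sketch only shows where the boundary between the phases lies.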

Note also that "import time" effectively corresponds to "compile time" in other languages. Indeed, cached bytecode files are produced during the import phase, by compiling source into bytecode. But with conventional Python semantics, compiled bytecode has an implicit "module initialization function". That is required to allow both conventional semantics and modularity. For example, module init code can (and indeed often does, per fact 2 above) override symbols in other modules, so this has to be captured as imperative code. The proposed new semantics, however, effectively require executing module init code at import time and capturing its effects. As the effects can extend beyond the current module to the whole runtime environment, implementing the new semantics would require a whole-program approach.
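
To illustrate the implicit "module initialization function" and what "capturing its effects" could mean, here is a small sketch using only the builtin `compile()` and `exec()`. The module source and the name `demo` are made up, and the effects are deliberately confined to a single namespace dict, whereas real init code can reach into other modules as well.

```python
source = """
LIMIT = 10

def double(x):
    return 2 * x
"""

# Compiling in "exec" mode yields one code object for the whole module body:
# this is the implicit "module initialization function".
init_code = compile(source, "<demo>", "exec")

# Running it performs the import-time effects; here they are confined to one
# namespace dict, which can be inspected afterwards.
namespace = {"__name__": "demo"}
exec(init_code, namespace)

# Names bound by the init code (dunder entries such as __builtins__ skipped).
bound = sorted(name for name in namespace if not name.startswith("__"))
print(bound)
```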

pfalcon commented 5 years ago

From the above, it's clear which constraints are placed on the code:

  1. All function and global variable definitions should be done in module init code.
  2. All class definitions should be done in module init code.
  3. Any overriding of symbols in other modules should happen in module init code.

Note that "globals" are a particular case of a module namespace: "globals" are just the namespace of the current module, with a fallback to the "builtins" module.
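
A quick way to see this, as a sketch with a throwaway module (the names `demo` and `size` are hypothetical):

```python
import types

demo = types.ModuleType("demo")
exec("def size(obj):\n    return len(obj)\n", demo.__dict__)

# A function's globals are exactly the namespace of the module defining it.
same_namespace = demo.size.__globals__ is demo.__dict__

# `len` is not defined in the module, so the lookup falls back to builtins.
before = demo.size("abc")

# Binding `len` in the module namespace shadows the builtin for that module only.
demo.len = lambda obj: -1
after = demo.size("abc")

print(same_namespace, before, after)
```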

As an example, suppose we want to override the builtin print(). Code not compliant with the proposed approach:

import builtins

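def my_print(*args, **kwargs):
    ...  # custom print implementation goes here

def install_my_print():
    # Non-compliant: the override happens whenever this is called at run time.
    builtins.print = my_print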

Compliant code:

import builtins

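def my_print(*args, **kwargs):
    ...  # custom print implementation goes here

# Compliant: the override happens in module init code, i.e. at import time.
builtins.print = my_print

pfalcon commented 5 years ago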

It should be noted which symbolic accesses can be optimized by this approach:

pfalcon commented 4 years ago

To clearly separate import time from run time, we would need to implement a special kind of "main" function to call once the import phase is over. It turns out that many good ideas like this have already been considered, and some were rejected: "Special __main__() function in modules" (PEP 299).
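
A rough sketch of what such an entry point could look like follows. Everything here is an assumption for illustration: `run_app` is a hypothetical launcher, `demo_app` is a stand-in module built in memory rather than loaded from disk, and the `__main__()` name is simply borrowed from the title of the rejected proposal.

```python
import importlib
import sys
import types

events = []  # records the order of phases, for illustration only

# Stand-in for an application module on disk (hypothetical name "demo_app").
demo_app = types.ModuleType("demo_app")
demo_app.events = events
exec(
    "events.append('import')\n"
    "def __main__():\n"
    "    events.append('run')\n"
    "    return 0\n",
    demo_app.__dict__,
)
sys.modules["demo_app"] = demo_app

def run_app(module_name):
    """Hypothetical launcher: finish the import phase, then call __main__()."""
    module = importlib.import_module(module_name)  # import-time phase
    entry = getattr(module, "__main__", None)
    if entry is None:
        raise RuntimeError(f"{module_name} defines no __main__() function")
    return entry()  # run-time phase starts here

status = run_app("demo_app")
print(events, status)
```

The point of the design is that the launcher, not the module, decides when the run-time phase begins, so nothing in module init code needs an `if __name__ == "__main__"` check.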

pfalcon commented 4 years ago