jart / cosmopolitan

build-once run-anywhere c library
ISC License

Compiling Python #141

Closed: ahgamut closed this issue 12 months ago

ahgamut commented 3 years ago

https://github.com/ahgamut/python27
https://github.com/ahgamut/cpython/tree/cosmo_py27

The assert macro needs to be changed in cosmopolitan.h to enable compilation (see #138). Afterwards, just clone the repo and run superconfigure.

Python 2.7.18 compiled seamlessly once I figured out how autoconf worked, and what flags were being fed to the source files when running make. I'm pretty sure we can compile any C-based extensions into python.exe -- they just need to be compiled/linked with Cosmopolitan, with necessary glue code added to the Python source. For example, I was able to compile SQLite into python.exe to enable the internal _sqlite module.

The compiled APE is about 4.1MB with MODE=tiny (without any of the standard modules, the interpreter alone is around 1.6MB). Most of the modules in the stdlib compile without error. The _socket module (required for Python's simple HTTP server) doesn't compile, as it requires the structs from netdb.h.

On Windows, the APE exits immediately because the interpreter is unable to find the platform-specific files. Modules/getpath.c and Lib/site.py in the Python source try to use absolute paths from the prefixes provided during compilation; editing those files to search the right locations (possibly with some zipos magic) ought to fix this.

jart commented 3 years ago

This is really exciting. Could you rebase your python27 repo on top of https://github.com/python/cpython so provenance is clearer and I can git diff exactly what you did? Alternatively, some kind of minimal build script showing the steps that are needed would be super helpful.

ahgamut commented 3 years ago

https://github.com/ahgamut/cpython/tree/cosmo_py27

Clone the repo and run superconfigure (superconfigure calls configure with the right params, then make and objcopy).

There are some minor details in the commit messages regarding what I tried to compile, etc.

ahgamut commented 3 years ago

Here's the sqlite fork that compiles with Cosmopolitan: https://github.com/ahgamut/sqlite/tree/cosmopolitan
Clone the repo and run superconfigure. It requires libtool and tcl8.6.

Changing the build process for SQLite was as follows:

sqlite compiles without any errors (only 1 warning). I haven't figured out how to run the tests yet.

Adding sqlite to the Python build requires the above compiled sqlite and adding the below recipe to Modules/Setup.local:

# example recipe with SQLite3 
# set variables to be used in Makefile

*static*
# location of compiled https://github.com/ahgamut/sqlite
SQLITE3_DIR=../sqlite

# compile-time flags that contain an equals sign must be wrapped in a string,
# otherwise they are written incorrectly into the Makefile
SQLITE3_OMIT_EXTFLAG='SQLITE_OMIT_LOAD_EXTENSION=1'
SQLITE3_MOD='MODULE_NAME="sqlite3"'

# order is (module, sources, includes/defines, link locations, linked libs)
# read Modules/Setup.dist for more details
_sqlite3 _sqlite/util.c _sqlite/connection.c _sqlite/cursor.c \
    _sqlite/microprotocols.c _sqlite/cache.c  _sqlite/prepare_protocol.c \
    _sqlite/row.c _sqlite/statement.c _sqlite/module.c \
    -D$(SQLITE3_OMIT_EXTFLAG) -D$(SQLITE3_MOD) \
    -IModules/_sqlite -I$(SQLITE3_DIR) \
    -L$(SQLITE3_DIR)/.libs -lsqlite3
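
After rebuilding with a *static* recipe like the one above, a quick check from inside the interpreter can confirm that the extension really ended up inside the binary rather than being looked up as a shared library. This is a minimal Python 3 sketch; is_builtin and sqlite_smoke_test are hypothetical helper names, not part of CPython.

```python
import sys


def is_builtin(module_name):
    """True if module_name is compiled into the interpreter binary itself.

    Modules listed under *static* in Modules/Setup.local are linked into the
    executable, so they appear in sys.builtin_module_names; modules built as
    separate shared libraries (.so/.pyd) do not.
    """
    return module_name in sys.builtin_module_names


def sqlite_smoke_test():
    """Round-trip one row through an in-memory SQLite database."""
    import sqlite3  # the pure-Python wrapper imports the _sqlite3 extension

    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (42)")
        return conn.execute("SELECT x FROM t").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print("_sqlite3 linked statically:", is_builtin("_sqlite3"))
    print("sqlite round trip:", sqlite_smoke_test())
```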

ahgamut commented 3 years ago

The python.com APE now opens on Windows!

The interpreter couldn't find the standard library because the paths were coded in as absolute paths at compile time. I changed that to use relative paths (i.e. Lib in the same directory as the interpreter). Now one can just copy python.com and the Lib folder to the same directory on a Windows machine, and the APE will find site.py and start up properly. Later it might be nice to move some of the core modules, as .pyc files, into a ZIP that is part of the APE.
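
The relative-path layout amounts to a lookup like the one below. This is a minimal Python sketch, not the C code in Modules/getpath.c; find_stdlib_dir is a hypothetical helper, and using site.py as the marker file follows the description above.

```python
import os


def find_stdlib_dir(executable):
    """Return the Lib directory that sits next to `executable`, or None.

    Models the relative layout described above: python.com and a Lib folder
    (containing site.py) copied side by side, wherever that happens to be.
    """
    exe_dir = os.path.dirname(os.path.abspath(executable))
    candidate = os.path.join(exe_dir, "Lib")
    if os.path.isfile(os.path.join(candidate, "site.py")):
        return candidate
    return None
```

Because the directory is derived from the executable's own location, the same binary works no matter which folder the pair is copied into.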

Right now, running python.com yourfile.py works on Windows, but the interpreter keeps throwing syntax errors in interactive mode. It may be related to this.

Question: does Cosmopolitan handle paths (forward slash on Linux, backslash on Windows) and path lists in environment variables (separated by ':' on Linux, ';' on Windows) correctly?

alisonatwork commented 3 years ago

Your syntax error in interactive mode might be similar to the errors I experienced in trying to get various shells to run under Windows. Specifically, when you hit Enter inside the Windows console, it sends CRLF, but the interpreter expects only LF as the end of line, so it interprets the CR as part of the statement. You can test this by pressing Ctrl-J instead of Enter. If the syntax error goes away but your cursor ends up in a funny position, then that's the problem.

Path conversion is done inside mkntpath.c, which should be called through the standard libc functions like stat and open, but you might find you are having a different problem with PYTHONPATH, which might be parsed using colon as the separator when compiled with Cosmopolitan. For libc functions that use PATH (e.g. execlp), this is handled in commandv.c, but that won't help for PYTHONPATH, or if Python includes its own path search logic. I'm not sure whether Python sets the directory and path separators dynamically at runtime or whether they are compiled in, but ideally it would determine them at runtime the same way Cosmopolitan does; then everything should just work.
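
The "decide at runtime" idea can be sketched as follows. This is a hedged Python illustration with hypothetical function names: is_windows is passed in explicitly because in a single APE binary a compile-time constant (such as DELIM in Include/osdefs.h, or sys.platform, which is fixed when the interpreter is built) cannot tell which OS the binary is running on. In C the value would come from a runtime check like Cosmopolitan's IsWindows().

```python
def path_list_separator(is_windows):
    """';' on Windows, ':' elsewhere, chosen when the program runs.

    Windows needs ';' because ':' already appears in drive letters (C:\\Lib).
    """
    return ";" if is_windows else ":"


def split_path_list(value, is_windows):
    """Split a PATH/PYTHONPATH-style value using the runtime separator."""
    if not value:
        return []
    return value.split(path_list_separator(is_windows))
```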

I haven't had much time to look at this project over the past few weeks, but if I do get back to it and find any Windows-specific quirks, I'll probably post over on #117 or open a PR. I expect any solutions will be similar for shells, Python interpreter etc.

ahgamut commented 3 years ago

The problem is with CRLF: I just tested statements terminating with Ctrl-J on Windows, and those are accepted (I still have to press Enter to run the statement, but at least the statement runs before showing invalid syntax).

PYTHONPATH isn't an issue because I unset it before running python.com.
Directory and path separators are set at compile time (DELIM and SEP in Include/osdefs.h).

Python needs to set sys.path (locations of import-able things), sys.prefix (location of platform-independent .py/.pyc files), and sys.exec_prefix (for shared libraries) to be able to import modules correctly. There are two separate sets of functions for the path search logic:

Both proceed similarly: check argv[0] to set the local directory, check environment variables (PATH, PYTHONPATH, PYTHONHOME, and some Windows registry stuff), try to find common locations for libraries from the directory of the executable, or finally fall back to the locations provided at compile time.

Right now, I've just changed the compile-time absolute path locations to relative paths, and it seems to work OK. Maybe after some reading I can customize Modules/getpath.c to have an IsWindows() check and change everything accordingly.
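
A heavily simplified Python model of the search order described above (environment variable, then a landmark file near the executable, then the compile-time default) might look like this. find_prefix is a hypothetical name; the os.py landmark mirrors CPython's getpath.c, but the real code also resolves symlinks, searches PATH when argv[0] has no slash, and computes exec_prefix separately.

```python
import os


def find_prefix(argv0, environ, compiled_prefix, landmark="os.py",
                lib_subdir=os.path.join("lib", "python2.7")):
    """Return (prefix, how_found) using a simplified getpath-style search.

    1. PYTHONHOME, if set, wins outright.
    2. Otherwise walk up from the executable's directory looking for
       <dir>/<lib_subdir>/<landmark>.
    3. Fall back to the prefix baked in at compile time.
    """
    home = environ.get("PYTHONHOME")
    if home:
        return home, "PYTHONHOME"
    directory = os.path.dirname(os.path.abspath(argv0))
    while True:
        if os.path.isfile(os.path.join(directory, lib_subdir, landmark)):
            return directory, "landmark"
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        directory = parent
    return compiled_prefix, "compile-time default"
```

Replacing the absolute compile-time prefixes with relative ones, as described above, effectively changes what step 3 returns.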

alisonatwork commented 3 years ago

It's really annoying that POSIX doesn't define a function to search PATH, it seems every shell just reimplements it for itself, and then libc has its own way again for execlp etc. I think something that might make our lives easier is publishing a Cosmopolitan-blessed path searcher function, so given an environment variable name (or buf with the value already in it), search each path for a file underneath it in a platform-agnostic way. Something like the SearchPath function in commandv.c, but which works for other variables, and where you can toggle search for executables or just search for any file. That might be able to replace some of the functions that Python, ash etc are trying to use to find the right file to load (or autocomplete, or whatever).
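
A path searcher along those lines could look like the Python sketch below. search_path_var is a hypothetical name and is not the SearchPath function from commandv.c; it only shows the proposed shape: pick the variable by name, walk its entries, and optionally require the execute bit.

```python
import os


def search_path_var(filename, var="PATH", environ=None, executable=False,
                    sep=os.pathsep):
    """Return the first <dir>/<filename> found along a PATH-style variable.

    var        -- environment variable to read (PATH, PYTHONPATH, ...)
    executable -- if True, only accept files with execute permission
    sep        -- list separator; os.pathsep is ';' on Windows, ':' elsewhere
    """
    environ = os.environ if environ is None else environ
    for directory in environ.get(var, "").split(sep):
        if not directory:
            continue  # skip empty entries in this sketch
        candidate = os.path.join(directory, filename)
        if not os.path.isfile(candidate):
            continue
        if executable and not os.access(candidate, os.X_OK):
            continue
        return candidate
    return None
```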

The CRLF thing is a trickier problem. Something you can try is using mintty (easiest way is from Git Bash) as your "terminal" instead of Windows console. I'm not sure, but that might avoid the generation of a CRLF when hitting enter, which would at least be a temporary workaround. Solving the problem inside console is more challenging. Personally I don't see it as very useful that carriage return is parsed as a non-whitespace token, even on UNIX. It seems to me the cleanest solution would be for Python to ignore CR, or treat it the same as a trailing space. That's the approach I went with trying to get ash to work, but I'm not sure if it will have unintended consequences for files with binary data in them.

alisonatwork commented 3 years ago

Maybe we could "polyfill" SearchPathA for UNIX (https://docs.microsoft.com/en-us/windows/win32/api/processenv/nf-processenv-searchpatha). It could perhaps be used to handle finding things in the APE's ZIP filesystem too.

ahgamut commented 3 years ago

A quick list of the internal modules that can't be compiled yet (full list in the repo README):

_multiprocessing requires _save, a variable that is declared by the Py_BEGIN_ALLOW_THREADS macro only when Python is built with thread support.
Of the remaining modules, maybe dbm/gdbm would be useful to have.

ahgamut commented 3 years ago

The Python tokenizer now ignores CR when reading input. No more syntax errors when the APE runs in interactive mode on Windows!
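
The idea behind the fix can be modelled in a few lines of Python. This is only an illustration of "treat CR as insignificant", not the actual C change in Parser/tokenizer.c; the helper names are hypothetical.

```python
def strip_carriage_returns(line):
    """Drop every CR so 'x = 1\\r\\n' (Windows console) reads as 'x = 1\\n'.

    Note that this sketch also removes CRs inside string literals.
    """
    return line.replace("\r", "")


def normalized_lines(stream):
    """Yield each line of an iterable of text lines with CRs removed."""
    for line in stream:
        yield strip_carriage_returns(line)
```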

niutech commented 3 years ago

I have built the APE binary of Python 2.7; you can download it here.

jart commented 3 years ago

We could generalize the commandv API, but I suspect one of the reasons why POSIX doesn't do that already is that PATH searching is such an expensive operation (in terms of system calls and disk seeks), and shells usually implement it on their own because they're able to perform local optimizations (like memoization) that the C library isn't able to do. For example, when using the bash shell, I occasionally need to run hash -r to let it know to recompute PATH after it's been changed.

ahgamut commented 3 years ago

Maybe just skip looking through environment variables when starting up python.com? The interpreter looks through PATH (and also PYTHONPATH, PYTHONHOME) only because it needs to find the necessary standard library modules and .so/.dll shared libs. But that can be changed entirely or skipped: in this commit I've commented out the search for PYTHONPATH and PYTHONHOME, and I'm already using relative paths for the directories.

niutech commented 3 years ago

As a workaround for Cosmopolitan Python, you can build the APE version of the latest Wasm3 and run Rust Python WebAssembly interpreter in it:

$ ./wasm3.com --stack-size 1000000 rustpython.wasm 
Welcome to the magnificent Rust Python 0.1.1 interpreter 😱 🖖
>>>>> 

ahgamut commented 3 years ago

I think getting the _socket module to build is the only major thing left. It would enable testing the stdlib, and I could get started on a PR for third_party/python2.

I had a look at the netdb.h implementation in the musl source code:

@jart do you (plan to) have an implementation of the above functions in cosmopolitan? I tried to add just getnameinfo to third_party/musl, but then its internal dependencies added a bunch of other files, so I thought I'd check.

jart commented 3 years ago

Contributions are welcome on getnameinfo. I'd write it from scratch rather than using the Musl code. We already have getaddrinfo, so implementing getnameinfo would be almost the same thing, except you send the DNS request for the reversed address (ddd.ccc.bbb.aaa.in-addr.arpa. for aaa.bbb.ccc.ddd) and parse the returned PTR record.
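
A hedged sketch of the query side, in Python rather than Cosmopolitan's C, with hypothetical helper names: build the reversed-octet name and a standard DNS question for a PTR record (type 12, class IN = 1). Sending the packet and parsing the PTR answer back into a hostname are left out.

```python
import ipaddress
import struct

DNS_TYPE_PTR = 12
DNS_CLASS_IN = 1


def reverse_name(ipv4):
    """'192.0.2.1' -> '1.2.0.192.in-addr.arpa.' (octets reversed)."""
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa."


def encode_qname(name):
    """Encode a dotted name as DNS length-prefixed labels."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        data = label.encode("ascii")
        if not 0 < len(data) < 64:
            raise ValueError("bad label: %r" % label)
        out.append(len(data))
        out += data
    out.append(0)  # the empty root label terminates the name
    return bytes(out)


def build_ptr_query(ipv4, query_id=0x1234):
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_qname(reverse_name(ipv4))
    question += struct.pack("!HH", DNS_TYPE_PTR, DNS_CLASS_IN)
    return header + question
```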

niutech commented 3 years ago

You can find the Python 2.7 APE binary in awesome-cosmo.

ahgamut commented 3 years ago

Until now, python.com required the standard library to be in a nearby folder.

I added the standard library to the internal ZIP store, and added the location of the APE as the first entry in sys.path. The python.com APE is now self-contained! (tested on Debian Linux and Windows 10)
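
Listing an archive path on sys.path is ordinary CPython behaviour: zipimport, installed as a path hook, serves modules from any sys.path entry that is a ZIP file. Presumably that mechanism (or Cosmopolitan's zip filesystem) is what lets python.com find its bundled library once its own path is first on sys.path. The sketch below uses an ordinary temporary .zip as a stand-in for the APE; import_from_archive and make_demo_archive are hypothetical helpers.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def make_demo_archive(module_name, source):
    """Write a throwaway .zip containing a single <module_name>.py."""
    fd, path = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr(module_name + ".py", source)
    return path


def import_from_archive(archive_path, module_name):
    """Put a ZIP archive first on sys.path and import a module from it."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()  # harmless; clears stale finder caches
    return importlib.import_module(module_name)
```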


ahgamut commented 3 years ago

Now that all the functions related to the _socket module have been implemented (#172, #196, #200, #204, #207, and #209 -- thanks @jart for guidance!), I can:

(Edit: not there yet on Windows, because _socket has some complaints.)

https://github.com/ahgamut/cpython/tree/cosmo_py36 also works. Python 3.6.14 has another 5 months before EOL though.

ahgamut commented 3 years ago

@jart I was trying to run python.com -m SimpleHTTPServer and it was failing with Errno 10042 (ENOPROTOOPT) on Windows.

SO_REUSEADDR is defined as 0 for Windows in libc/sysv/consts.sh; should it be 1 instead?

The Win32 API docs say that:

SO_REUSEADDR: BOOL Allows the socket to be bound to an address that is already in use. For more information, see bind. Not applicable on ATM sockets.

I changed the setsockopt call to setsockopt(fd, SOL_SOCKET, IsWindows() ? 1 : SO_REUSEADDR, buf, sizeof(buf)) and I am able to run SimpleHTTPServer without error.
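
For comparison, at the Python level the option is normally set through the socket module's symbolic constant, which already carries the host's native option number. Worth keeping in mind when revisiting the consts.sh entry: in the Winsock headers SO_REUSEADDR is 0x0004 and option number 1 is SO_DEBUG, so hard-coding 1 on Windows may be succeeding by setting a different option. make_reusable_listener below is a hypothetical helper, shown as a sketch.

```python
import socket


def make_reusable_listener(host="127.0.0.1", port=0):
    """Create a TCP listening socket with SO_REUSEADDR set before bind().

    socket.SO_REUSEADDR holds the platform's native option number
    (2 on Linux, 4 on BSD/macOS and Windows), so no per-OS branching is needed.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock
```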

ahgamut commented 3 years ago

With the above SO_REUSEADDR fix, it is possible to serve static pages locally on Windows, using python.com -m SimpleHTTPServer.

It is also possible to serve dynamic pages: just download Flask and its pure-python dependencies as wheels, and unzip the wheels into the APE at python.com/Lib/site-packages. Here's a GIF of a simple Flask webapp that runs with such a python.com:

[GIF: a simple Flask webapp running on python.com]

Tested on Windows 10 and Debian Linux. I wrote a summary of the changes made for the Python APE here.
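
Since wheels are plain ZIP archives, "unzipping a wheel into Lib/site-packages" is just re-rooting its member names under that prefix. The sketch below (hypothetical helper copy_wheel_into) writes into a separate, already-open ZipFile; whether appending in place to python.com with zipfile's "a" mode works on an APE is not tested here, and the thread itself used the unzip/zip route.

```python
import zipfile


def copy_wheel_into(dest, wheel_file, prefix="Lib/site-packages/"):
    """Copy every file in a wheel into an open, writable ZipFile under prefix.

    For pure-Python wheels the top level is already laid out the way
    site-packages expects (package dirs plus *.dist-info).
    Returns the list of names written.
    """
    written = []
    with zipfile.ZipFile(wheel_file) as wheel:
        for info in wheel.infolist():
            if info.is_dir():
                continue
            name = prefix + info.filename
            dest.writestr(name, wheel.read(info), compress_type=zipfile.ZIP_DEFLATED)
            written.append(name)
    return written
```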

ahgamut commented 3 years ago

@jart @pkulchenko I would like to know if it is possible to use MbedTLS for the SSL support required in Python. Does MbedTLS have Python bindings? I don't think MbedTLS is a drop-in replacement for OpenSSL (like BoringSSL); is there a list of equivalent functions somewhere?

python.com -m pip download/install <package-name> requires SSL support: the _ssl and _hashlib modules in the stdlib need to be compiled into the APE. Without SSL support, one needs to download all the necessary wheels locally before installing them with python.com -m pip install pkg.whl -t some_dir.

The _ssl and _hashlib modules are implemented with OpenSSL for both Python 2.7 and Python 3.6.
It is possible to compile OpenSSL 1.1.1k with Cosmopolitan, by providing the right flags and a few minor changes to the source code.

Compiling everything with -Os and using the MODE=tiny cosmopolitan.a, we get:

component                               size
APE + most C stdlib extensions          2.6 MB
unicodedata + CJK/multibytecodecs       1.6 MB
python stdlib as .pyc files [1]         2.6 MB
pip + setuptools as .pyc files          2.0 MB
total without SSL                       8.8 MB
_ssl + _hashlib via OpenSSL             2.2 MB
total with OpenSSL                      11 MB

[1] This excludes failing libraries like asyncio, tkinter and turtle.py, as well as .exe files and some platform-specific stuff. I imagine it's possible to reduce the size further if it's really an issue.

pkulchenko commented 3 years ago

> @jart @pkulchenko I would like to know if it is possible to use MbedTLS for the SSL support required in Python. Does MbedTLS have Python bindings?

@ahgamut, I found the same library you already referenced, but I can't comment on the rest, as I haven't used it.

ahgamut commented 3 years ago

@jart here's a quick summary of stuff to be examined further:

jart commented 3 years ago

I'm still excited about porting Python and now have time available to help.

> python27.com or python36.com? 3.6 has more features and is EOL'd only at the end of this year

Python3000 can no longer be safely ignored. I'd recommend we just do that unless there are big blockers. Or both. It'll make people unhappy if we publish only Python2. Speaking of which, I've decided that I do want to start distributing "Actually Portable Python". As I've mentioned before, language authors should ideally incorporate Cosmo into their official release processes. Until that happens, we can demonstrate the demand exists by distributing it ourselves.

> MbedTLS

If you got OpenSSL to build then I'd say stick with that. I chose MbedTLS for redbean because I wanted something tinier and I wouldn't agree to the OpenSSL license. However Python is already huge and it appears OpenSSL finally fixed its license. So it looks good to me. I'd even support checking-in both OpenSSL and Python3 to third party.

kissgyorgy commented 3 years ago

FYI: Python 3.5 and older are EOL now: https://devguide.python.org/#status-of-python-branches
I suggest targeting Python 3.6+, as you probably won't be able to get help or any support from Python core developers for older versions.

ahgamut commented 3 years ago

@kissgyorgy

https://github.com/ahgamut/cpython/tree/cosmo_py27 -- Python 2.7.18 is EOL of course
https://github.com/ahgamut/cpython/tree/cosmo_py36 -- Python 3.6.14 is supported till the end of this calendar year

Python 3.7 and above cannot be built without threads.

Both the above repos are roughly equal in terms of modifications/functionality. Python 2.7 is easier to debug because Python 3 uses Unicode strings by default, which raises complaints about locale/encoding stuff. I'm running the tests for Python 3.6 right now.

@jart I thought it would be better to add Python to the repo after passing as many tests as possible, like it was done in #61.

The Python 3.6.14 source contains a couple of its external dependencies: mpdecimal and libffi (for _ctypes). Split them into separate packages (third_party/libmpdecimal) or keep them within third_party/python36?

ahgamut commented 3 years ago

Okay the latest commit to cosmo_py36 adds a shell script to run a selected list of tests from the Python 3.6.14 test suite.

The test results differ if python.com is used instead of python.com.dbg. Maybe the APE ZIP store imports are interfering with the testing?

Raised the number of passing tests by commenting out tests related to handling various Unicode encodings (the Cosmopolitan build currently supports only UTF-8).

jart commented 3 years ago

75% of applicable tests passing sounds pretty good considering the maturity of the language. I'm not too concerned about Python 3.6 vs. 3.9.

jart commented 3 years ago

> some form of tree-shaking for the Python stdlib so that APEs can be smaller

That'd be nice, considering that unicodedata alone is 17% of the binary. We can always create a really good build config as a stopgap. If we can make the assumption that modules have a single root and don't do sneaky imports, then it should be relatively straightforward to modify tool/build/zip.c to encode Python module deps into the ELF linkage and then maybe add a few STATIC_IMPORT statements here and there.
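
As a rough illustration of the "single root, no sneaky imports" assumption, the sketch below computes a module dependency closure from import statements using Python's ast module. It is not the tool/build/zip.c or STATIC_IMPORT approach described above, just a Python-level model with hypothetical function names. Anything imported dynamically (importlib.import_module, __import__, names built from strings) is invisible to it, which is exactly why that assumption matters; relative imports are skipped for brevity.

```python
import ast


def _with_parents(dotted):
    """'a.b.c' -> {'a', 'a.b', 'a.b.c'}: importing a submodule loads its parents."""
    parts = dotted.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts) + 1)}


def static_imports(source):
    """Module names reachable through plain import statements in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found |= _with_parents(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found |= _with_parents(node.module)
            # "from pkg import name" may refer to a submodule; keep it as a candidate
            for alias in node.names:
                if alias.name != "*":
                    found.add(node.module + "." + alias.name)
    return found


def dependency_closure(roots, sources):
    """Modules reachable from roots, restricted to keys of `sources`
    (a dict mapping module name -> source text)."""
    needed, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name in needed or name not in sources:
            continue
        needed.add(name)
        stack.extend(static_imports(sources[name]) - needed)
    return needed
```

Modules outside the closure (unicodedata, in the usage test below) are candidates to leave out of the ZIP store.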

ahgamut commented 3 years ago

I am not sure how many test cases are failing because of an indirect dependency on threads. If only those particular tests were skipped, I think the APE would have a higher pass rate on the test suite.

Also, using the test module with the Python 3.6 APE does different things than just running the tests outright.

python.com -m test -W test_string_literals fails with the latin9 encoding, but python.com Lib/test/test_string_literals.py passes.

ahgamut commented 3 years ago

@jart here's the current performance of python.com on the Python test suite.

Use the cosmo_py36 repo for building python.com. (The APE built in the cosmopolitan repo has some runtime issues.)

Run at the root of the directory: ./python.com -E -Wd -m test -x test_json test_subprocess (two tests are excluded because they crash and abort the run).

Including the two above, there are 407 tests. Current pass rate: 290/345 = 84%. The failing tests:

    test_aifc test_audioop test_cmath test_cmd_line
    test_cmd_line_script test_code test_codeccallbacks
    test_codecencodings_cn test_codecencodings_hk
    test_codecencodings_iso2022 test_codecencodings_jp
    test_codecencodings_kr test_codecencodings_tw test_codecs
    test_compileall test_datetime test_descr test_distutils test_email
    test_epoll test_fileinput test_gdb test_getpass test_httplib
    test_imp test_import test_importlib test_inspect test_io
    test_lib2to3 test_locale test_logging test_mailbox test_math
    test_mmap test_multibytecodec test_nntplib test_openpty test_os
    test_pathlib test_pdb test_pickle test_pickletools test_plistlib
    test_posix test_pyclbr test_pydoc test_regrtest test_repl
    test_reprlib test_runpy test_sax test_selectors test_signal
    test_smtpd test_smtplib test_sndhdr test_socket
    test_source_encoding test_string_literals test_strptime test_sunau
    test_support test_sys test_sys_settrace test_time test_trace
    test_ucn test_unicode test_urllib test_urllib2 test_venv test_wave
    test_xml_etree test_xml_etree_c test_zipfile test_json test_subprocess

Of these 78 failures:

I'll post the exact error logs if necessary.

ahgamut commented 3 years ago

> (The APE built in the cosmopolitan repo has some runtime issues).

By this I mean that running python.com -m test from inside the cosmopolitan repo fails because os.WNOHANG is missing: this is likely because some headers are not included (libc/sysv/consts/w.h). This may be occurring in other files apart from Modules/posixmodule.c, I'll submit a PR with a list of changes.
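
A missing os.WNOHANG is likely fatal rather than cosmetic: on POSIX, CPython 3's subprocess.Popen._internal_poll takes os.WNOHANG as a default argument, so the AttributeError most likely surfaces as soon as subprocess is imported, and the test runner depends on subprocess. The POSIX-only sketch below (hypothetical helper names) shows what the constant is for: polling a child without blocking.

```python
import os
import time


def reap_nonblocking(pid, timeout=5.0, interval=0.01):
    """Poll a child with waitpid(pid, WNOHANG) until it exits or timeout expires.

    Returns the exit code (negative signal number if the child was killed),
    or None if the child is still running when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        done_pid, status = os.waitpid(pid, os.WNOHANG)  # (0, 0) while still running
        if done_pid == pid:
            if os.WIFSIGNALED(status):
                return -os.WTERMSIG(status)
            return os.WEXITSTATUS(status)
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


def fork_and_reap(exit_code):
    """Demo: fork a child that exits immediately, then reap it without blocking."""
    pid = os.fork()
    if pid == 0:  # child process
        os._exit(exit_code)
    return reap_nonblocking(pid)
```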

Keithcat1 commented 3 years ago

When I run python.com on Windows, it breaks my ability to review already typed commands using the up and down arrows (like #199, but only while python.com is running). I think the _ssl builtin module is missing. I guess Redbean's SSL support could be used for this?

ahgamut commented 3 years ago

@Keithcat1 I think the "up-arrow-to-view-previous-line" feature requires readline+(terminfo/curses) to be compiled along with python.com; otherwise you get stuff like ^[[A when you press Up in the REPL.

I've tried adding readline+curses to Python2.7 (see this commit), but curses fails at the linker stage for some reason. readline+terminfo gets compiled but the required functionality isn't there yet.

jart commented 3 years ago

@Keithcat1 The Python shell should now work perfectly on Windows. See 5029e20befb339507bf1d2e57c2efe5d5175c57f

Contributions welcome on getting Python SSL to work. In the meantime you might consider using redbean as your SSL frontend. You can use the Fetch() Lua API to reverse proxy requests to your APE Python binary.

ahgamut commented 3 years ago

@jart as mentioned in #235, using .pyc files in the ZIP store halves the APE startup time.
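
Producing those .pyc files is a standard-library job. The sketch below (hypothetical helper names) byte-compiles a source tree with compileall and shows where the import system will look for each cached file; those __pycache__ lookups are the ones that return enoent in the fastpython trace when no .pyc is present.

```python
import compileall
import importlib.util


def precompile_tree(src_dir):
    """Byte-compile every .py under src_dir into its __pycache__ directory.

    Returns True if every file compiled. With up-to-date .pyc files present,
    imports can load cached bytecode instead of compiling each .py at start-up.
    """
    return bool(compileall.compile_dir(src_dir, quiet=1))


def cached_bytecode_path(py_path):
    """Where this interpreter expects the .pyc for py_path,
    e.g. pkg/__pycache__/mod.cpython-36.pyc on CPython 3.6."""
    return importlib.util.cache_from_source(py_path)
```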

When running python.com -Svc "2+2", most of the time is spent loading the Python modules (.py/.pyc). Importing all the compiled C extension modules only takes around 10% of the overall time.

There has to be a way to load the Python modules faster, hopefully by avoiding some of the file-searching/C-Python-indirection involved. I don't think it's possible to do the entire importing within C, due to how _frozen_importlib works.

I'm trying to see what can be done with sys.path_hooks and/or sys.meta_path.
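
One direction with sys.meta_path is a finder placed ahead of the default PathFinder that answers from a prebuilt table, so a module it knows about never triggers the per-directory stat/open probes. The sketch below is a hedged illustration with a hypothetical class name; it serves plain source text and only handles top-level, non-package modules.

```python
import importlib.abc
import importlib.util
import sys


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from an in-memory {name: source} table."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # fall through to the normal finders
        return importlib.util.spec_from_loader(fullname, self, is_package=False)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        name = module.__spec__.name
        code = compile(self.sources[name], "<dict:%s>" % name, "exec")
        exec(code, module.__dict__)
```

For an actual speed-up the table would more likely hold marshalled code objects than source, so that no compilation happens at import time.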

jart commented 3 years ago

Yes, using those .pyc files is going to have a huge impact, since it takes Python about a millisecond to load each .py. I created a fastpython branch in 214e3a68a9358b2ee7dc0cb27a63c06256979317; running make -j8 o//tool/build/deltaify.com o//third_party/python/python.com && o//third_party/python/python.com -sSBc 'print(2+2)' |& o//tool/build/deltaify.com shows the following:

               1
               7 __zipos_opendir("zip!.python/")
               1 __zipos_stat(".python/encodings/__init__.cpython36m-x86_64-cosmo.so") → enoent
               0 __zipos_stat(".python/encodings/__init__.abi3.so") → enoent
               0 __zipos_stat(".python/encodings/__init__.so") → enoent
               0 __zipos_open(".python/encodings/__pycache__/__init__.cpython-36.pyc") enoent
               0 __zipos_open(".python/encodings/__init__.py")
               0 __zipos_fstat(".python/encodings/__init__.py")
               0 __zipos_lseek(".python/encodings/__init__.py", 0)
               1 __zipos_fstat(".python/encodings/__init__.py")
               0 __zipos_read(".python/encodings/__init__.py", cap=5643, off=0) → got=5642
               8 __zipos_read(".python/encodings/__init__.py", cap=1, off=5642) → got=0
            1821 __zipos_close(".python/encodings/__init__.py")
              77 __zipos_open(".python/__pycache__/codecs.cpython-36.pyc") enoent
               2 __zipos_open(".python/codecs.py")
               0 __zipos_fstat(".python/codecs.py")
               0 __zipos_lseek(".python/codecs.py", 0)
               0 __zipos_fstat(".python/codecs.py")
               0 __zipos_read(".python/codecs.py", cap=36277, off=0) → got=36276
               0 __zipos_read(".python/codecs.py", cap=1, off=36276) → got=0
            2615 __zipos_close(".python/codecs.py")
             126 __zipos_opendir("zip!.python/encodings")
               3 __zipos_open(".python/encodings/__pycache__/aliases.cpython-36.pyc") enoent
               1 __zipos_open(".python/encodings/aliases.py")
               3 __zipos_fstat(".python/encodings/aliases.py")
               0 __zipos_lseek(".python/encodings/aliases.py", 0)
               0 __zipos_fstat(".python/encodings/aliases.py")
               0 __zipos_read(".python/encodings/aliases.py", cap=15578, off=0) → got=15577
               8 __zipos_read(".python/encodings/aliases.py", cap=1, off=15577) → got=0
             905 __zipos_close(".python/encodings/aliases.py")
               2 __zipos_open(".python/encodings/__pycache__/utf_8.cpython-36.pyc") enoent
               0 __zipos_open(".python/encodings/utf_8.py")
               0 __zipos_fstat(".python/encodings/utf_8.py")
               0 __zipos_lseek(".python/encodings/utf_8.py", 0)
               0 __zipos_fstat(".python/encodings/utf_8.py")
               0 __zipos_read(".python/encodings/utf_8.py", cap=1006, off=0) → got=1005
               0 __zipos_read(".python/encodings/utf_8.py", cap=1, off=1005) → got=0
             357 __zipos_close(".python/encodings/utf_8.py")
               2 __zipos_open(".python/encodings/__pycache__/latin_1.cpython-36.pyc") enoent
               0 __zipos_open(".python/encodings/latin_1.py")
               0 __zipos_fstat(".python/encodings/latin_1.py")
               0 __zipos_lseek(".python/encodings/latin_1.py", 0)
               0 __zipos_fstat(".python/encodings/latin_1.py")
               0 __zipos_read(".python/encodings/latin_1.py", cap=1265, off=0) → got=1264
               0 __zipos_read(".python/encodings/latin_1.py", cap=1, off=1264) → got=0
             328 __zipos_close(".python/encodings/latin_1.py")
               2 __zipos_open(".python/__pycache__/io.cpython-36.pyc") enoent
               0 __zipos_open(".python/io.py")
               0 __zipos_fstat(".python/io.py")
               0 __zipos_lseek(".python/io.py", 0)
               0 __zipos_fstat(".python/io.py")
               0 __zipos_read(".python/io.py", cap=3518, off=0) → got=3517
               1 __zipos_read(".python/io.py", cap=1, off=3517) → got=0
             288 __zipos_close(".python/io.py")
               2 __zipos_open(".python/__pycache__/abc.cpython-36.pyc") enoent
               1 __zipos_open(".python/abc.py")
               0 __zipos_fstat(".python/abc.py")
               0 __zipos_lseek(".python/abc.py", 0)
               0 __zipos_fstat(".python/abc.py")
               0 __zipos_read(".python/abc.py", cap=8728, off=0) → got=8727
               0 __zipos_read(".python/abc.py", cap=1, off=8727) → got=0
             635 __zipos_close(".python/abc.py")
               2 __zipos_open(".python/__pycache__/_weakrefset.cpython-36.pyc") enoent
               0 __zipos_open(".python/_weakrefset.py")
               0 __zipos_fstat(".python/_weakrefset.py")
               0 __zipos_lseek(".python/_weakrefset.py", 0)
               0 __zipos_fstat(".python/_weakrefset.py")
               0 __zipos_read(".python/_weakrefset.py", cap=5706, off=0) → got=5705
               0 __zipos_read(".python/_weakrefset.py", cap=1, off=5705) → got=0
            1127 __zipos_close(".python/_weakrefset.py")
               2 __zipos_open(".python/__pycache__/_bootlocale.cpython-36.pyc") enoent
               0 __zipos_open(".python/_bootlocale.py")
               0 __zipos_fstat(".python/_bootlocale.py")
               0 __zipos_lseek(".python/_bootlocale.py", 0)
               0 __zipos_fstat(".python/_bootlocale.py")
               0 __zipos_read(".python/_bootlocale.py", cap=1313, off=0) → got=1312
               0 __zipos_read(".python/_bootlocale.py", cap=1, off=1312) → got=0
             241 __zipos_close(".python/_bootlocale.py")
             119 __zipos_open(".python/__pycache__/locale.cpython-36.pyc") enoent
               2 __zipos_open(".python/locale.py")
               0 __zipos_fstat(".python/locale.py")
               0 __zipos_lseek(".python/locale.py", 0)
               0 __zipos_fstat(".python/locale.py")
               0 __zipos_read(".python/locale.py", cap=77301, off=0) → got=77300
               0 __zipos_read(".python/locale.py", cap=1, off=77300) → got=0
            4265 __zipos_close(".python/locale.py")
              53 __zipos_open(".python/__pycache__/re.cpython-36.pyc") enoent
               2 __zipos_open(".python/re.py")
               0 __zipos_fstat(".python/re.py")
               0 __zipos_lseek(".python/re.py", 0)
               0 __zipos_fstat(".python/re.py")
               0 __zipos_read(".python/re.py", cap=15553, off=0) → got=15552
               0 __zipos_read(".python/re.py", cap=1, off=15552) → got=0
             920 __zipos_close(".python/re.py")
              70 __zipos_open(".python/__pycache__/enum.cpython-36.pyc") enoent
               2 __zipos_open(".python/enum.py")
               0 __zipos_fstat(".python/enum.py")
               0 __zipos_lseek(".python/enum.py", 0)
               0 __zipos_fstat(".python/enum.py")
               0 __zipos_read(".python/enum.py", cap=33607, off=0) → got=33606
               0 __zipos_read(".python/enum.py", cap=1, off=33606) → got=0
            2856 __zipos_close(".python/enum.py")
               5 __zipos_open(".python/__pycache__/types.cpython-36.pyc") enoent
               1 __zipos_open(".python/types.py")
               0 __zipos_fstat(".python/types.py")
               0 __zipos_lseek(".python/types.py", 0)
               0 __zipos_fstat(".python/types.py")
               0 __zipos_read(".python/types.py", cap=8871, off=0) → got=8870
               8 __zipos_read(".python/types.py", cap=1, off=8870) → got=0
             975 __zipos_close(".python/types.py")
             124 __zipos_open(".python/__pycache__/functools.cpython-36.pyc") enoent
               2 __zipos_open(".python/functools.py")
               0 __zipos_fstat(".python/functools.py")
               0 __zipos_lseek(".python/functools.py", 0)
               0 __zipos_fstat(".python/functools.py")
               0 __zipos_read(".python/functools.py", cap=31347, off=0) → got=31346
               0 __zipos_read(".python/functools.py", cap=1, off=31346) → got=0
            2936 __zipos_close(".python/functools.py")
               2 __zipos_stat(".python/collections/__init__.cpython36m-x86_64-cosmo.so") → enoent
               0 __zipos_stat(".python/collections/__init__.abi3.so") → enoent
              60 __zipos_stat(".python/collections/__init__.so") → enoent
              77 __zipos_open(".python/collections/__pycache__/__init__.cpython-36.pyc") enoent
               1 __zipos_open(".python/collections/__init__.py")
               0 __zipos_fstat(".python/collections/__init__.py")
               2 __zipos_lseek(".python/collections/__init__.py", 0)
               0 __zipos_fstat(".python/collections/__init__.py")
               0 __zipos_read(".python/collections/__init__.py", cap=45813, off=0) → got=45812
               0 __zipos_read(".python/collections/__init__.py", cap=1, off=45812) → got=0
            4174 __zipos_close(".python/collections/__init__.py")
              50 __zipos_open(".python/__pycache__/_collections_abc.cpython-36.pyc") enoent
               2 __zipos_open(".python/_collections_abc.py")
               0 __zipos_fstat(".python/_collections_abc.py")
               0 __zipos_lseek(".python/_collections_abc.py", 0)
               0 __zipos_fstat(".python/_collections_abc.py")
               0 __zipos_read(".python/_collections_abc.py", cap=26393, off=0) → got=26392
               0 __zipos_read(".python/_collections_abc.py", cap=1, off=26392) → got=0
            3573 __zipos_close(".python/_collections_abc.py")
               7 __zipos_open(".python/__pycache__/operator.cpython-36.pyc") enoent
               0 __zipos_open(".python/operator.py")
               0 __zipos_fstat(".python/operator.py")
               0 __zipos_lseek(".python/operator.py", 0)
               0 __zipos_fstat(".python/operator.py")
               0 __zipos_read(".python/operator.py", cap=10864, off=0) → got=10863
               8 __zipos_read(".python/operator.py", cap=1, off=10863) → got=0
            1511 __zipos_close(".python/operator.py")
               2 __zipos_open(".python/__pycache__/keyword.cpython-36.pyc") enoent
               0 __zipos_open(".python/keyword.py")
               0 __zipos_fstat(".python/keyword.py")
               0 __zipos_lseek(".python/keyword.py", 0)
               0 __zipos_fstat(".python/keyword.py")
               0 __zipos_read(".python/keyword.py", cap=2212, off=0) → got=2211
               0 __zipos_read(".python/keyword.py", cap=1, off=2211) → got=0
             340 __zipos_close(".python/keyword.py")
              58 __zipos_open(".python/__pycache__/heapq.cpython-36.pyc") enoent
               2 __zipos_open(".python/heapq.py")
               0 __zipos_fstat(".python/heapq.py")
               0 __zipos_lseek(".python/heapq.py", 0)
               0 __zipos_fstat(".python/heapq.py")
               0 __zipos_read(".python/heapq.py", cap=22930, off=0) → got=22929
               0 __zipos_read(".python/heapq.py", cap=1, off=22929) → got=0
            1499 __zipos_close(".python/heapq.py")
               2 __zipos_open(".python/__pycache__/reprlib.cpython-36.pyc") enoent
               0 __zipos_open(".python/reprlib.py")
               0 __zipos_fstat(".python/reprlib.py")
               0 __zipos_lseek(".python/reprlib.py", 0)
               1 __zipos_fstat(".python/reprlib.py")
               0 __zipos_read(".python/reprlib.py", cap=5337, off=0) → got=5336
               0 __zipos_read(".python/reprlib.py", cap=1, off=5336) → got=0
             836 __zipos_close(".python/reprlib.py")
               2 __zipos_open(".python/__pycache__/_dummy_thread.cpython-36.pyc") enoent
               0 __zipos_open(".python/_dummy_thread.py")
               0 __zipos_fstat(".python/_dummy_thread.py")
               0 __zipos_lseek(".python/_dummy_thread.py", 0)
               0 __zipos_fstat(".python/_dummy_thread.py")
               0 __zipos_read(".python/_dummy_thread.py", cap=5168, off=0) → got=5167
               0 __zipos_read(".python/_dummy_thread.py", cap=1, off=5167) → got=0
             899 __zipos_close(".python/_dummy_thread.py")
              48 __zipos_open(".python/__pycache__/weakref.cpython-36.pyc") enoent
               2 __zipos_open(".python/weakref.py")
               0 __zipos_fstat(".python/weakref.py")
               0 __zipos_lseek(".python/weakref.py", 0)
               0 __zipos_fstat(".python/weakref.py")
               0 __zipos_read(".python/weakref.py", cap=20467, off=0) → got=20466
               0 __zipos_read(".python/weakref.py", cap=1, off=20466) → got=0
            2505 __zipos_close(".python/weakref.py")
              64 __zipos_opendir("zip!.python/collections")
               2 __zipos_open(".python/collections/__pycache__/abc.cpython-36.pyc") enoent
               0 __zipos_open(".python/collections/abc.py")
               0 __zipos_fstat(".python/collections/abc.py")
               0 __zipos_lseek(".python/collections/abc.py", 0)
               0 __zipos_fstat(".python/collections/abc.py")
               0 __zipos_read(".python/collections/abc.py", cap=69, off=0) → got=68
               0 __zipos_read(".python/collections/abc.py", cap=1, off=68) → got=0
             526 __zipos_close(".python/collections/abc.py")
              50 __zipos_open(".python/__pycache__/sre_compile.cpython-36.pyc") enoent
               2 __zipos_open(".python/sre_compile.py")
               0 __zipos_fstat(".python/sre_compile.py")
               0 __zipos_lseek(".python/sre_compile.py", 0)
               0 __zipos_fstat(".python/sre_compile.py")
               0 __zipos_read(".python/sre_compile.py", cap=19339, off=0) → got=19338
               0 __zipos_read(".python/sre_compile.py", cap=1, off=19338) → got=0
            2168 __zipos_close(".python/sre_compile.py")
              68 __zipos_open(".python/__pycache__/sre_parse.cpython-36.pyc") enoent
               2 __zipos_open(".python/sre_parse.py")
               0 __zipos_fstat(".python/sre_parse.py")
               0 __zipos_lseek(".python/sre_parse.py", 0)
               0 __zipos_fstat(".python/sre_parse.py")
               0 __zipos_read(".python/sre_parse.py", cap=36537, off=0) → got=36536
               0 __zipos_read(".python/sre_parse.py", cap=1, off=36536) → got=0
            3980 __zipos_close(".python/sre_parse.py")
               3 __zipos_open(".python/__pycache__/sre_constants.cpython-36.pyc") enoent
               0 __zipos_open(".python/sre_constants.py")
               0 __zipos_fstat(".python/sre_constants.py")
               0 __zipos_lseek(".python/sre_constants.py", 0)
               0 __zipos_fstat(".python/sre_constants.py")
               0 __zipos_read(".python/sre_constants.py", cap=6822, off=0) → got=6821
               0 __zipos_read(".python/sre_constants.py", cap=1, off=6821) → got=0
            1151 __zipos_close(".python/sre_constants.py")
               2 __zipos_open(".python/__pycache__/copyreg.cpython-36.pyc") enoent
               0 __zipos_open(".python/copyreg.py")
               1 __zipos_fstat(".python/copyreg.py")
               0 __zipos_lseek(".python/copyreg.py", 0)
               0 __zipos_fstat(".python/copyreg.py")
               0 __zipos_read(".python/copyreg.py", cap=7008, off=0) → got=7007
               0 __zipos_read(".python/copyreg.py", cap=1, off=7007) → got=0
             984 __zipos_close(".python/copyreg.py")
              87 __zipos_open(".python/__pycache__/os.cpython-36.pyc") enoent
               2 __zipos_open(".python/os.py")
               0 __zipos_fstat(".python/os.py")
               0 __zipos_lseek(".python/os.py", 0)
               0 __zipos_fstat(".python/os.py")
               0 __zipos_read(".python/os.py", cap=37527, off=0) → got=37526
               0 __zipos_read(".python/os.py", cap=1, off=37526) → got=0
            2931 __zipos_close(".python/os.py")
               2 __zipos_open(".python/__pycache__/stat.cpython-36.pyc") enoent
               0 __zipos_open(".python/stat.py")
               0 __zipos_fstat(".python/stat.py")
               1 __zipos_lseek(".python/stat.py", 0)
               0 __zipos_fstat(".python/stat.py")
               0 __zipos_read(".python/stat.py", cap=5039, off=0) → got=5038
               0 __zipos_read(".python/stat.py", cap=1, off=5038) → got=0
             623 __zipos_close(".python/stat.py")
              49 __zipos_open(".python/__pycache__/posixpath.cpython-36.pyc") enoent
               2 __zipos_open(".python/posixpath.py")
               0 __zipos_fstat(".python/posixpath.py")
               0 __zipos_lseek(".python/posixpath.py", 0)
               0 __zipos_fstat(".python/posixpath.py")
               2 __zipos_read(".python/posixpath.py", cap=15773, off=0) → got=15772
               0 __zipos_read(".python/posixpath.py", cap=1, off=15772) → got=0
            1561 __zipos_close(".python/posixpath.py")
               2 __zipos_open(".python/__pycache__/genericpath.cpython-36.pyc") enoent
               0 __zipos_open(".python/genericpath.py")
               0 __zipos_fstat(".python/genericpath.py")
               0 __zipos_lseek(".python/genericpath.py", 0)
               0 __zipos_fstat(".python/genericpath.py")
               0 __zipos_read(".python/genericpath.py", cap=4757, off=0) → got=4756
               1 __zipos_read(".python/genericpath.py", cap=1, off=4756) → got=0
             716 __zipos_close(".python/genericpath.py")
            1625 4

If you do it with the verbose flag you mentioned:

               1
               8 import _frozen_importlib # frozen
               1 import _imp # builtin
               0 import sys # builtin
               0 import '_warnings' # <class '_frozen_importlib.BuiltinImporter'>
             263 import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
             207 import '_frozen_importlib_external' # <class '_frozen_importlib.FrozenImporter'>
               3 import '_io' # <class '_frozen_importlib.BuiltinImporter'>
             153 import 'marshal' # <class '_frozen_importlib.BuiltinImporter'>
               3 import 'posix' # <class '_frozen_importlib.BuiltinImporter'>
               0 import _weakref # previously loaded ('_weakref')
            1995 import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
            2609 # code object from zip!.python/encodings/__init__.py
              57 # code object from zip!.python/codecs.py
             127 import '_codecs' # <class '_frozen_importlib.BuiltinImporter'>
             903 import 'codecs' # <_frozen_importlib_external.SourceFileLoader object at 0x100080a5d208>
               3 # code object from zip!.python/encodings/aliases.py
              16 import 'encodings.aliases' # <_frozen_importlib_external.SourceFileLoader object at 0x100080a5def0>
             327 import 'encodings' # <_frozen_importlib_external.SourceFileLoader object at 0x1000800e4518>
              63 # code object from zip!.python/encodings/utf_8.py
              75 import 'encodings.utf_8' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b83ac8>
             221 import '_signal' # <class '_frozen_importlib.BuiltinImporter'>
             119 # code object from zip!.python/encodings/latin_1.py
             306 import 'encodings.latin_1' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b5c780>
             637 # code object from zip!.python/io.py
             807 # code object from zip!.python/abc.py
              50 # code object from zip!.python/_weakrefset.py
              73 import '_weakrefset' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b62208>
             158 import 'abc' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b5d5c0>
             203 import 'io' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b5cc50>
              49 # code object from zip!.python/_bootlocale.py
               1 import '_locale' # <class '_frozen_importlib.BuiltinImporter'>
            4446 import '_bootlocale' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b4cc50>
             979 # code object from zip!.python/locale.py
            2937 # code object from zip!.python/re.py
             953 # code object from zip!.python/enum.py
            3080 # code object from zip!.python/types.py
              50 # code object from zip!.python/functools.py
            4251 import '_functools' # <class '_frozen_importlib.BuiltinImporter'>
            2787 # code object from zip!.python/collections/__init__.py
             818 # code object from zip!.python/_collections_abc.py
            1482 import '_collections_abc' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b717b8>
             102 # code object from zip!.python/operator.py
               3 import '_operator' # <class '_frozen_importlib.BuiltinImporter'>
             357 import 'operator' # <_frozen_importlib_external.SourceFileLoader object at 0x100080af4b70>
               3 # code object from zip!.python/keyword.py
            1413 import 'keyword' # <_frozen_importlib_external.SourceFileLoader object at 0x100080c54048>
              85 # code object from zip!.python/heapq.py
               3 import '_heapq' # <class '_frozen_importlib.BuiltinImporter'>
             102 import 'heapq' # <_frozen_importlib_external.SourceFileLoader object at 0x100080c54128>
             808 import 'itertools' # <class '_frozen_importlib.BuiltinImporter'>
             448 # code object from zip!.python/reprlib.py
               3 # code object from zip!.python/_dummy_thread.py
              58 import '_dummy_thread' # <_frozen_importlib_external.SourceFileLoader object at 0x100080c685f8>
              57 import 'reprlib' # <_frozen_importlib_external.SourceFileLoader object at 0x100080ac6be0>
             376 import '_collections' # <class '_frozen_importlib.BuiltinImporter'>
            2091 import 'collections' # <_frozen_importlib_external.SourceFileLoader object at 0x100080c445c0>
             153 # code object from zip!.python/weakref.py
             385 import 'weakref' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b71b38>
             117 import 'functools' # <_frozen_importlib_external.SourceFileLoader object at 0x100080d8d320>
               3 # code object from zip!.python/collections/abc.py
              52 import 'collections.abc' # <_frozen_importlib_external.SourceFileLoader object at 0x100080c441d0>
             379 import 'types' # <_frozen_importlib_external.SourceFileLoader object at 0x100080d7f0b8>
            2176 import 'enum' # <_frozen_importlib_external.SourceFileLoader object at 0x100080d655c0>
              62 # code object from zip!.python/sre_compile.py
            3974 import '_sre' # <class '_frozen_importlib.BuiltinImporter'>
             620 # code object from zip!.python/sre_parse.py
             106 # code object from zip!.python/sre_constants.py
              84 import 'sre_constants' # <_frozen_importlib_external.SourceFileLoader object at 0x100080e2aac8>
              52 import 'sre_parse' # <_frozen_importlib_external.SourceFileLoader object at 0x100080c81518>
             868 import 'sre_compile' # <_frozen_importlib_external.SourceFileLoader object at 0x100080c442b0>
               3 # code object from zip!.python/copyreg.py
               9 import 'copyreg' # <_frozen_importlib_external.SourceFileLoader object at 0x100080affcf8>
             342 import 're' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b4d278>
            2838 import 'locale' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b4cd30>
              61 # code object from zip!.python/os.py
             589 import 'errno' # <class '_frozen_importlib.BuiltinImporter'>
              51 # code object from zip!.python/stat.py
               3 import '_stat' # <class '_frozen_importlib.BuiltinImporter'>
            1644 import 'stat' # <_frozen_importlib_external.SourceFileLoader object at 0x100080c5c9b0>
             493 # code object from zip!.python/posixpath.py
               3 # code object from zip!.python/genericpath.py
               1 import 'genericpath' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b17a58>
             208 import 'posixpath' # <_frozen_importlib_external.SourceFileLoader object at 0x100080b176d8>
              86 import 'os' # <_frozen_importlib_external.SourceFileLoader object at 0x100080d65160>
               3 Python 3.6.14+ (cosmopolitan) 
               0 [GCC 9.2.0] on cosmo
            1317 4
               3 # clear builtins._
               0 # clear sys.path
               0 # clear sys.argv
               0 # clear sys.ps1
               0 # clear sys.ps2
               0 # clear sys.last_type
               0 # clear sys.last_value
               0 # clear sys.last_traceback
               0 # clear sys.path_hooks
               0 # clear sys.path_importer_cache
               0 # clear sys.meta_path
               0 # clear sys.__interactivehook__
               0 # clear sys.flags
               0 # clear sys.float_info
               0 # restore sys.stdin
               0 # restore sys.stdout
               0 # restore sys.stderr
               0 # cleanup[2] removing builtins
               0 # cleanup[2] removing sys
               0 # cleanup[2] removing _frozen_importlib
               0 # cleanup[2] removing _imp
               0 # cleanup[2] removing _warnings
               0 # cleanup[2] removing _weakref
               0 # cleanup[2] removing _frozen_importlib_external
               0 # cleanup[2] removing _io
               0 # cleanup[2] removing marshal
               0 # cleanup[2] removing posix
               2 # cleanup[2] removing encodings
               0 # cleanup[2] removing codecs
               0 # cleanup[2] removing _codecs
               0 # cleanup[2] removing encodings.aliases
               0 # cleanup[2] removing encodings.utf_8
               0 # cleanup[2] removing _signal
               0 # cleanup[2] removing __main__
               0 # destroy __main__
               0 # cleanup[2] removing encodings.latin_1
               0 # cleanup[2] removing io
               0 # destroy io
               0 # cleanup[2] removing abc
               0 # cleanup[2] removing _weakrefset
               0 # destroy _weakrefset
               0 # cleanup[2] removing _bootlocale
               0 # destroy _bootlocale
               0 # cleanup[2] removing _locale
               0 # cleanup[2] removing locale
               0 # destroy locale
               0 # cleanup[2] removing re
               0 # cleanup[2] removing enum
               0 # cleanup[2] removing types
               0 # destroy types
               0 # cleanup[2] removing functools
               0 # cleanup[2] removing _functools
               0 # cleanup[2] removing collections
               0 # cleanup[2] removing _collections_abc
               0 # cleanup[2] removing operator
               0 # destroy operator
               0 # cleanup[2] removing _operator
               0 # cleanup[2] removing keyword
               0 # destroy keyword
               0 # cleanup[2] removing heapq
               0 # cleanup[2] removing _heapq
               0 # cleanup[2] removing itertools
               0 # cleanup[2] removing reprlib
               0 # destroy reprlib
               0 # cleanup[2] removing _dummy_thread
               0 # destroy _dummy_thread
               0 # cleanup[2] removing _collections
               0 # cleanup[2] removing weakref
               0 # destroy weakref
               0 # cleanup[2] removing collections.abc
               0 # cleanup[2] removing sre_compile
               0 # cleanup[2] removing _sre
               0 # cleanup[2] removing sre_parse
               0 # cleanup[2] removing sre_constants
               0 # destroy sre_constants
               0 # cleanup[2] removing copyreg
               0 # cleanup[2] removing os
               0 # cleanup[2] removing errno
               0 # cleanup[2] removing stat
               0 # cleanup[2] removing _stat
               0 # cleanup[2] removing posixpath
               0 # cleanup[2] removing genericpath
               0 # cleanup[2] removing os.path
               0 # destroy _signal
               0 # destroy encodings
               0 # destroy re
               0 # destroy enum
               0 # destroy sre_compile
               0 # destroy _locale
               0 # destroy copyreg
               0 # destroy functools
               0 # destroy _stat
               0 # destroy genericpath
               0 # destroy os
               0 # destroy _functools
               0 # destroy _collections_abc
               0 # destroy heapq
               0 # destroy collections.abc
               0 # destroy _operator
               0 # destroy _heapq
               0 # destroy _collections
               0 # destroy collections
               0 # destroy itertools
               0 # destroy sre_parse
               0 # destroy _sre
               0 # destroy abc
               0 # destroy errno
               0 # destroy stat
               0 # destroy posixpath
               0 # cleanup[3] wiping _frozen_importlib
               0 # destroy _frozen_importlib_external
               0 # cleanup[3] wiping _imp
               0 # cleanup[3] wiping _warnings
               0 # cleanup[3] wiping _weakref
               0 # cleanup[3] wiping _io
               0 # cleanup[3] wiping marshal
               0 # cleanup[3] wiping posix
               0 # cleanup[3] wiping codecs
               0 # cleanup[3] wiping _codecs
               0 # cleanup[3] wiping encodings.aliases
               0 # cleanup[3] wiping encodings.utf_8
               0 # cleanup[3] wiping encodings.latin_1
               1 # cleanup[3] wiping sys
             374 # cleanup[3] wiping builtins
jart commented 3 years ago

due to how _frozen_importlib works

Could you help me understand what that is? I've noticed a few C sources that seem to have Python binary byte code embedded in them. Do you know the commands for regenerating that?

jart commented 3 years ago

So I got the freeze program working in d522a88d. I wish I had a dollar for every time I've seen Python crash with "Unable to get the locale encoding". I built what I hope is a Python compiler in 0c6581f9. It seemed to work fine for loading modules until I used it for early stage module loading, which seems to follow different rules. It looks like the source code for the encodings module needs to be embedded uncompiled?

ahgamut commented 3 years ago

Could you help me understand what that is? I've noticed a few C sources that seem to have Python binary byte code embedded in them. Do you know the commands for regenerating that?

Python handles the loading of modules using the code in Lib/importlib/_bootstrap.py and Lib/importlib/_bootstrap_external.py. But there's a circular problem here: the files that implement importing cannot themselves be imported the way they import other modules.

The solution is _frozen_importlib: the above two .py files are compiled into bytecode and "frozen" into raw bytes in importlib.inc. So _frozen_importlib is the first module that the interpreter loads, and it is then used to load the other modules during startup.

The Python Makefile contains a target called regen-importlib (see here): if you change the _bootstrap.py files, running that target converts them into the raw bytes seen in importlib.inc.
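To make the "freezing" step concrete, here is a minimal Python sketch of the idea. It is not the actual regen-importlib tooling. It compiles a module's source, serializes the code object with marshal, and prints the bytes as a C array. The `freeze_source` helper and the `_Py_M__hello` array name are hypothetical; the `_Py_M__` prefix is only meant to resemble the naming in CPython's generated importlib.h.

```python
import marshal


def freeze_source(name, source, filename):
    """Compile `source` and return (C array text, marshalled bytes).

    Sketch only: a real freeze step also registers the array in the
    interpreter's table of frozen modules.
    """
    code = compile(source, filename, "exec")
    data = marshal.dumps(code)  # the "raw bytes" that end up in the C file
    lines = ["const unsigned char _Py_M__%s[] = {" % name]
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        lines.append("    " + ",".join(str(b) for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines), data


c_text, blob = freeze_source("hello", "GREETING = 'hi'\n", "hello.py")
print(c_text.splitlines()[0])  # const unsigned char _Py_M__hello[] = {

# At startup the interpreter does the reverse: unmarshal, then execute.
namespace = {}
exec(marshal.loads(blob), namespace)
print(namespace["GREETING"])  # hi
```

This is also why frozen modules load quickly: there is no file lookup and no parsing, only `marshal.loads` on bytes already in memory.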

ahgamut commented 3 years ago

@jart Regarding the verbose logs seen above with deltaify.com, some of the time measurements for imports are misleading, because the delay is actually caused before the import happens. Example below.

As a C extension, the _signal module takes very little time to load, as expected:

221 import '_signal' # <class '_frozen_importlib.BuiltinImporter'>

But the _functools module, which is also a C extension, takes a lot more time:

50 # code object from zip!.python/functools.py
4251 import '_functools' # <class '_frozen_importlib.BuiltinImporter'>

The time is not spent in the actual import of _functools; the time is spent in loading, parsing, and executing the code in functools.py which then calls import _functools. If _functools is imported by itself before functools.py is executed, the import takes much less time.

You can confirm this by loading all the C extensions early, before any Python modules are even considered:

https://github.com/jart/cosmopolitan/blob/3d0347e26efc05ad98ba70e9ce3963077458e4c6/third_party/python/Python/pylifecycle.c#L304-L310

If I call PyImport_ImportModule("_functools") (or any other C extension like _sre or _signal) right after _PyImport_Zip_Init() completes, for me the logs show that _functools by itself takes minimal time.

The slowness only comes from handling the Python side of things (so much indirection!).
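A rough way to see the same effect from inside Python is sketched below, assuming that popping a module out of sys.modules is enough to make the import machinery run again. It times the C extension by itself, then the pure-Python wrapper. `time_fresh_import` is a hypothetical helper, and the numbers depend on the machine, so no output is shown.

```python
import importlib
import sys
import time


def time_fresh_import(name):
    """Drop `name` from sys.modules, import it again, return elapsed seconds."""
    sys.modules.pop(name, None)  # otherwise the import is just a dict lookup
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


# The C extension alone, then functools.py, which has to be read, compiled
# (or unmarshalled) and executed, and which itself imports from _functools
t_c = time_fresh_import("_functools")
t_py = time_fresh_import("functools")
print("_functools: %.6fs, functools: %.6fs" % (t_c, t_py))
```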

ahgamut commented 3 years ago

During startup, the interpreter loads extension modules in the following manner (you can see this by setting gdb breakpoints in a debug build and examining the frames):

The options for speedup are:

ahgamut commented 3 years ago

So I got the freeze program working in d522a88. I wish I had a dollar for every time I've seen Python crash with "Unable to get the locale encoding".

I would also have made a decent amount of money :laughing: Alas, Python 2.7 did not suffer from this "locale encoding" affliction; it just used ASCII everywhere and only relied on importing .py files at the end of the startup process.

I built what I hope is a Python compiler in 0c6581f.

By compiler, do you mean "compiles to .pyc" or "compiles to raw bytes for use in C, like _frozen_importlib"? A Python compiler in C sounds awesome; let me try it out.

Is pycomp.com faster than running python.com -m compileall? I thought we could just have a minimal python.com with compileall in tool/build/ along with zipobj.com, as updating that would be easier.

It seemed to work fine for loading modules until I used it for early stage module loading, which seems to follow different rules. It looks like the source code for the encodings module needs to be embedded uncompiled?

I've explained what I've gleaned about the import process above. I'm pretty sure you can import encodings from an encodings/__init__.pyc file without issue; in the cosmo_py36 fork I used a filter to ensure only .pyc files were added to python.com, and it had no issue finding encodings.pyc.
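Here is a small Python sketch of that .pyc-only setup on a regular filesystem, assuming the zip!-backed paths behave like ordinary directories to the path finder. A hypothetical package `mypkg` has its `__init__.pyc` where `__init__.py` would be (the legacy location, not `__pycache__/`), and the source is deleted, so the import goes through SourcelessFileLoader.

```python
import importlib
import os
import py_compile
import sys
import tempfile

# Build a hypothetical package "mypkg" that ships only bytecode
root = tempfile.mkdtemp()
pkg = os.path.join(root, "mypkg")
os.mkdir(pkg)
src = os.path.join(pkg, "__init__.py")
with open(src, "w") as f:
    f.write("VALUE = 42\n")

# Legacy location: the .pyc sits next to where the .py was, not in __pycache__/
py_compile.compile(src, cfile=os.path.join(pkg, "__init__.pyc"), doraise=True)
os.remove(src)  # only the .pyc is left

sys.path.insert(0, root)
importlib.invalidate_caches()  # make sure the path finder rescans directories
import mypkg

print(mypkg.VALUE, type(mypkg.__loader__).__name__)  # 42 SourcelessFileLoader
```

With no .py present, nothing can be considered stale, so the bytecode is used as-is.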

jart commented 3 years ago

See cabb0a7e. It's now officially a working 1.1 MB pycomp.com binary that turns .py files into .pyc files. I found an objdump tool for pyc files that let me confirm that it 100% works. What I want is to compile the pyc files using the makefile, have the makefile put the pyc files inside the zip, and then instruct Python to load those instead.

The problem is that Python loads the .pyc. Then it silently ignores it and reads the .py afterwards anyway. Could you help me find out why?
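A Python sketch for checking the two usual reasons is below, assuming the standard SourceFileLoader rules also apply to the zip store. When a .py exists, Python only looks for its .pyc at `importlib.util.cache_from_source()` (e.g. `__pycache__/NAME.cpython-36.pyc`, or `NAME.cpython-36.opt-2.pyc` when run with -OO), which matches the `__pycache__` lookups in the __zipos_open trace earlier in this thread. It also discards a timestamp-based .pyc whose recorded mtime or source size differs from what stat reports for the .py. ZIP entries carry their own DOS timestamps (2-second resolution), so a mismatch there is one possible culprit. `pyc_status` is a hypothetical helper.

```python
import importlib.util
import os
import struct
import sys


def pyc_status(py_path):
    """Say whether the cached .pyc for py_path would be accepted.

    Assumes a timestamp-based pyc (the only kind on 3.6).
    """
    pyc_path = importlib.util.cache_from_source(py_path)  # honours -O/-OO
    if not os.path.exists(pyc_path):
        return "missing: expected " + pyc_path
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    if header[:4] != importlib.util.MAGIC_NUMBER:
        return "stale: magic number mismatch"
    if sys.version_info >= (3, 7):  # 16-byte header: magic, flags, mtime, size
        flags, mtime, size = struct.unpack("<III", header[4:16])
        if flags & 1:
            return "hash-based pyc: not checked here"
    else:  # 12-byte header: magic, mtime, size
        mtime, size = struct.unpack("<II", header[4:12])
    st = os.stat(py_path)
    if mtime != (int(st.st_mtime) & 0xFFFFFFFF) or size != (st.st_size & 0xFFFFFFFF):
        return "stale: source mtime/size differ from the header"
    return "fresh"
```

Running this on the extracted .py/.pyc pairs (or on the zip! paths inside python.com) should show which of the two cases applies.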

I feel really uncomfortable with the way Python crisscrosses C and Python in the bootstrap process. The only module loading process I understand so far is the static linking one it uses for C. Look at how elegant the compiler is:

https://github.com/jart/cosmopolitan/blob/cabb0a7ede6b0855b0ccbd59545b7b255c6cea8c/third_party/python/pycomp.c#L43-L189

ahgamut commented 3 years ago

@jart with #248 the APE startup time has been reduced to 0.044s.

I feel really uncomfortable with the way Python crisscrosses C and Python in the bootstrap process.

A simple function call goes through 5 different files! All that indirection makes it unnaturally difficult to follow program flow. That's the price to pay for modifying Python I guess.

We can reduce the APE size a bit further by excluding docstrings: we just have to put an #ifdef around WITH_DOC_STRINGS in pyconfig.h and have pycomp use optimize=2.

With MODE=tiny and only .pyc files in the ZIP store, python.com is 6MB with all docstrings removed and 6.8MB with all docstrings kept.
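For the .py side, optimize=2 is the same level as `python -OO`: it drops docstrings (and asserts) from the compiled code, while the WITH_DOC_STRINGS change covers the docstrings of C functions. A quick check with the built-in compile(); the `greet` function is just a placeholder:

```python
src = '''
def greet():
    """Say hello."""
    return "hello"
'''

docs = {}
for level in (0, 2):
    ns = {}
    # optimize=2 matches -OO: the docstring is not stored in the code object
    exec(compile(src, "<demo>", "exec", optimize=level), ns)
    docs[level] = ns["greet"].__doc__
print(docs)  # {0: 'Say hello.', 2: None}
```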

ahgamut commented 3 years ago

The only module loading process I understand so far is the static linking one it uses for C.

@jart I got an idea of how modules in general are loaded, and this is one way to avoid criss-crossing the C-Python border when importing Python files. The basic idea is:

find a way to import a module without relying on Python machinery (if I just fopen a file in the ZIP store, is it possible to somehow convert that into loading the module?)

Consider a file hello.py (or its equivalent hello.pyc). Importing hello.py is slower than importing a C extension, as it involves criss-crossing between C and Python plus filesystem calls. Importing hello.pyc would be slightly faster, but there is still a lot of criss-crossing to slow things down.

import hello first checks for a key "hello" in sys.modules to see if the module already exists. So, if we can somehow add hello as a module to sys.modules before the import statement is executed, it would save time.
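A minimal Python illustration of that sys.modules short-circuit: a hypothetical `hello` module is built by hand and registered, and the import statement then returns it without looking for any file.

```python
import sys
import types

# Hypothetical module built by hand; no hello.py needs to exist anywhere
hello = types.ModuleType("hello")
exec("def greet():\n    return 'hi from preloaded hello'\n", hello.__dict__)
sys.modules["hello"] = hello

import hello as h  # found in sys.modules, so no finder or loader runs

print(h.greet())  # hi from preloaded hello
```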

Here is an example in Python, but I'm pretty sure it can be written in C using the CPython API (avoiding indirection where possible, e.g. with fopen), because it uses only the most basic modules.

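import marshal
import os
import sys
import types

# .pyc header: 12 bytes on Python 3.6 (magic, mtime, source size);
# 16 bytes on 3.7+, which added a flags field (PEP 552)
PYC_HEADER_SIZE = 16 if sys.version_info >= (3, 7) else 12

def preload_module_from_file(name, file):
    if name in sys.modules:
        return sys.modules[name]  # already imported, nothing to do
    mod = types.ModuleType(name)  # PyModule_NewObject() in the C API; sets __name__
    mod.__file__ = file
    if os.path.basename(file).startswith("__init__."):
        # a package needs __path__ so its submodules can be found
        mod.__path__ = [os.path.dirname(file)]
        mod.__package__ = name
    else:
        mod.__package__ = name.rpartition(".")[0]

    sys.modules[name] = mod
    try:
        if file.endswith(".pyc"):
            with open(file, "rb") as f:
                raw = f.read()
            # can validate magic number+timestamp+size here
            # or just skip the header
            code = marshal.loads(raw[PYC_HEADER_SIZE:])
        else:
            with open(file, "r") as f:
                code = compile(f.read(), file, "exec")  # keeps the filename in tracebacks
        exec(code, mod.__dict__)
    except BaseException:
        sys.modules.pop(name, None)  # don't leave a half-initialized module behind
        raise
    return mod

preload_module_from_file("hello", "hello.pyc") # faster when done in C w/ APE ZIP store
import hello # faster because preloaded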

Similarly, if we call preload_module_from_file("encodings", "zip!./python/Lib/encodings/__init__.pyc") in C before the actual import happens, it should be faster than just calling the import outright. (Note that all the dependencies of encodings need to be preloaded already; otherwise there is no speedup.)

If the C version is fast enough, module startup does not need to cross over to the Python side as much, and there will be a net speed gain. This also opens up the possibility of a macro PRELOAD_PYIMPORT(modname, filename) for use in the C code.

ahgamut commented 3 years ago

@jart I came across Stackless Python 3.6.13; it might be useful to port the stackless module to Cosmopolitan (the microthreads/coroutines don't seem to rely on pthreads).

Keithcat1 commented 3 years ago

After the last update, I'm having problems building Python. Here is what may or may not be a relevant block of text (there's a lot of it):

...
build/bootstrap/ar.com rcsD o//third_party/python/python-stdlib-dirs.a o//third_party/python/Lib/.zip.o o//third_party/python/Lib/asyncio/.zip.o o//third_party/python/Lib/collections/.zip.o o//third_party/python/Lib/dbm/.zip.o o//third_party/python/Lib/distutils/.zip.o o//third_party/python/Lib/distutils/command/.zip.o o//third_party/python/Lib/distutils/tests/.zip.o o//third_party/python/Lib/email/.zip.o o//third_party/python/Lib/email/mime/.zip.o o//third_party/python/Lib/encodings/.zip.o o//third_party/python/Lib/ensurepip/.zip.o o//third_party/python/Lib/ensurepip/_bundled/.zip.o o//third_party/python/Lib/html/.zip.o o//third_party/python/Lib/http/.zip.o o//third_party/python/Lib/importlib/.zip.o o//third_party/python/Lib/json/.zip.o o//third_party/python/Lib/logging/.zip.o o//third_party/python/Lib/msilib/.zip.o o//third_party/python/Lib/multiprocessing/.zip.o o//third_party/python/Lib/multiprocessing/dummy/.zip.o o//third_party/python/Lib/sqlite3/.zip.o o//third_party/python/Lib/unittest/.zip.o o//third_party/python/Lib/urllib/.zip.o o//third_party/python/Lib/venv/.zip.o o//third_party/python/Lib/venv/scripts/common/.zip.o o//third_party/python/Lib/venv/scripts/nt/.zip.o o//third_party/python/Lib/venv/scripts/posix/.zip.o o//third_party/python/Lib/wsgiref/.zip.o o//third_party/python/Lib/xml/.zip.o o//third_party/python/Lib/xml/dom/.zip.o o//third_party/python/Lib/xml/etree/.zip.o o//third_party/python/Lib/xml/parsers/.zip.o o//third_party/python/Lib/xml/sax/.zip.o o//third_party/python/Lib/xmlrpc/.zip.o o//third_party/python/Lib/test/.zip.o o//third_party/python/Lib/test/xmltestdata/.zip.o o//third_party/python/Lib/test/test_email/.zip.o o//third_party/python/Lib/test/test_email/data/.zip.o o//third_party/python/Lib/test/sndhdrdata/.zip.o o//third_party/python/Lib/test/test_asyncio/.zip.o
o//third_party/python/Lib/test/__pycache__/.zip.o o//third_party/python/Lib/test/audiodata/.zip.o o//third_party/python/Lib/test/imghdrdata/.zip.o o//third_party/python/Lib/test/decimaltestdata/.zip.o o//third_party/python/Lib/test/test_import/.zip.o o//third_party/python/Lib/test/test_import/data/.zip.o o//third_party/python/Lib/test/test_import/data/package/.zip.o o//third_party/python/Lib/test/test_import/data/package2/.zip.o o//third_party/python/Lib/test/test_import/data/circular_imports/.zip.o o//third_party/python/Lib/test/test_import/data/circular_imports/subpkg/.zip.o o//third_party/python/Lib/test/libregrtest/.zip.o o//third_party/python/Lib/test/libregrtest/__pycache__/.zip.o o//third_party/python/Lib/test/leakers/.zip.o o//third_party/python/Lib/test/test_json/.zip.o o//third_party/python/Lib/test/eintrdata/.zip.o o//third_party/python/Lib/test/support/.zip.o o//third_party/python/Lib/test/support/__pycache__/.zip.o o//third_party/python/Lib/test/test_importlib/.zip.o o//third_party/python/Lib/test/test_importlib/extension/.zip.o o//third_party/python/Lib/test/test_importlib/frozen/.zip.o o//third_party/python/Lib/test/test_importlib/import_/.zip.o o//third_party/python/Lib/test/test_importlib/builtin/.zip.o o//third_party/python/Lib/test/test_importlib/source/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/project2/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/project2/parent/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/project2/parent/child/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/portion2/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/portion2/foo/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/project3/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/project3/parent/.zip.o
o//third_party/python/Lib/test/test_importlib/namespace_pkgs/project3/parent/child/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/portion1/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/portion1/foo/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/both_portions/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/both_portions/foo/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/project1/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/project1/parent/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/project1/parent/child/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/not_a_namespace_pkg/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/not_a_namespace_pkg/foo/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/module_and_namespace_package/.zip.o o//third_party/python/Lib/test/test_importlib/namespace_pkgs/module_and_namespace_package/a_test/.zip.o o//third_party/python/Lib/test/test_warnings/.zip.o o//third_party/python/Lib/test/test_warnings/data/.zip.o o//third_party/python/Lib/test/capath/.zip.o o//third_party/python/Lib/test/dtracedata/.zip.o o//third_party/python/Lib/test/subprocessdata/.zip.o o//third_party/python/Lib/test/crashers/.zip.o o//third_party/python/Lib/test/cjkencodings/.zip.o o//third_party/python/Lib/test/test_tools/.zip.o o//third_party/python/Lib/test/tracedmodules/.zip.o

error:tool/build/ar.c:173:ar.com.tmp.10960: check failed on keith-pc pid 10961
  CHECK_NE(-1, (fd = open(args.p[i], O_RDONLY)));
    → 0xffffffffffffffff (-1)
    != 0xffffffffffffffff ((fd = open(args.p[i], O_RDONLY)))
  o//third_party/python/Lib/test/__pycache__/.zip.o
  ENOENT[2]
  o/build/bootstrap/ar.com.tmp.10960 \
    rcsD \
    o//third_party/python/python-stdlib-dirs.a \
    ...

I'm also having problems with make. Quite often, the make process just randomly stops instead of completing the build, even though there doesn't seem to be an error. Could we move to CMake and Ninja? One advantage is that building individual targets is easy; it would probably look like this:

cmake -G Ninja -DCMAKE_BUILD_TYPE=release ..
ninja redbean python examples

But oh, wouldn't this break the vendored GCC? It might also make compiling on Windows easier, or harder?

ahgamut commented 3 years ago

CHECK_NE(-1, (fd = open(args.p[i], O_RDONLY)));
  → 0xffffffffffffffff (-1)
  != 0xffffffffffffffff ((fd = open(args.p[i], O_RDONLY)))
o//third_party/python/Lib/test/__pycache__/.zip.o
ENOENT[2]
o/build/bootstrap/ar.com.tmp.10960 \
  rcsD o//third_party/python/python-stdlib-dirs.a

@Keithcat1 this error is due to a couple of unnecessary lines in python.mk. It goes away once #254 is merged.
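The failed CHECK_NE means open() returned -1 with errno set to ENOENT: ar was handed a .zip.o path listed in python.mk (here the Lib/test pycache entry) that the build never produced, and the check stops at the first such path. As a minimal sketch, not the actual code in tool/build/ar.c, a pre-flight pass could report every unreadable input before any archiving starts. The helper name count_unreadable is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: tries to open each input read-only and returns how
   many could not be opened. Every failure is printed with its path and the
   errno text (e.g. "No such file or directory" for ENOENT), so all stale
   entries in the makefile are named in one run instead of just the first. */
static int count_unreadable(int n, char *const paths[]) {
  int missing = 0;
  for (int i = 0; i < n; ++i) {
    int fd = open(paths[i], O_RDONLY);
    if (fd == -1) {
      fprintf(stderr, "ar: cannot open %s: %s\n", paths[i], strerror(errno));
      ++missing;
    } else {
      close(fd); /* only checking readability; the fd is not needed */
    }
  }
  return missing;
}
```

For example, assuming missing.zip.o does not exist, count_unreadable(2, (char *[]){"/dev/null", "missing.zip.o"}) prints one diagnostic for the second path and returns 1; a caller would refuse to build the archive whenever the result is nonzero.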

I'm also having problems with make. Quite often, the make process just randomly stops instead of completing the build, even though there doesn't seem to be an error.

@jart this happens when I run make -j4 as well. The failure usually involves the zoneinfo objects. My guess is that it started with e963d9c8e3cb3d29e924c344f41975b63a62fef5, but I could be wrong.