jart / cosmopolitan

build-once run-anywhere c library
ISC License

Compiling Python #141

Closed. ahgamut closed this issue 12 months ago.

ahgamut commented 3 years ago

https://github.com/ahgamut/python27
https://github.com/ahgamut/cpython/tree/cosmo_py27

The assert macro needs to be changed in cosmopolitan.h to enable compilation (see #138). Afterwards, just clone the repo and run superconfigure.

Python 2.7.18 compiled seamlessly once I figured out how autoconf worked, and what flags were being fed to the source files when running make. I'm pretty sure we can compile any C-based extensions into python.exe -- they just need to be compiled/linked with Cosmopolitan, with the necessary glue code added to the Python source. For example, I was able to compile SQLite into python.exe to enable the internal _sqlite module.

The compiled APE is about 4.1MB with MODE=tiny (without any of the standard modules, the interpreter alone is around 1.6MB). Most of the modules in the stdlib compile without error. The _socket module (required for Python's simple HTTP server) doesn't compile, as it requires the structs from netdb.h.

On Windows, the APE exits immediately because the interpreter is unable to find the platform-specific files. Modules/getpath.c and Lib/site.py in the Python source try to use absolute paths from the prefixes provided during compilation; editing those files to search the right locations (possibly with some zipos magic) ought to fix this.
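A hypothetical sketch of what that fallback could look like in Lib/site.py, assuming the zip store ends up exposed at a /zip/.python path (the location used later in this thread); the helper name and placement are illustrative, not the actual patch:

import os
import sys

def _add_zip_store():
    # Prefer the stdlib bundled inside the APE zip store when it exists,
    # instead of the absolute build-time prefix baked in by getpath.c.
    zip_stdlib = "/zip/.python"
    if os.path.isdir(zip_stdlib) and zip_stdlib not in sys.path:
        sys.path.insert(1, zip_stdlib)  # keep '' (current dir) first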

jart commented 3 years ago

@Keithcat1 The issues you encountered should be addressed by recent commits. I've made numerous other fixes and improvements too! For example, we now have really excellent completion. Here's a screencast demo.

(screencast: actually-portable-python2)

jart commented 3 years ago

@ahgamut I want to credit you somehow when we move forward with publicizing this contribution. One suggestion I have is the following language. Does it meet your approval?

Python 3.6.14+ (Actually Portable Python)
[GCC 9.2.0] on cosmo
Type "help", "copyright", "credits" or "license" for more information.
>>> credits()
    Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
    for supporting Python development.  See www.python.org for more information.
    Thanks go to github.com/ahgamut for porting Python to Cosmopolitan Libc.
ahgamut commented 3 years ago
Python 3.6.14+ (Actually Portable Python)
[GCC 9.2.0] on cosmo
Type "help", "copyright", "credits" or "license" for more information.
>>> credits()
    Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
    for supporting Python development.  See www.python.org for more information.
    Thanks go to github.com/ahgamut for porting Python to Cosmopolitan Libc.

This is wonderful @jart. Thank you!

Keithcat1 commented 3 years ago

It should be possible to change pip so that it downloads packages and then stores them inside the python.com zip file, right? I know that extension modules aren't supported, but pure Python packages should be. By the way, why are we using Python 3.6 and not Python 3.9? I'm guessing the attempt to port Python to Cosmopolitan was started a long time ago and just now finished. I'm a little confused about stackless threads. Do they let you utilize your CPU more fully than if you were just running one normal Python thread?

jart commented 3 years ago

Yes pure Python packages will be supported, and they can be inside or outside the zip. Eventually native ones like Numpy will be supported too. @ahgamut did we remove that and need to add it back? To clarify, I want zip loading to be the first choice since it's the fast easy trademark nifty path. But if it's not in the zip, have it fall back to things like PYTHONPATH and current directory.

ahgamut commented 3 years ago

@Keithcat1

It should be possible to change pip so that it downloads packages

Using pip requires SSL, and Python relies on OpenSSL by default. OpenSSL has not yet been added to the Cosmopolitan repo, and it bloats the APE considerably. It would be wonderful if we could wrap MbedTLS and use it for Python's _ssl module.

The current option you have is to download the packages through some other method. Check the README on my cosmo_py36 fork to see what you can do for pip right now.

An additional wrinkle is some Python packages rely on _thread and some glibc-specific stuff, so even if the installation works, the packages themselves raise complaints later.

and then stores them inside the python.com zip file, right.

I don't think the Python APE can self-modify just yet; I got an ETXTBSY when I tried to pip install directly into the APE ZIP store. So the modifications to pip might take a while. (@jart pls confirm)

There are two options:

  1. Keep the downloaded packages in ~/.local/lib/python3.6/site-packages or in the same directory as Python.
  2. Python wheels are just zip files, so you can unzip them and add the necessary folders into the APE (possibly as .pyc files); see the sketch below.
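A minimal sketch of option 2, assuming a pure-Python wheel that has already been downloaded (the filename is just an example). A wheel is an ordinary zip archive, so its packages can be extracted next to python.com or into any directory on sys.path; pushing them into the APE zip store itself would additionally need a zip tool (and, for a running binary, OpenExecutable):

import zipfile

with zipfile.ZipFile("somepkg-1.0-py3-none-any.whl") as wheel:
    # Pure-Python wheels contain only .py files and metadata, so a plain
    # extraction into a directory that is on sys.path is enough.
    wheel.extractall("Lib/site-packages")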

I know that extension modules aren't supported

If you add the source and the right recipe to python.mk, simple C extensions like greenlet or markupsafe ought to work. I know because they are part of my cosmo_py36 fork of APE Python and I used them to run Flask.

Stuff like numpy is much harder, but likely still possible. There is scope for some interesting tricks with libpython.a.

By the way why are we using Python 3.6 and not Python 3.9? I'm guessing the attempt to port Python to Cosmopolitan was started a long time ago and just now finished.

Python 3.7 and above require pthread by default, which Cosmopolitan does not currently support. So I picked Python 3.6. I would like to use the newest version too, but I do not know of a way to port it yet.

I'm a little confused about stackless threads. Do they let you utilize your CPU more fully than if you were just running one normal Python thread?

The stackless module has not been added to the Cosmopolitan repo yet. I only came across it a few days back.

I'm also not sure how the stackless module works. I think its microthreads are similar to coroutines in Go? I tried out this old benchmark that compares Go's microthreads to Stackless, using Stackless Python 3.6.13 and Go 1.15.6, and Stackless Python is faster within a single process (GOMAXPROCS=1). (additional detail about the benchmark here)
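For reference, this is roughly what the upstream Stackless tasklet API looks like (based on the Stackless Python documentation; the module is not in the Cosmopolitan repo yet, so this does not run on the APE):

import stackless

def worker(n):
    print("tasklet", n, "running")
    stackless.schedule()  # cooperatively yield to the scheduler

for i in range(3):
    stackless.tasklet(worker)(i)  # create a microthread bound to worker(i)

stackless.run()  # run all tasklets to completion in a single OS thread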

Stackless Python is apparently used in the servers for an MMORPG, which is why I thought it might be interesting speed-wise to add to the Python APE. Has anyone written a web framework with the stackless module?

ahgamut commented 3 years ago

did we remove that and need to add it back? To clarify, I want zip loading to be the first choice since it's the fast easy trademark nifty path. But if it's not in the zip, have it fall back to things like PYTHONPATH and current directory.

@jart Here are the entries in sys.path for the APE:

(screenshot: sys.path entries in the APE)

  1. is the current directory ''. This is added after startup, is default behavior for Python, and is for importing files in the current dir (like for simple scripts: if I had foo.py and bar.py in the current dir, and bar.py had import foo, Python would use this entry and select foo.py from nearby). You can put this after the ZIP store (by changing something in site.py) if you want, but it would likely interfere with how people expect Python to behave.
  2. is the APE zip store. I think this is where it should be: it doesn't interfere with '', and it is the fast-path import for any script that isn't in the current dir of the interpreter (see the sketch after this list).
  3. is the user's directory for Python packages. This is added by site.py if the directory exists, and can be disabled via python.com -s. Packages can be installed here by pip install --user <my_package>. I have used this to run some simple packages like Flask.
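A quick way to see that ordering from the APE itself; the shadowing behavior follows standard Python import rules (first match on sys.path wins):

import sys

for i, entry in enumerate(sys.path):
    print(i, repr(entry))

# A module found in the current directory '' shadows one in the zip
# store, which in turn shadows the user's site-packages directory.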

I don't think we need to add any more directories.

If a particular location is really necessary, we can add it to limited_search_path in Modules/getpath.c, or allow the use of PYTHONPATH/PYTHONHOME. I prefer to avoid PYTHONPATH/PYTHONHOME because it confuses the APE sometimes when I have multiple Python versions available, especially on Windows (e.g., if I am in a conda Python 3.7 venv, the APE would try to import a 3.7 lib).

(BTW the autocomplete in the APE is beautiful, I got a happy surprise when the text appeared in gray for completion)

ahgamut commented 3 years ago

@jart some decisions regarding python.com:

SSL support would allow the use of pip and thereby the installation of custom packages into the user's local directories. There are three options:

  1. skip SSL for now, because ensurepip, pip, and quite a few packages rely on threading. Would be annoying if we allowed pip installs and then a bunch of packages had complaints about how the APE works (e.g. Dash requires the glibc version during installation, why?!?!)
  2. use OpenSSL: would need a PR for third_party/openssl with OpenSSL 1.1.1k (it worked with the amalgamation earlier). I would prefer to avoid this because OpenSSL increases the APE size considerably, and I have not gotten any of the OpenSSL tests to pass (pip install worked on Linux though).
  3. use MbedTLS: would need to write a version of Modules/_ssl.c and Modules/_hashlib.c that use MbedTLS bindings instead of OpenSSL. If this is possible, I think it is the best option, because APE size would not increase as much, and we can avoid adding OpenSSL to the repo. I tried to do this a while back, but I was unsuccessful.

Stackless Python adds a custom stackless module to the Python stdlib to provide lightweight coroutines.

Do you think the stackless module should be added to Cosmopolitan repo? We can try to make an existing web framework use stackless instead of threads to see speedup, or a possible bigger goal can be a new, stackless-based, Python web framework unique to Cosmopolitan.

jart commented 3 years ago

I'm glad you enjoyed the readline interface. Definite no on stackless. We have better ways of simulating threads. Can't Flask do its thing using fork multiprocessing, just like gunicorn used to do? You know, if the goal is performance...


What I'd be interested in learning is whatever trick Pyston used to get the 30% performance boost. We obviously wouldn't want to incorporate their JIT work due to what happened with Unladen Swallow, but I seem to recall them writing a blog post where they said Python had some esoteric thing for debugging that no one really uses but made code really slow, so they removed it with great success. I'd like to know what that is.

How many kilobytes did OpenSSL add? It looks like it's 3mb on Alpine. What could it be doing? Our MbedTLS impl is more pareto optimized but you're still realistically looking at 200-500kb. Rewriting _ssl.c to integrate it with Python would be painful but it could be done.

It's also worth mentioning that incorporating chibicc or gcc+blinkenlights isn't out of the question, since it'd be nice to let people on any platform build their own native extensions without needing to install a compiler.

ahgamut commented 3 years ago

How many kilobytes did OpenSSL add?

OpenSSL adds around 2.2mb (from my comment here) to python.com.

Rewriting _ssl.c to integrate it with Python would be painful but it could be done.

Decoupling Python from OpenSSL is a known problem: there was an attempt with PEP 543 but it got withdrawn.

I'm not sure about adding OpenSSL to the repo. OpenSSL also has a 3.0.0 beta out now, so might be worth it to wait and see how the new version improves things.

Can't Flask do its thing using fork multiprocessing, just like gunicorn used to do? You know, if the goal is performance...

Flask relies on the threading module in imports, and also depends on greenlet (via Werkzeug IIRC). I haven't looked through the web framework options for Python to see if there's any library that doesn't rely on threads. When I made this example GIF, I just did the minimum possible to get it to work. Notably, I did not try it in a production environment.

(python.com has been upgraded since I tried the Flask example, maybe I should try it again and see if there are any changes.)

ahgamut commented 3 years ago

The debugging changes made by Pyston are outlined here: https://github.com/pyston/pyston/wiki/Semantic-changes

Pyston is on Python 3.8 though, so I'm not sure if all the changes can be followed.

jart commented 3 years ago

There's nothing stopping us from checking-in the code for commonly used native extensions, such as greenlet, and shipping it as part of python.com.

ahgamut commented 3 years ago

Well, commonly used depends on what workflows/packages are the (first) target with python.com.

I only looked up greenlet as a specific package when the Flask installation failed. If I had tried a different framework, perhaps I wouldn't consider greenlet that important.

numpy-dependent stuff is common for me, but Python is "general purpose": web frameworks, computational stuff, midsize scripts that are unwieldy in bash etc.

What would you like to target first? I'm not sure targeting a Python webserver+framework is worthwhile especially since redbean exists (unless redbean+python, but that would definitely reduce speed).

The ideal would be to have 95%+ of the stdlib in python.com, add a few "highly dependent" third_party extensions (need some way to figure these out) and allow pip + compilation from source (via chibicc?!) for any additional packages.

jart commented 3 years ago

By the way, one question I forgot to answer earlier: it is possible to modify the zip store while the APE binary is running, but you need to call OpenExecutable, which is a cool hack that's currently only being used by redbean. It currently only works on a couple of operating systems, and we're working to polyfill it across platforms with a slow copy fallback path. But generally speaking, the best way to think of the zip store for now is as fast read-only access.
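For example, anything bundled in the zip store can already be read like an ordinary file through the /zip/ path (the antigravity.py entry below is one of the files shown in the pyobj symbol table later in this thread); it's only writing back into a live binary that needs the OpenExecutable machinery:

# Read-only access to a file bundled in the APE zip store.
with open("/zip/.python/antigravity.py") as f:
    print(f.read()[:200])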

What would you like to target first?

I want to get as many tests passing as possible first and incorporate them into the build. The Python unit tests are laying bare all the subtle shortcomings in the libc implementation, which is great. Then I want to create an HTML page with green / yellow / red boxes in a table that gives people a bird's-eye overview of how far Actually Portable Python has progressed. Once we have that we can launch, since it means happy users with realistic expectations and a call to action for anyone interested in helping.

Keithcat1 commented 3 years ago

@ahgamut When you say that extension modules are supported, do you mean that they have to be statically linked or that they can be loaded from disk? @jart The time.sleep function doesn't work; it raises an exception.

The tempfile module is broken, at least on Windows.

I wasn't able to build python.com (make MODE=dbg) with ASAN. The binary wouldn't run when started, just exited immediately. I did get this error once though: error: could not map asan shadow memory

Can you fix the thing where make will sometimes stop?

I tried building python.com with Clang and -flto to make it faster / smaller:

export CFLAGS=-flto
export CXXFLAGS=-flto
make -j8 MODE=llvm -O o/llvm/third_party/python

But the build either kept stopping at some point for no reason or gave an error stating that one of the files in the build got too big; I don't have the exact error message.

Would it be possible to use PyPy instead of regular CPython? I'm not really expecting you to do this but am curious if it would work. Does Cosmopolitan even support JIT code?

Thanks for all your hard work on this project, it's very exciting and I look forward to new stuff!

ahgamut commented 3 years ago

@Keithcat1

Extension modules are possible, but require a lot of effort right now. Extensions have to be linked statically, and you would need to make some changes in the source code of the extension in order to compile + link it along with libpython.a.

If you would like to experiment with how extensions are added to python.com, I would suggest using my cosmo_py36 fork of CPython, which contains all the necessary changes to add extensions statically via Python's regular build process.

The repo is at: https://github.com/ahgamut/cpython/tree/cosmo_py36
The outline of the process to add extensions is at: https://github.com/ahgamut/cpython/tree/cosmo_py36#what-about-c-extensions
An example with the greenlet module and its _greenlet extension is in this commit: https://github.com/ahgamut/cpython/commit/0274a49934d4f8aa1fc98c10ebf15d694608e4c2

jart commented 3 years ago

Extension modules are possible, but require a lot of effort right now.

I wish you wouldn't think of it that way. The people who maintain each platform put in a lot of work to create a prescripted path that's specific to their platform for installing python modules. For example, on Debian if you want Numpy you just apt install python-numpy. The tradeoff is that if you take the easy path like that, your program will only run on Debian. We've created a meta-platform of our own, that lets people build Python apps that not only run on every Linux distro, but all the other operating systems too. But since it's new, someone needs to do all that trailblazing which will make things easy for people in the future. We're the ones doing that.

ahgamut commented 3 years ago

@jart you are right, perhaps calling it "lot of effort" makes it seem unnecessarily daunting. If someone wanted Actually Portable CPython extensions (for e.g. _sqlite, greenlet, or markupsafe), for a few lines' worth of code, Cosmopolitan makes that possible today, which is amazing.

I think that customizing setuptools (via chibicc, dlopen, or converting setup.py to a Makefile recipe) can unlock a more hands-off approach, wherein we would not need to edit the source code at all. Combine that with a polyfilled OpenExecutable, and many more exciting things will happen.

Keithcat1 commented 3 years ago

@jart Would this help for adding shared object support? https://github.com/fancycode/MemoryModule I'm guessing loading shared objects or DLLs using the OS default functions isn't the hard part. Py2EXE actually had functionality that would bundle everything up in one zip file and load extension modules directly using the above C library.

pkulchenko commented 3 years ago

@Keithcat1, I hope so. I already brought MemoryModule up in related discussions in https://github.com/jart/cosmopolitan/issues/97#issuecomment-874415031 and #137. I think the challenge is to not use the OS default functions if we want to be able to load APE-based extension modules.

jart commented 3 years ago

The challenge isn't implementing the code loading. The challenge is what gets bundled, and what is loadable. cosmopolitan.a is 32mb with debug symbols stripped. It defines 4,000 functions.

So here's what we'd have to do first in order to move forward with a project like dynamic linking. We need an ABI requirements doc. We can study the most popular native extensions. Like numpy. Run readelf or nm to get the undef symbols they need, and make a list.

jart commented 3 years ago

OK so we now have a pretty nice static analyzer for Python sources, which turns them into ELF objects. https://github.com/jart/cosmopolitan/blob/master/third_party/python/pyobj.c Stuff like this:

Symbol table '.symtab' contains 26 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    8
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    9
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT   10
    11: 0000000000000000     0 SECTION LOCAL  DEFAULT   11
    12: 0000000000000000    52 OBJECT  LOCAL  DEFAULT    4 zip+lfile:.python/antigravity.py
    13: 0000000000000000   294 OBJECT  LOCAL  DEFAULT    5 zip+cdir:.python/antigravity.py
    14: 0000000000000000    53 OBJECT  LOCAL  DEFAULT    7 zip+lfile:.python/antigravity.pyc
    15: 0000000000000000   294 OBJECT  LOCAL  DEFAULT    8 zip+cdir:.python/antigravity.pyc
    16: 0000000000000034   306 OBJECT  GLOBAL DEFAULT    4 py:antigravity
    17: 0000000000000035   547 OBJECT  GLOBAL DEFAULT    7 pyc:antigravity
    18: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND pyc:webbrowser
    19: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND pyc:hashlib
    20: 0000000000000000     0 OBJECT  GLOBAL DEFAULT   10 pyc:antigravity.__file__
    21: 0000000000000000     0 OBJECT  GLOBAL DEFAULT   10 pyc:antigravity.webbrowser
    22: 0000000000000000     0 OBJECT  GLOBAL DEFAULT   10 pyc:antigravity.hashlib
    23: 0000000000000000     0 OBJECT  GLOBAL DEFAULT   10 pyc:antigravity.geohash
    24: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND .python/
    25: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND __zip_start

It works well enough that I'm learning some depressing things about the Python standard library. For example, the help() function depends on http.server. The way the static analyzer is set up right now, we can make weird dependencies like that "weak" in ELF terms by putting a try statement around them. That way we can make reasonably small binaries.
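A minimal example of that "weak" dependency trick as it would look in a stdlib module (this mirrors the approach described above, not any specific patch in the tree):

try:
    import http.server  # optional, heavyweight dependency
except ImportError:
    http = None         # binary was built without it; degrade gracefully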

jart commented 3 years ago

Another bit of good news is that I've migrated _hashlib to MbedTLS!
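Since the Python-level hashlib API is unchanged by the backend swap, a quick smoke test from the REPL is enough to exercise the MbedTLS-backed code paths:

import hashlib

print(hashlib.sha256(b"actually portable").hexdigest())
print(hashlib.new("md5", b"actually portable").hexdigest())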

Keithcat1 commented 3 years ago

It seems that connecting to, for example, localhost:6060 using urllib.request.urlretrieve doesn't work. urllib.error.URLError: <urlopen error [Errno 10051] ENETUNREACH[10051]>

jart commented 3 years ago

There appears to be a regression in the re module too. These issues will all be resolved as we get the unit tests integrated in the build.

Keithcat1 commented 3 years ago

You seem to have broken Python on Windows but not on WSL: Fatal Python error: Py_Initialize: can't initialize sys standard streams

Current thread 0x0000000000000000 (most recent call first):
7770000ffde0 00000069abda __die+0x3e
7770000ffdf0 000000683413 Py_FatalError+0x66
7770000ffe10 000000684088 _Py_InitializeEx_Private+0x414
7770000ffe50 0000006840f7 Py_InitializeEx+0x13
7770000ffe60 00000068410c Py_Initialize+0x13
7770000ffe70 0000004170b7 Py_Main+0x667
7770000fff80 00000040bd79 main+0x1bb
7770000fffe0 000000401890 cosmo+0x40
7770000ffff0 0000006b1f0a _jmpstack+0x17

Also, how does the size-reducing native-module-linkage thing you did work? I know that extension modules are complicated, but is it possible to get ctypes working relatively easily?

ahgamut commented 3 years ago

Fatal Python error: Py_Initialize: can't initialize sys standard streams

Current thread 0x0000000000000000 (most recent call first):
7770000ffde0 00000069abda __die+0x3e
7770000ffdf0 000000683413 Py_FatalError+0x66
7770000ffe10 000000684088 _Py_InitializeEx_Private+0x414
7770000ffe50 0000006840f7 Py_InitializeEx+0x13
7770000ffe60 00000068410c Py_Initialize+0x13
7770000ffe70 0000004170b7 Py_Main+0x667
7770000fff80 00000040bd79 main+0x1bb
7770000fffe0 000000401890 cosmo+0x40
7770000ffff0 0000006b1f0a _jmpstack+0x17

IIRC, this error is a delayed form of the classic Unable to get the locale encoding. I've seen this when I was getting _codecs/encodings to load on Windows in the first place.

The error at that point was that the necessary codec module was not getting imported correctly in get_locale_encoding during startup, but the interpreter figures out that something is awry and crashes only later, when setting up sys.stdin/sys.stdout. Let me see if I can get Windows running and test whether this is similar.

jart commented 3 years ago

Thanks for the reports! That helps narrow things down. I should have a fix pushed shortly.

You asked earlier about how to use the new Python compiler. Right now it's able to be used via the makefile config. Earlier today I added a tutorial to the examples folder showing how an independent package can be configured to build a Python file into a 1.9mb static executable. Please take a look? https://github.com/jart/cosmopolitan/tree/master/examples/pyapp

https://github.com/jart/cosmopolitan/blob/51904e2687c04d7ae20410cd94c2148972d6bae6/examples/pyapp/pyapp.mk#L1-L116

As for ctypes, it'd be nice if Cosmopolitan supported dynamic linking, because it's frequently requested. However it's something I personally do not want for myself, so it's unlikely that I'll be implementing it myself. The focus has always been doing static linking better than anyone else. Cosmopolitan is sort of like a FreeBSD/NetBSD/OpenBSD-style multitenant monorepo.

Any dependencies you need, we can add them to the third_party folder. It might even be possible using GitHub's .gitowners feature for me to simply authorize people to have their own folders within this repo that they can independently maintain. There's also the possibility of setting up a web GUI which could be the easier solution for folks who don't need the full power the make build provides.

jart commented 3 years ago

OK I think I've finally figured out the optimal strategy for informing the PYOBJ.COM static analyzer about implicit dependencies. Basically it looks like this:

if __name__ == 'PYOBJ.COM':
    AF_APPLETALK = 0
    AF_ASH = 0
    AF_ATMPVC = 0
    AF_ATMSVC = 0
    AF_AX25 = 0
    AF_BRIDGE = 0
    AF_CAN = 0
    # ....

That way if a module like socket is doing a weird import star, we can still have it declare its dependencies, while making minimal changes to the upstream sources.

ahgamut commented 3 years ago

OK I think I've finally figured out the optimal strategy for informing the PYOBJ.COM static analyzer about implicit dependencies. Basically it looks like this:

@jart the above strategy is for statements like from _module import *. Import star statements use the __all__ attribute of the module to decide which objects to show, or fall back to (?) showing everything.

I am unclear on this: given a from _module import * statement, does PYOBJ.COM declare a symbol for

I will read the source to find out, but an external explanation would also be useful.

Additional note: If we have to declare dependencies with an if __name__ == 'PYOBJ.COM' block for import star statements, then we can also do the same for the importlib.import_module("module_str") or __import__("module_str") hacks which are seen in the Lib/test/test_something.py files.
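A sketch of what that could look like for a dynamic import (module names are placeholders; the point is just that the guarded import is visible to static analysis while never executing at runtime):

import importlib

if __name__ == 'PYOBJ.COM':
    import test.support  # declare the dependency hidden behind import_module()

def load_helper():
    return importlib.import_module("test.support")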

ahgamut commented 3 years ago

Any dependencies you need, we can add them to the third_party folder. It might even be possible using GitHub's .gitowners feature for me to simply authorize people to have their own folders within this repo that they can independently maintain. There's also the possibility of setting up a web GUI which could be the easier solution for folks who don't need the full power the make build provides.

Adding a (pure-python) third_party package is as easy as adding the right folder to /zip/.python.

For example, I downloaded the plotext package separately, and manually added it to the APE ZIP store. This is a pure-Python package with zero external dependencies, and it works via the APE on Linux:

(screenshot: plotext running via the APE on Linux)

No changes to the plotext source were required! This is a good sign, it means @jart's efforts to ensure compatible-with-CPython-yet-better behavior are working. (Windows is still an issue, APE erroring with os.py expecting posix.F_LOCK)

I will try to find more such simple packages to test APE compatibility behavior.
I think there are a few more places where Cosmopolitan would need to show compatibility with glibc/CPython (for example Lib/platform.py:_sys_version() needs a conditional for Cosmopolitan to avoid potential complaints.)

jart commented 3 years ago

That's amazing @ahgamut. Also good news @Keithcat1! All the issues you brought to our attention should now be resolved by b5f743cdc384044299957bd94bb836eed2de26d4. I've confirmed that the HTTP client works on every single one of the operating systems we support. We now have 148 of Python's unit test programs incorporated into our build too, in order to ensure things stay fixed! That puts us at about 50% of the way towards full automation. The tests run pretty fast too since they're fully parallelized:

(screenshot: parallelized test run)

Out of curiosity how long do the Python unit tests take when you run them the normal way? I seem to recall it taking a long time. Nice thing about small static binaries is that it makes the build / test latency 10x faster. It also means that all we need to do in order to test each program is write a script that does:

set -ex
for os in freebsd openbsd netbsd rhel7 rhel5 xnu win7 win10; do
  scp test_stat.com $os: &&
  ssh $os ./test_stat.com || exit
done

I like to think of it as the maximum consciousness approach to development since it makes life as a developer so much more pleasant for everyone involved over the long term, so we can have more impact focusing on the complexity that matters. So I'm sure Actually Portable Python will be a nice change of pace for anyone who feels like they're trapped in a tooling stack that feels like a messy bachelor's pad.

Anyway I think what we have is novel and definitely good enough for an alpha release. Please feel free to share any ideas on things like a logo, web design, or promotional strategies, since I can imagine a whole lot of people benefiting from this work.

ahgamut commented 3 years ago

I like to think of it as the maximum consciousness approach to development since it makes life as a developer so much more pleasant for everyone involved over the long term, so we can have more impact focusing on the complexity that matters. So I'm sure Actually Portable Python will be a nice change of pace for anyone who feels like they're trapped in a tooling stack that feels like a messy bachelor's pad.

Indeed, testing/debugging/updating APE Python from the cosmo build system is a breeze compared to what I was doing earlier with the regular Python repo.

Anyway I think what we have is novel and definitely good enough for an alpha release. Please feel free to share any ideas on things like a logo, web design, or promotional strategies, since I can imagine a whole lot of people benefiting from this work.

These are some points I noted down as "possible benefits of porting Python to Cosmopolitan Libc". Perhaps it can help you decide on strategies.

Keithcat1 commented 3 years ago

The reason I wanted CTYPES is because it would allow me to use closed-source libraries like http://www.un4seen.com/bass.html which is good for playing audio but can't be statically linked. It would also make it easier to deal with libraries like SDL2 that might not be compilable with Cosmopolitan for a while. Lastly, it would make it possible for APE Python to just download prebuilt wheels for Windows, Mac and Linux and then load the right DLLs depending on which platform the APE binary is running on.

ahgamut commented 2 years ago

@jart libc/complex.h declares functions for use with complex float/complex double, but I get a linker error when trying to use csqrt.

Are these functions implemented? If not, I'd like to try implementing some. Is there a reference I can use?

ahgamut commented 2 years ago

@jart bottle.py is a minimal web framework that may be useful if added to the repo. The package is a single file, and I confirmed that the initial example works if I disable (comment out) the import _thread line in the source file.

https://bottlepy.org/docs/0.12/

ahgamut commented 2 years ago

What I'd be interested in learning is whatever trick Pyston used to get the 30% performance boost. We obviously wouldn't want to incorporate their JIT work due to what happened with Unladen Swallow, but I seem to recall them writing a blog post where they said Python had some esoteric thing for debugging that no one really uses but made code really slow, so they removed it with great success. I'd like to know what that is.

@jart with #264 and #281 we've disabled all the debugging elements mentioned in the Pyston wiki.

The other areas where Pyston has speedups are either JIT-related or implementation-defined. Let me see if any of those have equivalents in Cosmopolitan.

Trying some benchmarks might be fun if we have something better than dummy_threading.

jart commented 2 years ago

Stack overflow checking is relatively straightforward with Cosmopolitan.

forceinline int Py_EnterRecursiveCall(const char *where) {
  const char *rsp, *bot;
  extern char ape_stack_vaddr[] __attribute__((__weak__));
  if (!IsTiny()) {
    rsp = __builtin_frame_address(0);
    asm(".weak\tape_stack_vaddr\n\t"
        "movabs\t$ape_stack_vaddr+12288,%0" : "=r"(bot));
    if (UNLIKELY(rsp < bot)) {
      PyErr_Format(PyExc_MemoryError, "Stack overflow%s", where);
      return -1;
    }
  }
  return 0;
}

I've added this in 7521bf9e73d340eb36d76f3938eb29e07278ee69.

Are there any simple JIT implementation defined things it's doing that yield big benefits?

jart commented 2 years ago

Complex math functions like csqrt() need to be ported from Musl. Contributions welcome. Bottle is welcome too.

The reason I wanted CTYPES is because it would allow me to use closed-source libraries like http://www.un4seen.com/bass.html which is good for playing audio but can't be statically linked. It would also make it easier to deal with libraries like SDL2 that might not be compilable with Cosmopolitan for a while. Lastly, it would make it possible for APE Python to just download prebuilt wheels for Windows, Mac and Linux and then load the right DLLs depending on which platform the APE binary is running on.

ctypes isn't able to function as a stopgap since it only works if you're integrating with the platform-specific userspace tooling. Native libraries need to be checked in to the cosmo third_party directory and linked statically. Otherwise you can interop with external native code via subprocesses, which should serve as a reasonable stopgap, along the lines of the sketch below.
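A minimal sketch of that subprocess stopgap, assuming a hypothetical native helper binary that speaks JSON over stdin/stdout (the helper name and protocol are placeholders, not anything shipped with Cosmopolitan):

import json
import subprocess

request = {"op": "decode", "path": "song.mp3"}
proc = subprocess.run(
    ["./native-helper.com"],          # external native tool, run as a child process
    input=json.dumps(request).encode(),
    stdout=subprocess.PIPE,           # capture the reply (works on Python 3.6)
    check=True,
)
print(json.loads(proc.stdout.decode()))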

What we should do is provide an alternative to the platform specific tooling, by integrating chibicc. That way, rather than dropping .so files into your python modules dir you would instead drop .c files, and the Python runtime will compile them and generate bindings automatically, and it would do so in a manner that behaves precisely the same across platforms.

ahgamut commented 2 years ago

Are there any simple JIT implementation defined things it's doing that yield big benefits?

They've mentioned something like "reducing comparisons for attribute lookups" but I haven't looked at those parts of the source just yet.

Complex math functions like csqrt() need to be ported from Musl. Contributions welcome. Bottle is welcome too.

Bottle is a single file. Complex math from Musl: let me check.

ahgamut commented 2 years ago

Complex functions are nice to have; the reason I was asking about them is because I've gotten quite close in my attempts to get Numpy into APE Python. Everything compiles, but the linker is missing symbols -- probably I've mixed up porting some recipes, or I am adding unnecessary files into the Makefile.

jart commented 2 years ago

How much slower could its BLAS subroutines be? LAPACK is a good library, but I'll need to check in a FORTRAN compiler. We used to have a FORTRAN standard library checked in to third party. LAPACK is underwhelming compared to MKL, which sadly we can't use since it doesn't respect the freedom to hack. From what I hear BLIS is the closest alternative that meets our licensing requirements. https://github.com/flame/blis/blob/master/docs/graphs/large/l3_perf_skx_nt1.pdf So we might want to cherry-pick some of its subroutines to accelerate things like matrix multiplication.

If Cython works by generating a .c file and handing it over to the gcc command then we should totally include it. I can update it to integrate with an embedded chibicc runtime once it's ready.

-DPy_BUILD_CORE

I don't like that define because it makes commands like make o//third_party/python/Python/errors.ss break. I wish it wasn't needed and I've been considering having it be a define at the top of each .c file that needs it, similar to PY_SSIZE_T_CLEAN.

ahgamut commented 2 years ago

LAPACK is a good library, but I'll need to check-in a FORTRAN compiler.

The vendored gcc executable in the repo has gfortran built-in? I just used the OBJECTIFY.f rules in build/rules.mk and it worked.

We used to have a FORTRAN standard library checked-in to third party.

I don't know what functions libgfortran provides; yesterday the LAPACK build needed string concatenation (there is an a // b string concatenation operation in FORTRAN).

If Cython works by generating a .c file and handing it over to the gcc command then we should totally include it.

Cython just handles the Python -> C conversion (plus it has some additional syntax that helps in optimizing said conversion). The generated C file can be used for anything.
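As a concrete illustration of that conversion step (assuming the Cython package is installed; example.pyx is a placeholder file), the C output can then be fed to any compiler, whether a python.mk recipe or, eventually, chibicc:

from Cython.Build import cythonize

# Translates example.pyx into example.c next to it; compiling and
# linking the generated C is a separate, independent step.
cythonize("example.pyx")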

ahgamut commented 2 years ago

@jart the documentation for BLIS says it requires pthreads, even with multithreading disabled. pthreads is currently not implemented in Cosmopolitan, so I'll look into this at a later time.

jart commented 2 years ago

Reading this, maybe we should use OpenBLAS with DYNAMIC_ARCH=1, with the exception of MODE=opt since that uses -march=native. https://news.ycombinator.com/item?id=28736136

pkulchenko commented 2 years ago

Related recent discussion on OpenBLAS here.

jart commented 2 years ago

That discussion is a little handwavy. People who use Linux as a desktop will never see eye-to-eye with people who simply want sweet sweet unadulterated performance that Microsoft and Apple would never in a million years offer, since it makes GUIs slow. Anyway our luck is about to change. See https://github.com/jart/cosmopolitan/pull/282

Keithcat1 commented 2 years ago

Cython is a tool that allows creating Python extension modules by converting Pythonic code into a C file which can call between C and Python. Might it be worth allowing imports of .pyx (Cython) files, which would internally convert them to C files and then compile them? Since the result would already be an extension module, no bindings need to be generated; Cython does that already. Also, when you say using external subprocesses, do you mean compiling one executable file for each platform that my Cosmo app will run on and then sending messages to and from it? It seems clunky, but thinking about it, it could be tough to, for example, get integer sizes right on all platforms.

jart commented 2 years ago

No, only the compiler would run as a subprocess. (At least until we can retool chibicc to not rely on _exit() to free() its memory.) It would use the same ABI for all platforms. One compilation. Otherwise it's not build-once run-anywhere.

ahgamut commented 2 years ago

@jart one possible way to add many small speedups is to use METH_FASTCALL instead of METH_VARARGS when specifying the arguments for a CPython function (a Python function written in C). METH_FASTCALL prevents creation of an unnecessary Python tuple of args, and instead just uses an array of PyObject *. The tradeoff is that there is some boilerplate to write to use METH_FASTCALL properly.

METH_FASTCALL went through a lot of changes in Python 3.7 (git log -i --grep="FASTCALL" between Python v3.7.12 and v3.6.15), and is much faster + nicer to use.

METH_FASTCALL is still considered an internal detail of Python 3.7 -- I hope this means it can be added to APE Python without any major compatibility issues.

Example of manually adding METH_FASTCALL (both the below commits cannot be ported to the monorepo ATM, because Py_Arg_UnpackStack and a bunch of other internals need to be moved first):

An automated way of adding METH_FASTCALL in Python is to use the Argument Clinic generator and generate the necessary clinic/*.inc files.

I tried using Argument Clinic + METH_FASTCALL in 3.6 in https://github.com/ahgamut/cpython/commit/2b417a690735b8f004257eddc526ece66dd135b4 for a few methods. The related tests pass, but no noticeable speedup.