@Keithcat1 The issues you encountered should be addressed by recent commits. I've made numerous other fixes and improvements too! For example, we now have really excellent completion. Here's a screencast demo.
@ahgamut I want to credit you somehow when we move forward with publicizing this contribution. One suggestion I have is the following language. Does it meet your approval?
Python 3.6.14+ (Actually Portable Python)
[GCC 9.2.0] on cosmo
Type "help", "copyright", "credits" or "license" for more information.
>>> credits()
Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
for supporting Python development. See www.python.org for more information.
Thanks go to github.com/ahgamut for porting Python to Cosmopolitan Libc.
This is wonderful @jart. Thank you!
It should be possible to change pip so that it downloads packages and then stores them inside the python.com zip file, right? I know that extension modules aren't supported, but pure Python packages should be. By the way, why are we using Python 3.6 and not Python 3.9? I'm guessing the attempt to port Python to Cosmopolitan was started a long time ago and just now finished. I'm a little confused about stackless threads. Do they let you utilize your CPU more fully than if you were just running one normal Python thread?
Yes, pure Python packages will be supported, and they can be inside or outside the zip. Eventually native ones like Numpy will be supported too. @ahgamut did we remove that and need to add it back? To clarify, I want zip loading to be the first choice since it's the fast easy trademark nifty path. But if it's not in the zip, have it fall back to things like PYTHONPATH and the current directory.
@Keithcat1
It should be possible to change pip so that it downloads packages
Using pip requires SSL, and Python relies on OpenSSL by default. OpenSSL has not yet been added to the Cosmopolitan repo. OpenSSL bloats the APE considerably; it would be wonderful if we could wrap MbedTLS and use it for Python's _ssl module.
The current option you have is to download the packages through some other method. Check the README on my cosmo_py36 fork to see what you can do for pip right now.
An additional wrinkle is that some Python packages rely on _thread and some glibc-specific stuff, so even if the installation works, the packages themselves raise complaints later.
and then stores them inside the python.com zip file, right?
I don't think the Python APE can self-modify just yet; I got an ETXTBSY when I tried to pip install directly into the APE ZIP store. So the modifications to pip might take a while. (@jart pls confirm)
There are two options: installing to ~/.local/lib/python3.6/site-packages, or placing the package in the same directory as Python (as .py/.pyc files).
I know that extension modules aren't supported
If you add the source and the right recipe to python.mk, simple C extensions like greenlet or markupsafe ought to work. I know because they are part of my cosmo_py36 fork of APE Python and I used them to run Flask. Stuff like numpy is much harder, but likely still possible. There is scope for some interesting tricks with libpython.a.
By the way why are we using Python 3.6 and not Python 3.9? I'm guessing the attempt to port Python to Cosmopolitan was started a long time ago and just now finished.
Python 3.7 and above require pthread by default, which Cosmopolitan does not currently support, so I picked Python 3.6. I would like to use the newest version too, but I do not know of a way to port it yet.
I'm a little confused about stackless threads. Do they let you utilize your CPU more fully than if you were just running one normal Python thread?
The stackless module has not been added to the Cosmopolitan repo yet; I only came across it a few days back. I'm also not sure how the stackless module works. I think its tasklets are similar to coroutines in Go? I tried out this old benchmark that compares Go's microthreads to Stackless, using Stackless Python 3.6.13 and Go 1.15.6, and Stackless Python is faster within a single process (GOMAXPROCS=1). (Additional detail about the benchmark here.) Stackless Python is apparently used in the servers for an MMORPG, which is why I thought it might be interesting speed-wise to add to the Python APE. Has anyone written a web framework with the stackless module?
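For intuition, the cooperative-scheduling model that stackless tasklets use can be imitated with plain generators. This is only a stand-in sketch on stock CPython, not the actual stackless API:

```python
from collections import deque

def scheduler(tasks):
    """Round-robin over generator-based 'tasklets', like a toy stackless.run()."""
    queue = deque(tasks)
    results = []
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run until the tasklet yields (cooperates)
            queue.append(task)  # still alive: reschedule it
        except StopIteration as stop:
            results.append(stop.value)
    return results

def counter(name, n):
    total = 0
    for i in range(1, n + 1):
        total += i
        yield               # cooperative switch point
    return (name, total)

print(scheduler([counter("a", 3), counter("b", 2)]))  # → [('b', 3), ('a', 6)]
```

The tasklets interleave at each yield, so the shorter one finishes first, which is the same cooperative behavior the benchmark above exercises with far less machinery.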
did we remove that and need to add it back? To clarify, I want zip loading to be the first choice since it's the fast easy trademark nifty path. But if it's not in the zip, have it fall back to things like PYTHONPATH and current directory.
@jart Here are the entries in sys.path for the APE:

- '': added after startup, default behavior for Python, used for importing files in the current dir (for simple scripts: if I had foo.py and bar.py in the current dir, and bar.py had import foo, Python would use this entry and pick up foo.py from nearby). You can put this after the ZIP store (by changing something in site.py) if you want.
- the ZIP store (/zip/.python): the fast path import for any script that isn't in the current dir of the interpreter.
- ~/.local/lib/python3.6/site-packages: added via site.py if the directory exists, and can be disabled via python.com -s. Packages can be installed here by pip install --user <my_package>. I have used this to run some simple packages like Flask.

I don't think we need to add any more directories. If a particular location is really necessary, we can add it to limited_search_path in Modules/getpath.c, or allow the use of PYTHONPATH/PYTHONHOME. I prefer to avoid PYTHONPATH/PYTHONHOME because it confuses the APE sometimes when I have multiple Python versions available, especially on Windows (for e.g. if I am in a conda Python 3.7 venv, the APE would try to import a 3.7 lib).
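The current-dir fallback behavior is easy to verify on any CPython; a small sketch, where the directory and module name (demo_mod) are made up for illustration:

```python
import os
import sys
import tempfile
import importlib

# Create a throwaway directory holding a module, then make it importable
# the same way the '' entry makes the current directory importable.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "demo_mod.py"), "w") as f:
    f.write("VALUE = 41 + 1\n")

sys.path.insert(0, workdir)   # stands in for the '' entry
demo_mod = importlib.import_module("demo_mod")
print(demo_mod.VALUE)         # → 42
```

Reordering entries in sys.path changes which copy of a module wins, which is why putting the ZIP store ahead of '' gives the fast path priority.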
(BTW the autocomplete in the APE is beautiful, I got a happy surprise when the text appeared in gray for completion)
@jart some decisions regarding python.com:

SSL support would allow the usage of pip and thereby installing custom packages into the user's local directories. There are three options:

- ensurepip, pip, and quite a few packages rely on threading. It would be annoying if we allowed pip installs and then a bunch of packages had complaints about how the APE works (for e.g. Dash requires the glibc version during installation, why?!?!).
- Add third_party/openssl with OpenSSL 1.1.1k (it worked with the amalgamation earlier). I would prefer to avoid this because OpenSSL increases the APE size considerably, and I have not gotten any of the OpenSSL tests to pass (pip install worked on Linux though).
- Write versions of Modules/_ssl.c and Modules/_hashlib.c that use MbedTLS bindings instead of OpenSSL. If this is possible, I think it is the best option, because the APE size would not increase as much, and we can avoid adding OpenSSL to the repo. I tried to do this a while back, but I was unsuccessful.

Stackless Python adds a custom stackless module to the Python stdlib to provide lightweight coroutines.

- It is fast: python.com with stackless beats Go within a single process on my system. The syntax looks weird though. Stackless Python claims to be the backbone of an MMORPG, so maybe there is something to the speed claims.
- I was able to port stackless without any issue: I had to resolve a few merge conflicts between the stackless fork of Python 3.6 and my cosmo_py36 fork. You can see it here: https://github.com/ahgamut/cpython/tree/stackless-cosmo

Do you think the stackless module should be added to the Cosmopolitan repo? We can try to make an existing web framework use stackless instead of threads to see the speedup, or a possible bigger goal could be a new, stackless-based Python web framework unique to Cosmopolitan.
I'm glad you enjoyed the readline interface. Definite no on stackless. We have better ways of simulating threads. Can't Flask do its thing using fork multiprocessing, just like gunicorn used to do? You know, if the goal is performance...
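A rough sketch of what fork-per-request multiprocessing looks like with no threading module at all, using a pipe in place of a real socket (the names here are illustrative, not the Flask or gunicorn API):

```python
import os

def handle(request):
    """Stand-in for a WSGI app: just transform the request."""
    return request.upper()

def serve_one(request):
    """Fork a worker per request, prefork-server style, no threads."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child: do the work, write the reply
        os.close(r)
        os.write(w, handle(request).encode())
        os._exit(0)                   # child fds close here, signaling EOF
    os.close(w)                       # parent: collect the reply
    reply = b""
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        reply += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return reply.decode()

print(serve_one("hello"))             # → HELLO
```

A real prefork server would fork ahead of time and accept() in each child, but the isolation story is the same: each request gets its own process, so no GIL contention and no threading module.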
What I'd be interested in learning is whatever trick Pyston used to get the 30% performance boost. We obviously wouldn't want to incorporate their JIT work due to what happened with Unladen Swallow, but I seem to recall them writing a blog post where they said Python had some esoteric thing for debugging that no one really uses but made code really slow, so they removed it with great success. I'd like to know what that is.
How many kilobytes did OpenSSL add? It looks like it's 3mb on Alpine. What could it be doing? Our MbedTLS impl is more Pareto-optimized, but you're still realistically looking at 200-500kb. Rewriting _ssl.c to integrate it with Python would be painful, but it could be done.
It's also worth mentioning that incorporating chibicc or gcc+blinkenlights isn't out of the question, since it'd be nice to let people on any platform build their own native extensions without needing to install a compiler.
How many kilobytes did OpenSSL add?
OpenSSL adds around 2.2mb (from my comment here) to python.com.
Rewriting _ssl.c to integrate it with Python would be painful but it could be done.
Decoupling Python from OpenSSL is a known problem: there was an attempt with PEP 543 but it got withdrawn.
I'm not sure about adding OpenSSL to the repo. OpenSSL also has a 3.0.0 beta out now, so it might be worth waiting to see how the new version improves things.
Can't Flask do its thing using fork multiprocessing, just like gunicorn used to do? You know, if the goal is performance...
Flask relies on the threading module in its imports, and also depends on greenlet (via Werkzeug IIRC). I haven't looked through the web framework options on Python to see if there's any library that doesn't rely on threads. When I made this example GIF, I just did the minimum possible to get it to work. Notably, I did not try it in a production environment.
(python.com has been upgraded since I tried the Flask example; maybe I should try it again and see if there are any changes.)
The debugging changes made by Pyston are outlined here: https://github.com/pyston/pyston/wiki/Semantic-changes
Pyston is on Python 3.8 though, so I'm not sure if all the changes can be followed.
There's nothing stopping us from checking-in the code for commonly used native extensions, such as greenlet, and shipping it as part of python.com.
Well, "commonly used" depends on what workflows/packages are the (first) target with python.com.
I only looked up greenlet as a specific package when the Flask installation failed. If I had tried a different framework, perhaps I wouldn't consider greenlet that important. numpy-dependent stuff is common for me, but Python is "general purpose": web frameworks, computational stuff, midsize scripts that are unwieldy in bash, etc.
What would you like to target first? I'm not sure targeting a Python webserver+framework is worthwhile, especially since redbean exists (unless redbean+python, but that would definitely reduce speed).
The ideal would be to have 95%+ of the stdlib in python.com, add a few "highly dependent" third_party extensions (need some way to figure these out), and allow pip + compilation from source (via chibicc?!) for any additional packages.
By the way, one question I forgot to answer earlier is that, it is possible to modify the zip store while the APE binary is running, but you need to call OpenExecutable which is a cool hack that's currently only being used by redbean. It currently only works on a couple operating systems and we're working to polyfill it across platforms with a slow copy fallback path. But generally speaking the zip executable, best expectation is to think of it as fast read only access for now.
What would you like to target first?
I want to get as many tests passing as possible first and incorporate them into the build. The Python unit tests are laying bare all the subtle shortcomings in the libc implementation, which is great. Then I want to create an HTML page with green / yellow / red boxes in a table that gives people a bird's-eye overview of how far Actually Portable Python has progressed. Once we have that we can launch, since it means happy users with realistic expectations and a call to action for anyone interested in helping.
@ahgamut When you say that extension modules are supported, do you mean that they have to be statically linked or that they can be loaded from disk? @jart The time.sleep function doesn't work, it raises an exception.
The tempfile module is broken, at least on Windows.
I wasn't able to build python.com (make MODE=dbg) with ASAN; the binary wouldn't run when started, just immediately exiting. I did get this error once though: error: could not map asan shadow memory
Can you fix the thing where make will sometimes stop?
I tried building python.com with Clang and -flto to make it faster / smaller: export CFLAGS=-flto export CXXFLAGS=-flto make -j8 MODE=llvm -O o/llvm/third_party/python But the build either kept stopping at some point for no reason or gave an error stating that one of the files in the build got too big; I don't have the exact error message.
Would it be possible to use PyPy instead of regular Cpython? I'm not really expecting you to do this but am curious if it would work. Does Cosmopolitan even support JIT code?
Thanks for all your hard work on this project, it's very exciting and I look forward to new stuff!
@Keithcat1
Extension modules are possible, but require a lot of effort right now. Extensions have to be linked statically, and you would need to make some changes in the source code of the extension in order to compile + link it along with libpython.a.
If you would like to experiment with how extensions are added to python.com, I would suggest using my cosmo_py36 fork of CPython, which contains all the necessary changes to add extensions statically via Python's regular build process.
The repo is at: https://github.com/ahgamut/cpython/tree/cosmo_py36
The outline of the process to add extensions is at: https://github.com/ahgamut/cpython/tree/cosmo_py36#what-about-c-extensions
An example with the greenlet module and its _greenlet extension is in this commit: https://github.com/ahgamut/cpython/commit/0274a49934d4f8aa1fc98c10ebf15d694608e4c2
Extension modules are possible, but require a lot of effort right now.
I wish you wouldn't think of it that way. The people who maintain each platform put in a lot of work to create a prescripted path that's specific to their platform for installing python modules. For example, on Debian if you want Numpy you just apt install python-numpy. The tradeoff is that if you take the easy path like that, your program will only run on Debian. We've created a meta-platform of our own, that lets people build Python apps that not only run on every Linux distro, but all the other operating systems too. But since it's new, someone needs to do all that trailblazing which will make things easy for people in the future. We're the ones doing that.
@jart you are right, perhaps calling it "a lot of effort" makes it seem unnecessarily daunting. If someone wanted Actually Portable CPython extensions (for e.g. _sqlite, greenlet, or markupsafe), for a few lines' worth of code, Cosmopolitan makes that possible today, which is amazing.
I think that customizing setuptools (via chibicc, dlopen, or converting setup.py to a Makefile recipe) can unlock a more hands-off approach, wherein we would not need to edit the source code at all. Combine that with a polyfilled OpenExecutable, and many more exciting things will happen.
@jart Would this help for adding shared object support? https://github.com/fancycode/MemoryModule I'm guessing loading shared objects or DLLs using the OS default functions isn't the hard part. Py2EXE actually had functionality that would bundle everything up in one zip file and load extension modules directly using the above C library.
@Keithcat1, I hope so. I already brought MemoryModule up in related discussions in https://github.com/jart/cosmopolitan/issues/97#issuecomment-874415031 and #137. I think the challenge is to not use the OS default functions if we want to be able to load APE-based extension modules.
The challenge isn't implementing the code loading. The challenge is what gets bundled, and what is loadable. cosmopolitan.a is 32mb with debug symbols stripped. It defines 4,000 functions.
So here's what we'd have to do first in order to move forward with a project like dynamic linking. We need an ABI requirements doc. We can study the most popular native extensions. Like numpy. Run readelf or nm to get the undef symbols they need, and make a list.
OK so we now have a pretty nice static analyzer for Python sources, which turns them into ELF objects. https://github.com/jart/cosmopolitan/blob/master/third_party/python/pyobj.c Stuff like this:
Symbol table '.symtab' contains 26 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 8
9: 0000000000000000 0 SECTION LOCAL DEFAULT 9
10: 0000000000000000 0 SECTION LOCAL DEFAULT 10
11: 0000000000000000 0 SECTION LOCAL DEFAULT 11
12: 0000000000000000 52 OBJECT LOCAL DEFAULT 4 zip+lfile:.python/antigravity.py
13: 0000000000000000 294 OBJECT LOCAL DEFAULT 5 zip+cdir:.python/antigravity.py
14: 0000000000000000 53 OBJECT LOCAL DEFAULT 7 zip+lfile:.python/antigravity.pyc
15: 0000000000000000 294 OBJECT LOCAL DEFAULT 8 zip+cdir:.python/antigravity.pyc
16: 0000000000000034 306 OBJECT GLOBAL DEFAULT 4 py:antigravity
17: 0000000000000035 547 OBJECT GLOBAL DEFAULT 7 pyc:antigravity
18: 0000000000000000 0 OBJECT GLOBAL HIDDEN UND pyc:webbrowser
19: 0000000000000000 0 OBJECT GLOBAL HIDDEN UND pyc:hashlib
20: 0000000000000000 0 OBJECT GLOBAL DEFAULT 10 pyc:antigravity.__file__
21: 0000000000000000 0 OBJECT GLOBAL DEFAULT 10 pyc:antigravity.webbrowser
22: 0000000000000000 0 OBJECT GLOBAL DEFAULT 10 pyc:antigravity.hashlib
23: 0000000000000000 0 OBJECT GLOBAL DEFAULT 10 pyc:antigravity.geohash
24: 0000000000000000 0 OBJECT GLOBAL HIDDEN UND .python/
25: 0000000000000000 0 OBJECT GLOBAL HIDDEN UND __zip_start
It works well enough that I'm learning some depressing things about the Python standard library. For example, the help() function depends on http.server. Right now, the way the static analyzer is set up, we can make weird dependencies like that "weak", in ELF terms, by putting a try statement around them. That way we can make reasonably small binaries.
Another bit of good news is that I've migrated _hashlib to MbedTLS!
It seems that connecting to, for example, localhost:6060 using urllib.request.urlretrieve doesn't work. urllib.error.URLError: <urlopen error [Errno 10051] ENETUNREACH[10051]>
There appears to be a regression in the re module too. These issues will all be resolved as we get the unit tests integrated in the build.
You seem to have broken Python on Windows but not on WSL: Fatal Python error: Py_Initialize: can't initialize sys standard streams
Current thread 0x0000000000000000 (most recent call first): 7770000ffde0 00000069abda __die+0x3e 7770000ffdf0 000000683413 Py_FatalError+0x66 7770000ffe10 000000684088 _Py_InitializeEx_Private+0x414 7770000ffe50 0000006840f7 Py_InitializeEx+0x13 7770000ffe60 00000068410c Py_Initialize+0x13 7770000ffe70 0000004170b7 Py_Main+0x667 7770000fff80 00000040bd79 main+0x1bb 7770000fffe0 000000401890 cosmo+0x40 7770000ffff0 0000006b1f0a _jmpstack+0x17
Also, how does the size-reducing native-module-linkage thing you did work? I know that extension modules are complicated, but is it possible to get ctypes working relatively easily?
Fatal Python error: Py_Initialize: can't initialize sys standard streams Current thread 0x0000000000000000 (most recent call first): 7770000ffde0 00000069abda __die+0x3e 7770000ffdf0 000000683413 Py_FatalError+0x66 7770000ffe10 000000684088 _Py_InitializeEx_Private+0x414 7770000ffe50 0000006840f7 Py_InitializeEx+0x13 7770000ffe60 00000068410c Py_Initialize+0x13 7770000ffe70 0000004170b7 Py_Main+0x667 7770000fff80 00000040bd79 main+0x1bb 7770000fffe0 000000401890 cosmo+0x40 7770000ffff0 0000006b1f0a _jmpstack+0x17
IIRC, this error is a delayed form of the classic Unable to get the locale encoding. I saw this when I was first getting _codecs/encodings to load on Windows.
The error at that point was that the necessary codec module was not getting imported correctly in get_locale_encoding during startup, but the interpreter figures out that something is awry and crashes only later, when setting up sys.stdin/sys.stdout. Let me see if I can get Windows running and test if this is similar.
Thanks for the reports! That helps narrow things down. I should have a fix pushed shortly.
You asked earlier about how to use the new Python compiler. Right now it can be used via the makefile config. Earlier today I added a tutorial to the examples folder showing how an independent package can be configured to build a Python file into a 1.9mb static executable. Please take a look? https://github.com/jart/cosmopolitan/tree/master/examples/pyapp
As for ctypes, it'd be nice if Cosmopolitan supported dynamic linking, because it's frequently requested. However it's something I personally do not want for myself, so it's unlikely that I'll be implementing it myself. The focus has always been doing static linking better than anyone else. Cosmopolitan is sort of like FreeBSD/NetBSD/OpenBSD-style multitenant monorepo.
Any dependencies you need, we can add them to the third_party folder. It might even be possible using GitHub's .gitowners feature for me to simply authorize people to have their own folders within this repo that they can independently maintain. There's also the possibility of setting up a web GUI which could be the easier solution for folks who don't need the full power the make build provides.
OK I think I've finally figured out the optimal strategy for informing the PYOBJ.COM static analyzer about implicit dependencies. Basically it looks like this:
if __name__ == 'PYOBJ.COM':
AF_APPLETALK = 0
AF_ASH = 0
AF_ATMPVC = 0
AF_ATMSVC = 0
AF_AX25 = 0
AF_BRIDGE = 0
AF_CAN = 0
# ....
That way if a module like socket is doing a weird import star, we can still have it declare its dependencies, while making minimal changes to the upstream sources.
OK I think I've finally figured out the optimal strategy for informing the PYOBJ.COM static analyzer about implicit dependencies. Basically it looks like this:
@jart the above strategy is for statements like from _module import *. Import star statements use the __all__ attribute of the module to decide which objects to show, or fall back to (?) showing everything.
I am unclear on this: given a from _module import * statement, does PYOBJ.COM declare a symbol for _module (and then its components recursively), or for * (like AF_BRIDGE, AF_CAN above, but no recursion)? I will read the source to find out, but an external explanation would also be useful.
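The __all__ fallback behavior itself is easy to check on stock CPython; here the demo module is synthesized in memory purely for illustration:

```python
import sys
import types

# Build a module with __all__ = ['a'] and see what import * exposes.
mod = types.ModuleType("star_demo")
exec("__all__ = ['a']\na = 1\nb = 2\n_c = 3", mod.__dict__)
sys.modules["star_demo"] = mod

ns = {}
exec("from star_demo import *", ns)
print(sorted(k for k in ns if not k.startswith("__")))   # → ['a']

# Without __all__, import * falls back to every non-underscore name.
del mod.__all__
ns2 = {}
exec("from star_demo import *", ns2)
print(sorted(k for k in ns2 if not k.startswith("__")))  # → ['a', 'b']
```

So a static analyzer that only reads source cannot know the star set without either evaluating __all__ or being told, which is exactly what the PYOBJ.COM declaration block supplies.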
Additional note: if we have to declare dependencies with an if __name__ == 'PYOBJ.COM' block for import star statements, then we can also do the same for the importlib.import_module("module_str") or __import__("module_str") hacks which are seen in the Lib/test/test_something.py files.
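A sketch of what that declaration might look like for a dynamic import; the module names (json, csv) and the helper are examples mirroring the socket-style block shown earlier, not a confirmed PYOBJ.COM convention:

```python
import importlib

def load_codec(name):
    # Dynamic import that a static source analyzer cannot see through.
    return importlib.import_module(name)

if __name__ == 'PYOBJ.COM':
    # Declare the modules load_codec() may pull in at runtime,
    # so the static analyzer links them into the binary. This block
    # never runs under a normal interpreter (__name__ is '__main__').
    import json
    import csv

print(load_codec("json").dumps({"ok": True}))  # → {"ok": true}
```

The guard keeps the declarations out of normal execution while still being visible to a tool that scans the source for import statements.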
Any dependencies you need, we can add them to the third_party folder. It might even be possible using GitHub's .gitowners feature for me to simply authorize people to have their own folders within this repo that they can independently maintain. There's also the possibility of setting up a web GUI which could be the easier solution for folks who don't need the full power the make build provides.
Adding a (pure-Python) third_party package is as easy as adding the right folder to /zip/.python.
For example, I downloaded the plotext package separately and manually added it to the APE ZIP store. This is a pure-Python package with zero external dependencies, and it works via the APE on Linux. No changes to the plotext source were required! This is a good sign: it means @jart's efforts to ensure compatible-with-CPython-yet-better behavior are working. (Windows is still an issue, the APE erroring with os.py expecting posix.F_LOCK.)
I will try to find more such simple packages to test APE compatibility behavior. I think there are a few more places where Cosmopolitan would need to show compatibility with glibc/CPython (for example Lib/platform.py:_sys_version() needs a conditional for Cosmopolitan to avoid potential complaints).
That's amazing @ahgamut. Also good news @Keithcat1! All the issues you brought to our attention should now be resolved by b5f743cdc384044299957bd94bb836eed2de26d4. I've confirmed that the HTTP client works on every single one of the operating systems we support. We now have 148 of Python's unit test programs incorporated into our build too, in order to ensure things stay fixed! That puts us at about 50% of the way towards full automation. The tests run pretty fast too, since they're fully parallelized:
Out of curiosity how long do the Python unit tests take when you run them the normal way? I seem to recall it taking a long time. Nice thing about small static binaries is that it makes the build / test latency 10x faster. It also means that all we need to do in order to test each program is write a script that does:
set -ex
for os in freebsd openbsd netbsd rhel7 rhel5 xnu win7 win10; do
scp test_stat.com $os: &&
ssh $os ./test_stat.com || exit
done
I like to think of it as the maximum consciousness approach to development since it makes life as a developer so much more pleasant for everyone involved over the long term, so we can have more impact focusing on the complexity that matters. So I'm sure Actually Portable Python will be a nice change of pace for anyone who feels like they're trapped in a tooling stack that feels like a messy bachelor's pad.
Anyway I think what we have is novel and definitely good enough for an alpha release. Please feel free to share any ideas on things like a logo, web design, or promotional strategies, since I can imagine a whole lot of people benefiting from this work.
I like to think of it as the maximum consciousness approach to development since it makes life as a developer so much more pleasant for everyone involved over the long term, so we can have more impact focusing on the complexity that matters. So I'm sure Actually Portable Python will be a nice change of pace for anyone who feels like they're trapped in a tooling stack that feels like a messy bachelor's pad.
Indeed, testing/debugging/updating APE Python from the cosmo build system is a breeze compared to what I was doing earlier with the regular Python repo.
Anyway I think what we have is novel and definitely good enough for an alpha release. Please feel free to share any ideas on things like a logo, web design, or promotional strategies, since I can imagine a whole lot of people benefiting from this work.
These are some points I noted down as "possible benefits of porting Python to Cosmopolitan Libc". Perhaps it can help you decide on strategies.

- A plotext animation of observing some system tasks (a la Task Manager/htop/blinkenlights), or a nice TUI that works across systems, might be nice examples.
- pip and installing arbitrary pure-Python third-party packages into the APE -- I tried this yesterday, but I haven't understood OpenExecutable clearly.
- The threading module. Options are: do nothing, spoof threading with dummy_threading (ouch), Stackless Python (rejected earlier), spoof via greenlet (have not tested fully), or any alternate method.
- Extensions: MemoryModule as mentioned by @Keithcat1 and @pkulchenko, or adding the extension source a la greenlet to the Python stdlib in the repo. Would porting (a subset of) Numpy to APE Python be worth it? I found a minimal piconumpy, but it is too small and really easy to port. Let me see.

The reason I wanted ctypes is because it would allow me to use closed-source libraries like http://www.un4seen.com/bass.html which is good for playing audio but can't be statically linked. It would also make it easier to deal with libraries like SDL2 that might not be compilable with Cosmopolitan for a while. Lastly, it would make it possible for APE Python to just download prebuilt wheels for Windows, Mac and Linux and then load the right DLLs depending on which platform the APE binary is running on.
@jart libc/complex.h declares functions for use with complex float/complex double, but I get a linker error when trying to use csqrt. Are these functions implemented? If not, I'd like to try implementing some. Is there a reference I can use?
@jart bottle.py is a minimal web framework that may be useful if added to the repo. The package is a single file, and I confirmed that the initial example works if I comment out import _thread in the source file.
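Instead of editing the package source, a stub module dropped into sys.modules before the import might satisfy such dependencies in a single-threaded build. This is a hypothetical sketch; the module name fake_thread and the attribute set are illustrative, not what bottle actually requires:

```python
import sys
import types

class DummyLock:
    """Single-threaded stand-in for a lock object."""
    def acquire(self, blocking=True, timeout=-1):
        return True
    def release(self):
        pass
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False

def install_stub(name="fake_thread"):
    # Register a minimal module so `import fake_thread` succeeds
    # even though no real threading support exists.
    mod = types.ModuleType(name)
    mod.allocate_lock = DummyLock
    mod.get_ident = lambda: 0
    sys.modules[name] = mod
    return mod

install_stub()
import fake_thread
with fake_thread.allocate_lock():
    print(fake_thread.get_ident())  # → 0
```

The same trick could be wired into site.py so every package sees the stub, which keeps the third-party source pristine; whether the stubbed semantics are safe depends entirely on how the package actually uses its locks.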
What I'd be interested in learning is whatever trick Pyston used to get the 30% performance boost. We obviously wouldn't want to incorporate their JIT work due to what happened with Unladen Swallow, but I seem to recall them writing a blog post where they said Python had some esoteric thing for debugging that no one really uses but made code really slow, so they removed it with great success. I'd like to know what that is.
@jart with #264 and #281 we've disabled all the debugging elements mentioned in the Pyston wiki: _Py_CheckFunctionResult and PyErr_BadInternalCall. The other areas where Pyston has speedups are either JIT-based or implementation-defined. Let me see if any of those have equivalents in Cosmopolitan.
Trying some benchmarks might be fun once we have something better than dummy_threading.
Stack overflow checking is relatively straightforward with Cosmopolitan.
forceinline int Py_EnterRecursiveCall(const char *where) {
const char *rsp, *bot;
extern char ape_stack_vaddr[] __attribute__((__weak__));
if (!IsTiny()) {
rsp = __builtin_frame_address(0);
asm(".weak\tape_stack_vaddr\n\t"
"movabs\t$ape_stack_vaddr+12288,%0" : "=r"(bot));
if (UNLIKELY(rsp < bot)) {
PyErr_Format(PyExc_MemoryError, "Stack overflow%s", where);
return -1;
}
}
return 0;
}
I've added this in 7521bf9e73d340eb36d76f3938eb29e07278ee69.
Are there any simple JIT implementation defined things it's doing that yield big benefits?
Complex math functions like csqrt() need to be ported from Musl. Contributions welcome. Bottle is welcome too.
The reason I wanted CTYPES is because it would allow me to use closed-source libraries like http://www.un4seen.com/bass.html which is good for playing audio but can't be statically linked. It would also make it easier to deal with libraries like SDL2 that might not be compilable with Cosmopolitan for a while. Lastly, it would make it possible for APE Python to just download prebuilt wheels for Windows, Mac and Linux and then load the right DLLs depending on which platform the APE binary is running on.
ctypes isn't able to function as a stopgap since it only works if you're integrating with the platform-specific userspace tooling. Native libraries need to be checked-in to the cosmo third_party directory and linked statically. Otherwise you can interop with external native code via subprocesses, which should serve as a reasonable stopgap.
What we should do is provide an alternative to the platform specific tooling, by integrating chibicc. That way, rather than dropping .so files into your python modules dir you would instead drop .c files, and the Python runtime will compile them and generate bindings automatically, and it would do so in a manner that behaves precisely the same across platforms.
Are there any simple JIT implementation defined things it's doing that yield big benefits?
They've mentioned something like "reducing comparisons for attribute lookups" but I haven't looked at those parts of the source just yet.
Complex math functions like csqrt() need to be ported from Musl. Contributions welcome. Bottle is welcome too.
Bottle is a single file. Complex math from Musl, let me check.
Complex functions are nice to have; the reason I was asking about them is because I've gotten quite close in my attempts to get Numpy into APE Python. Everything compiles, but the linker is missing symbols -- probably I've mixed up porting some recipes or I am adding unnecessary files into the Makefile. The remaining pieces:

- pyobj.com (+ Cython outputs a C file to be compiled) or potential Cython+chibicc magic for extensions
- -DPy_BUILD_CORE
- setup.py logic that I haven't understood fully (something like if external BLAS, don't compile these files)
- libblas.a and libcblas.a from LAPACK via the Cosmopolitan build system. LAPACK requires libgfortran though.

How much slower could its BLAS subroutines be? LAPACK is a good library, but I'll need to check in a FORTRAN compiler. We used to have a FORTRAN standard library checked in to third party. LAPACK is underwhelming compared to MKL, which sadly we can't use since it doesn't respect the freedom to hack. From what I hear BLIS is the closest alternative that meets our licensing requirements. https://github.com/flame/blis/blob/master/docs/graphs/large/l3_perf_skx_nt1.pdf So we might want to cherry-pick some of its subroutines to accelerate things like matrix multiplication.
If Cython works by generating a .c file and handing it over to the gcc command then we should totally include it. I can update it to integrate with an embedded chibicc runtime once it's ready.
-DPy_BUILD_CORE

I don't like that define because it makes commands like make o//third_party/python/Python/errors.ss break. I wish it wasn't needed, and I've been considering having it be a define at the top of each .c file that needs it, similar to PY_SSIZE_T_CLEAN.
LAPACK is a good library, but I'll need to check-in a FORTRAN compiler.
The vendored gcc executable in the repo has gfortran built-in? I just used the OBJECTIFY.f rules in build/rules.mk and it worked.
We used to have a FORTRAN standard library checked-in to third party.
I don't know what functions libgfortran provides; yesterday the LAPACK build needed string concatenation (there is an a // b string concatenation operation in FORTRAN).
If Cython works by generating a .c file and handing it over to the gcc command then we should totally include it.
Cython just handles the Python -> C conversion (+ it has some additional syntax that helps in optimizing said conversion). The generated C file can be used for anything.
@jart the documentation for BLIS says it requires pthreads, even with multithreading disabled. pthreads is currently not implemented in Cosmopolitan, so I'll look into this at a later time.
Reading this maybe we should use OpenBLAS with DYNAMIC_ARCH=1 with the exception of MODE=opt since that uses -march=native. https://news.ycombinator.com/item?id=28736136
Related recent discussion on OpenBLAS here.
That discussion is a little handwavy. People who use Linux as a desktop will never see eye-to-eye with people who simply want sweet sweet unadulterated performance that Microsoft and Apple would never in a million years offer, since it makes GUIs slow. Anyway our luck is about to change. See https://github.com/jart/cosmopolitan/pull/282
Cython is a tool that allows creating Python extension modules by converting Pythonic code into a C file which can call between C and Python. Might it be worth allowing imports of .pyx (Cython) files, which internally converts them to C files and then compiles them? Since it would already be an extension module, no bindings need to be generated, since Cython does that already. Also, when you say using external subprocesses, do you mean compiling one executable file for each platform that my Cosmo app will run on and then sending messages to and from it? It seems clunky, but thinking about it, it could be tough to, for example, get integer sizes right on all platforms.
No, only the compiler would run as a subprocess. (At least until we can retool chibicc to not rely on _exit() to free() its memory.) It would use the same ABI for all platforms. One compilation. Otherwise it's not build once run anywhere.
@jart one possible way to add many small speedups is to use METH_FASTCALL instead of METH_VARARGS when specifying the arguments for a CPython function (a Python function written in C). METH_FASTCALL prevents creation of an unnecessary Python tuple of args, and instead just uses an array of PyObject *. The tradeoff is that there is some boilerplate to write to use METH_FASTCALL properly.
METH_FASTCALL went through a lot of changes in Python 3.7 (git log -i --grep="FASTCALL" between Python v3.7.12 and v3.6.15), and is much faster + nicer to use.
METH_FASTCALL is still considered an internal detail of Python 3.7 -- I hope this means it can be added to APE Python without any major compatibility issues.
Example of manually adding METH_FASTCALL (both the below commits cannot be ported to the monorepo ATM, because Py_Arg_UnpackStack and a bunch of other internals need to be moved first):
An automated way of adding METH_FASTCALL in Python is to use the Argument Clinic generator and generate the necessary clinic/*.inc files.
I tried using Argument Clinic + METH_FASTCALL in 3.6 in https://github.com/ahgamut/cpython/commit/2b417a690735b8f004257eddc526ece66dd135b4 for a few methods. The related tests pass, but no noticeable speedup.
https://github.com/ahgamut/python27
https://github.com/ahgamut/cpython/tree/cosmo_py27
The assert macro needs to be changed in cosmopolitan.h to enable compilation (see #138). Afterwards, just clone the repo and run superconfigure.

Python 2.7.18 compiled seamlessly once I figured out how autoconf worked, and what flags were being fed to the source files when running make. I'm pretty sure we can compile any C-based extensions into python.exe -- they just need to be compiled/linked with Cosmopolitan, with necessary glue code added to the Python source. For example, I was able to compile SQLite into python.exe to enable the internal _sqlite module.

The compiled APE is about 4.1MB with MODE=tiny (without any of the standard modules, the interpreter alone is around 1.6MB). Most of the modules in the stdlib compile without error. The _socket module (required for Python's simple HTTP server) doesn't compile, as it requires the structs from netdb.h.

On Windows, the APE exits immediately because the interpreter is unable to find the platform-specific files. Modules/getpath.c and Lib/site.py in the Python source try to use absolute paths from the prefixes provided during compilation; editing those files to search the right locations (possibly with some zipos magic) ought to fix this.