I wonder if Libffi could be used here in some way?
On 10/9/21, Gautham @.***> wrote:

@jart one possible way to add many small speedups is to use METH_FASTCALL instead of METH_VARARGS when specifying the arguments for a CPython function (a Python function written in C). METH_FASTCALL avoids creating an unnecessary Python tuple of args and instead just uses an array of PyObject *. The tradeoff is that there is some boilerplate to write to use METH_FASTCALL properly.

METH_FASTCALL went through a lot of changes in Python 3.7 (git log -i --grep="FASTCALL" between Python v3.7.12 and v3.6.15), and is much faster + nicer to use. METH_FASTCALL is still considered an internal detail of Python 3.7 -- I hope this means it can be added to APE Python without any major compatibility issues.

Example of manually adding METH_FASTCALL (both the below commits cannot be ported to the monorepo ATM, because Py_Arg_UnpackStack and a bunch of other internals need to be moved first):

- @.***
- @.***

An automated way of adding METH_FASTCALL in Python is to use the Argument Clinic generator and generate the necessary clinic/*.inc files. I tried using Argument Clinic + METH_FASTCALL in 3.6 in https://github.com/ahgamut/cpython/commit/2b417a690735b8f004257eddc526ece66dd135b4 for a few methods. The related tests pass, but no noticeable speedup.
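The calling-convention difference can be sketched in pure Python (an analogy only, not the real C API; the function names here are made up):

```python
# METH_VARARGS-style dispatch materializes a fresh tuple for every call;
# METH_FASTCALL-style dispatch hands the callee an existing array of
# objects plus a count, skipping the per-call tuple allocation.
# (Pure-Python analogy; names are illustrative, not CPython API.)

def c_function(args, nargs):
    # stands in for a CPython builtin that sums its arguments
    total = 0
    for i in range(nargs):
        total += args[i]
    return total

def varargs_dispatch(*call_args):
    packed = tuple(call_args)            # the "unnecessary tuple" per call
    return c_function(packed, len(packed))

def fastcall_dispatch(stack):
    # the interpreter's existing value stack is passed directly, no packing
    return c_function(stack, len(stack))
```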
I found a weird bug when running APE Python on Windows.
Create a file bug.py:
import nonexistant_module
then from the command line:
python.com bug.py
Traceback (most recent call last):
File "t.py", line 1, in
@Keithcat1, this looks very similar to the frame unwinding issue I came across while compiling LuaJIT (also on windows). It doesn't happen for @ahgamut, but it does happen for me on both Windows and WSL2, so maybe related to the Windows platform. I found a "fix", but it's not really fixing the underlying issue.
Trying some benchmarks might be fun
I compared APE Python to Python 3.6.15 using the pyperformance benchmark test suite. I compiled Python 3.6.15 with -O3, so I used MODE=opt for the APE.

Setting up pyperformance to test the APE requires some manual setup: pyperformance uses subprocess to call the Python executable, so because of #120 I had to wrap the APE in a simple shell script or change the subprocess call (basically change python.com $@ to sh python.com $@). I think this affects measuring the performance of the APE. Can we have a build mode (MODE=optlinux) that produces optimized ELF executables (or some other workaround) so that #120 doesn't happen?

@jart some important notes:

- MODE=opt is about as fast as regular Python 3.6.15 (table summary says 1.00x slower) within the context of this benchmark. It is likely I am running the benchmark improperly, because some of the results are unexpected (APE startup time is 1.1x slower?!).
- --enable-optimizations (LTO/profile-guided as mentioned in #243) plays a decent role in the pyperformance benchmark. I benchmarked Anaconda Python 3.6.13, which provides LTO/PGO optimized binaries, and it currently outperforms the APE (1.1x faster).
- I mentioned METH_FASTCALL earlier; there are similar things that may be possible to port. However, porting without breaking compatibility would need some examination.

Here are the raw table numbers for reference:
Benchmark | py36-15 | APE-python |
---|---|---|
pickle_dict | 57.5 us | 41.2 us: 1.39x faster |
pickle_list | 8.10 us | 6.23 us: 1.30x faster |
pickle | 20.1 us | 18.0 us: 1.11x faster |
json_loads | 51.5 us | 48.7 us: 1.06x faster |
pathlib | 39.5 ms | 37.7 ms: 1.05x faster |
unpickle | 28.2 us | 27.0 us: 1.04x faster |
raytrace | 1.23 sec | 1.18 sec: 1.04x faster |
unpickle_pure_python | 796 us | 772 us: 1.03x faster |
go | 555 ms | 538 ms: 1.03x faster |
pickle_pure_python | 1.07 ms | 1.04 ms: 1.03x faster |
deltablue | 17.2 ms | 16.8 ms: 1.02x faster |
scimark_sor | 466 ms | 459 ms: 1.02x faster |
regex_compile | 384 ms | 379 ms: 1.01x faster |
xml_etree_parse | 277 ms | 273 ms: 1.01x faster |
telco | 14.3 ms | 14.1 ms: 1.01x faster |
xml_etree_iterparse | 222 ms | 220 ms: 1.01x faster |
xml_etree_generate | 223 ms | 221 ms: 1.01x faster |
float | 228 ms | 229 ms: 1.00x slower |
meteor_contest | 199 ms | 200 ms: 1.01x slower |
hexiom | 22.7 ms | 22.9 ms: 1.01x slower |
pidigits | 293 ms | 295 ms: 1.01x slower |
nqueens | 206 ms | 207 ms: 1.01x slower |
scimark_sparse_mat_mult | 7.57 ms | 7.64 ms: 1.01x slower |
regex_effbot | 4.97 ms | 5.02 ms: 1.01x slower |
scimark_monte_carlo | 222 ms | 225 ms: 1.01x slower |
logging_silent | 745 ns | 755 ns: 1.01x slower |
regex_v8 | 43.1 ms | 43.7 ms: 1.01x slower |
xml_etree_process | 178 ms | 181 ms: 1.02x slower |
chaos | 247 ms | 255 ms: 1.04x slower |
pyflate | 1.45 sec | 1.51 sec: 1.04x slower |
fannkuch | 997 ms | 1.04 sec: 1.04x slower |
crypto_pyaes | 214 ms | 224 ms: 1.05x slower |
regex_dna | 281 ms | 294 ms: 1.05x slower |
scimark_fft | 624 ms | 653 ms: 1.05x slower |
unpickle_list | 7.27 us | 7.72 us: 1.06x slower |
nbody | 245 ms | 261 ms: 1.07x slower |
unpack_sequence | 85.1 ns | 90.7 ns: 1.07x slower |
spectral_norm | 254 ms | 273 ms: 1.07x slower |
python_startup | 17.3 ms | 18.9 ms: 1.09x slower |
python_startup_no_site | 10.8 ms | 12.0 ms: 1.11x slower |
json_dumps | 24.7 ms | 27.8 ms: 1.13x slower |
logging_simple | 20.4 us | 23.7 us: 1.16x slower |
logging_format | 23.7 us | 27.5 us: 1.16x slower |
Geometric mean | (ref) | 1.00x slower |
Benchmark hidden because not significant (2): scimark_lu, richards
Ignored benchmarks (1) of py36-15.json: 2to3
Ignored benchmarks (1) of APE-python.json: sqlite_synth
Here are the steps to obtain the above benchmark measurement:

1. Have a Python with pyperformance installed. Let's call this the <marker> Python. Note down the location of <marker>/bin/python; it will also contain <marker>/bin/pyperformance.
2. Compile the member Python versions (including APE builds) and store them separately.
3. Install pyperf and pyaes for each member Python. For the APE builds, this means byte-compiling and adding the source code of these packages into .python/ in the APE ZIP store.
4. Wrap each APE in a simple shell script:

#!/usr/bin/env sh
location/of/the/APE/python-opt.com $@

Alternatively, modify run.py in pyperformance to add sh when dealing with APEs:

def run_command(command, hide_stderr=True):
    if hide_stderr:
        kw = {'stderr': subprocess.PIPE}
    else:
        kw = {}
    # above code is unchanged
    if ".com" in command[0]:  # add these two lines
        command.insert(0, "sh")
    # below code is unchanged

5. For each member Python that you want to benchmark, run <marker>/bin/pyperformance run --python=/location/of/member/python --inside-venv -o member-performance.json
6. Compare the results with <marker>/bin/pyperf:

<marker>/bin/pyperf compare_to py36-15.json APE-python.json --table --table-format md -G
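The run.py check can also be factored into a small helper (a sketch; wrap_ape is a made-up name):

```python
def wrap_ape(command):
    """Prepend "sh" when the target executable is an APE (.com file),
    working around the #120 exec issue described above; other
    interpreters are invoked unchanged."""
    if command and ".com" in command[0]:
        return ["sh"] + list(command)
    return list(command)
```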
Oh wow we're the fastest at pickling. Could we add a benchmark for audioop.add() so we can lay claim to 10x improvements?

> Can we have a build mode (MODE=optlinux)?

The new optlinux mode should give you the boost you were expecting, because it enables red zone and disables frame pointers. For example, here's Musl Libc:
$ time python3.6 -m test.test_pickle
real 0m2.730s
user 0m2.655s
sys 0m0.074s
Here's MODE=opt:
$ time o/opt/third_party/python/pythontester.com -m test.test_pickle
real 0m2.535s
user 0m2.455s
sys 0m0.080s
Here's MODE=optlinux:
$ time o/optlinux/third_party/python/pythontester.com -m test.test_pickle
real 0m2.353s
user 0m2.270s
sys 0m0.082s
Please be advised that disabling frame pointers isn't worth the performance gain in practice, since it takes away things like backtraces. Speaking of which, @pkulchenko we now have more solid code for stack unwinding and such, which shouldn't ever crash. The recommended algorithm is this:
size_t gi = __garbage->i;
intptr_t addr;
const struct StackFrame *frame;
for (frame = __builtin_frame_address(0); frame; frame = frame->next) {
  if (!IsValidStackFramePointer(frame)) {
    __printf("%p corrupt frame pointer\n", frame);
    break;
  }
  addr = frame->addr;
  if (addr == (intptr_t)&__gc) {
    do --gi;
    while ((addr = __garbage->p[gi].ret) == (intptr_t)&__gc);
  }
  __printf("%p %p %s\n", frame, addr, __get_symbol_by_addr(addr));
}
If our Python takes only 10% longer to start up, that's impressive when you consider that APE binaries are shell scripts and it has to DEFLATE all the .pyc / .py files it loads from the ZIP structure. APE loading is going to be more important as apps grow larger, because bloated Python code spends most of its time crawling the filesystem if you strace it. With the ZIP central directory we can make that an effectively O(1) pure userspace operation. Right now the LIBC_ZIPOS code isn't as optimized as it can be, but to give you a basic idea of how important a boost this can be, consider this benchmark:
* stat syscall l: 892c 288ns m: 1,024c 331ns
* stat() fs    l: 915c 296ns m: 981c 317ns
* stat() zipos l: 135c 44ns  m: 169c 55ns
1,732⏰ 1,109⏳ 1,100k 0iop o/rel/test/libc/calls/stat_test.com -b
This is basically what Java does because enterprise apps are always humongous, so having .jar files designed as .zip files was a really brilliant move on Java's part that allowed it to meet the requirements of big companies. So in many ways you could think of Actually Portable Python as an Enterprise Python even though we're a scrappy open source project. There should ideally be some way for the benchmarks to capture that, once we optimize it more. I also spotted an issue that helps us save 100 cycles on read() and write() function calls. So i/o should be a little bit better now.
PGO is something I want. I looked into doing it and came to the conclusion that I'd prefer to kick that can down the road. We can do what PGO does manually for the time being, by using the COUNTBRANCH() macro. You can wrap any expression with that, and it'll cause the linker to generate code that prints the percentage of times the branch was taken at the end of the program. If it ends up being 99% yes or 1% no then you would then add LIKELY() or UNLIKELY() macros around the expression. It makes a measurable difference because the Python codebase is written in such a way that error handling code usually blocks the critical path, and everything counts in small amounts.
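A pure-Python sketch of the COUNTBRANCH() idea (the real macro is C and these names are illustrative): tally how often each wrapped condition is true, so that branches which turn out ~99% one-sided can then be annotated with LIKELY()/UNLIKELY() in the C source.

```python
from collections import defaultdict

# label -> [times_taken, total] for every counted branch
_branch_stats = defaultdict(lambda: [0, 0])

def countbranch(label, cond):
    """Record whether the branch at `label` was taken, then pass cond through
    unchanged so it can wrap any condition in an if statement."""
    stats = _branch_stats[label]
    stats[1] += 1
    if cond:
        stats[0] += 1
    return cond

def branch_report():
    """Percentage of times each counted branch was taken."""
    return {label: 100.0 * taken / total
            for label, (taken, total) in _branch_stats.items()}
```

An `if countbranch("parse-error", err != 0):` that reports ~1% taken would be a candidate for UNLIKELY() in the C source.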
Actually Portable Python under MODE=optlinux is slightly (1.02x) faster than vanilla Python 3.6.15 on the pyperformance benchmark! MODE=optlinux startup times are almost equivalent.
Benchmark | py36-15 | APE-optlinux |
---|---|---|
pickle_dict | 57.5 us | 39.6 us: 1.45x faster |
pickle_list | 8.10 us | 6.21 us: 1.30x faster |
pickle | 20.1 us | 17.2 us: 1.16x faster |
unpickle | 28.2 us | 25.8 us: 1.09x faster |
scimark_sparse_mat_mult | 7.57 ms | 6.98 ms: 1.08x faster |
logging_silent | 745 ns | 691 ns: 1.08x faster |
telco | 14.3 ms | 13.2 ms: 1.08x faster |
pickle_pure_python | 1.07 ms | 995 us: 1.08x faster |
raytrace | 1.23 sec | 1.14 sec: 1.08x faster |
json_loads | 51.5 us | 47.9 us: 1.07x faster |
xml_etree_generate | 223 ms | 208 ms: 1.07x faster |
unpickle_pure_python | 796 us | 746 us: 1.07x faster |
pathlib | 39.5 ms | 37.0 ms: 1.07x faster |
regex_compile | 384 ms | 363 ms: 1.06x faster |
go | 555 ms | 526 ms: 1.06x faster |
richards | 171 ms | 163 ms: 1.05x faster |
xml_etree_parse | 277 ms | 264 ms: 1.05x faster |
deltablue | 17.2 ms | 16.4 ms: 1.05x faster |
xml_etree_iterparse | 222 ms | 215 ms: 1.04x faster |
nqueens | 206 ms | 199 ms: 1.03x faster |
python_startup_no_site | 10.8 ms | 10.5 ms: 1.03x faster |
float | 228 ms | 222 ms: 1.03x faster |
unpickle_list | 7.27 us | 7.11 us: 1.02x faster |
scimark_sor | 466 ms | 458 ms: 1.02x faster |
scimark_lu | 486 ms | 478 ms: 1.02x faster |
chaos | 247 ms | 242 ms: 1.02x faster |
hexiom | 22.7 ms | 22.4 ms: 1.02x faster |
scimark_monte_carlo | 222 ms | 219 ms: 1.02x faster |
xml_etree_process | 178 ms | 175 ms: 1.02x faster |
crypto_pyaes | 214 ms | 212 ms: 1.01x faster |
scimark_fft | 624 ms | 619 ms: 1.01x faster |
pidigits | 293 ms | 293 ms: 1.00x faster |
regex_effbot | 4.97 ms | 5.00 ms: 1.01x slower |
python_startup | 17.3 ms | 17.4 ms: 1.01x slower |
regex_v8 | 43.1 ms | 43.4 ms: 1.01x slower |
spectral_norm | 254 ms | 258 ms: 1.02x slower |
nbody | 245 ms | 249 ms: 1.02x slower |
regex_dna | 281 ms | 288 ms: 1.02x slower |
unpack_sequence | 85.1 ns | 91.3 ns: 1.07x slower |
json_dumps | 24.7 ms | 27.0 ms: 1.09x slower |
logging_format | 23.7 us | 27.1 us: 1.14x slower |
logging_simple | 20.4 us | 23.3 us: 1.14x slower |
fannkuch | 997 ms | 1.64 sec: 1.64x slower |
Geometric mean | (ref) | 1.02x faster |
Benchmark hidden because not significant (2): meteor_contest, pyflate
Ignored benchmarks (1) of py36-15.json: 2to3
Ignored benchmarks (1) of APE-optlinux.json: sqlite_synth
> Could we add a benchmark for audioop.add() so we can lay claim to 10x improvements?
Pyston is looking to add custom benchmark suites to pyperformance (https://github.com/python/pyperformance/pull/109), so if we have a bunch of benchmark scripts, we could do something similar.
> PGO is something I want. I looked into doing it and came to the conclusion that I'd prefer to kick that can down the road. We can do what PGO does manually for the time being, by using the COUNTBRANCH() macro.
IIRC there are many functions with if (condition) goto error; blocks which may benefit from the UNLIKELY macro. Let me try it out. Can we try just LTO to see if it has any benefits?

Also, Python 3.6 is the last of the "slow" Python 3 versions (it will reach EOL in two months). From Python 3.7 onwards they've focused on improving performance: git log --grep between 3.7 and 3.6 shows a bunch of PRs with "performance/faster/speedup" in the description.
It seems that compiling Python in MODE=rel no longer excludes the .py files; the binary is about the same size as the Python binary built in the default mode.
And here's a build break that I've been running into. It seems to occur regardless of the mode, but for some reason it doesn't stop you from just running make again.
error:o/build/bootstrap/zipobj.com.tmp.13158: check failed: 0xffffffffffffffff != 0xffffffffffffffff (2) 7ffeaf549660 0000004018b5 NULL 7ffeaf549690 000000403b37 NULL 7ffeaf5496d0 0000004038d3 NULL 7ffeaf549700 0000004013fa NULL 7ffeaf549710 0000004015a3 NULL 7ffeaf549720 00000040116a NULL
make MODE=dbg -j4 o/dbg/third_party/python/Lib/venv/scripts/nt/Activate.ps1.zip.o
exited with 77:
build/bootstrap/zipobj.com -b0x400000 -P.python -C3 -o o/dbg/third_party/python/Lib/venv/scripts/nt/Activate.ps1.zip.o third_party/python/Lib/venv/scripts/nt/Activate.p
s1
consumed 126,282µs wall time
ballooned to 2,100kb in size
needed 15,829us cpu (43% kernel)
caused 597 page faults (99% memcpy)
208 context switch (97% consensual)
performed 480 read and 0 write i/o operations
make: *** [build/rules.mk:78: o/dbg/third_party/python/Lib/venv/scripts/nt/Activate.ps1.zip.o] Error 77
@jart auto-complete interferes with indentation in the REPL:
>>> def function(x):
>>> return x+1 # have to press 4 spaces here, because <tab> auto-completes to __name__
That sounds like an easy fix. I'll probably have to change the linenoise api to do it, but it wouldn't need to be a breaking change.
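The desired behavior can be sketched at the Python level (illustrative only; the real fix lives in the linenoise C API):

```python
def tab_action(line_before_cursor):
    """Decide what <tab> should do in the REPL: indent when the text
    before the cursor is empty or pure whitespace (as when typing the
    body of a def), and only auto-complete when there is an actual
    prefix to complete."""
    if line_before_cursor.strip() == "":
        return "indent"
    return "complete"
```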
@jart Regarding the documentation for APE Python: since the CPython docs use sphinx, I thought it would be easier if sphinx was available.
Here are the steps to get there:
Requirements:

- python.com containing markupsafe; use this fork: https://github.com/ahgamut/cosmopolitan/tree/python-stdlib
- python3.6 with pip (preferably 3.6, but 3.7 or 3.8 should also work) to download sphinx
ls # current dir should contain python.com from above
wget https://www.python.org/ftp/python/3.6.14/Python-3.6.14.tgz
tar -xzf Python-3.6.14.tgz
Download sphinx and its dependencies locally (markupsafe is the only package that has a C extension):
mkdir -p ./.python/
# distutils needed because packages import it somewhere
cp -r ./Python-3.6.14/Lib/distutils/ ./.python/
# Py36 docs builds only with sphinx < 4.0 due to some deprecations
python3 -m pip install "sphinx<4.0" -t ./.python/
# the APE already has markupsafe, so remove
rm ./.python/markupsafe*
rm ./.python/MarkupSafe*
patch jinja2 and sphinx to avoid weird errors:
diff --recursive '--exclude=*.pyc' ./site-packages/jinja2/debug.py ./.python/jinja2/debug.py
203c203
< elif platform.python_implementation() == "PyPy":
---
> elif True or platform.python_implementation() == "PyPy":
diff --recursive '--exclude=*.pyc' ./site-packages/sphinx/jinja2glue.py ./.python/sphinx/jinja2glue.py
121a122,124
> zz = "/zip/.python/sphinx/themes"
> if zz not in self.searchpath:
> self.searchpath.append(zz)
add the downloaded libs to python.com:
./python.com -m compileall -b ./.python/
zip -qr ./python.com ./.python/
cp python.com ./Python-3.6.14/Doc/
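The two commands above can also be sketched with the stdlib, in case it helps to see what they do (paths and the add_to_ape name are illustrative):

```python
import compileall
import pathlib
import zipfile

def add_to_ape(ape_path, package_dir):
    """Byte-compile a package tree and append it to an APE's ZIP store."""
    root = pathlib.Path(package_dir)
    # legacy=True writes mod.pyc next to mod.py, like `compileall -b`
    compileall.compile_dir(str(root), legacy=True, quiet=1)
    # APE binaries are also valid ZIP files, so append mode works on them,
    # just like `zip -qr python.com ./.python/`
    with zipfile.ZipFile(ape_path, "a") as zf:
        for path in root.rglob("*"):
            if path.is_file():
                arcname = root.name + "/" + path.relative_to(root).as_posix()
                zf.write(path, arcname)
```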
patch the Doc folder of CPython-3.6.14 to use python.com:
diff --git a/Doc/Makefile b/Doc/Makefile
index efd31a9c79..3078b2e9ff 100644
--- a/Doc/Makefile
+++ b/Doc/Makefile
@@ -4,10 +4,10 @@
#
# You can set these variables from the command line.
-PYTHON = python3
+PYTHON = ./python.com
VENVDIR = ./venv
-SPHINXBUILD = PATH=$(VENVDIR)/bin:$$PATH sphinx-build
+SPHINXBUILD = ./python.com -m sphinx.cmd.build
PAPER =
SOURCES =
DISTVERSION = $(shell $(PYTHON) tools/extensions/patchlevel.py)
diff --git a/Doc/conf.py b/Doc/conf.py
index fab963e23a..7f8076c748 100644
--- a/Doc/conf.py
+++ b/Doc/conf.py
@@ -7,7 +7,7 @@
# that aren't pickleable (module imports are okay, they're removed automatically).
import sys, os, time
-sys.path.append(os.path.abspath('tools/extensions'))
+sys.path.append(os.path.abspath('./tools/extensions'))
# General configuration
# ---------------------
diff --git a/Doc/tools/extensions/pyspecific.py b/Doc/tools/extensions/pyspecific.py
index 70bdd17542..c4831d1b12 100644
--- a/Doc/tools/extensions/pyspecific.py
+++ b/Doc/tools/extensions/pyspecific.py
@@ -231,7 +231,7 @@ class DeprecatedRemoved(Directive):
translatable=False)
node.append(para)
env = self.state.document.settings.env
env.note_versionchange('deprecated', version[0], node, self.lineno)
return [node] + messages
run make html.
Sample screenshot:
Turns out Python 3.9 can be built with Cosmopolitan Libc if you just ifdef out pthreads: https://github.com/ahgamut/cpython/tree/cosmo_py39

Careful: it's not as polished as the python.com in the repo. I haven't tested how threads work in the above APE (_threadmodule.c compiled without any warnings though).
The Python 3.9 APE can run without needing pthreads, similar to Python 3.6:

- it uses WITH_THREAD to ifdef out pthread requirements at compile time.
- it uses the _dummy_thread.py logic at runtime so that the threading module doesn't complain.

The only difference is that the _dummy_thread.py logic has to be written at the C level in Py39, and some multithreading-related tests will fail because the default Py39 build expects threads to be available.
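For reference, the behavior being replicated is tiny: CPython 3.6's Lib/_dummy_thread.py essentially runs the function in the calling thread (sketched here, slightly simplified):

```python
def start_new_thread(function, args=(), kwargs=None):
    """Dummy replacement for _thread.start_new_thread(): just run the
    function synchronously in the calling thread."""
    function(*args, **(kwargs or {}))

def get_ident():
    """Only one "thread" ever exists, so return a fixed identifier,
    as _dummy_thread does."""
    return -1
```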
Screenshot of python39.com -m http.server built from here:

http.server requires threads in Python 3.9, but I used WITH_THREAD for _dummy_thread-like behavior when starting a new thread, so it works.
The BLIS Linear Algebra Library builds with Cosmopolitan Libc under the generic target.

Now numpy is just a few build configs away :)

https://github.com/ahgamut/blis/tree/cosmopolitan (run superconfigure, and then run the examples in examples/oapi)
The library builds and works on the examples without linking pthreads -- this is the second time my believing the documentation has led me astray.
@jart the constants in libc/sysv/consts/baud.h clash with some macros in BLIS.
That's great news! Send a PR? Let's fix those clashes too.
As for threads, we're slowly but surely making progress on that front. Malloc and other stuff has been made thread safe.
BLIS now builds under the penryn microarchitecture (aka Intel Core 2, as specified here) with Cosmopolitan Libc, but requires removal of -fno-omit-frame-pointer and -pg.
The testsuite passes a lot of tests, but my system OOMs (>10GiB) before all of them complete.
I got a build error while trying:
make -j4 -O o//third_party/python/python.com
The important line seems to be:
error: "usr/share/ssl/root/" (o//net/https/sslroots.o) not defined by direct deps of o//net/https/https.a.pkg
Let me know if you need all of it.
@Keithcat1 I think I've seen this one before. Can you try building make -j4 o//net/https first and then make -j4 o//third_party/python/python.com?

Delete the o/usr folder before trying this. If you see the backtrace even with make -j4 o//net/https, please post the backtrace here.
@jart I think this is the error where zipobj.com doesn't write the symbol table entry correctly. For reference, readelf -Wa o//usr/share/ssl/root/amazon.pem.zip.o shows (note the relative path missing):
Symbol table '.symtab' contains 13 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 8
9: 0000000000000000 40 OBJECT LOCAL DEFAULT 4 zip+lfile:amazon.pem
10: 0000000000000000 294 OBJECT LOCAL DEFAULT 5 zip+cdir:amazon.pem
11: 0000000000000028 2692 OBJECT GLOBAL DEFAULT 4 amazon.pem
12: 0000000000000000 0 OBJECT GLOBAL HIDDEN UND __zip_start
My guess is that the StripComponents function is doing something weird, but I have not been able to trigger the error in a defined manner yet.
It worked. Thank you.
My next bug is that I did:
make -j4 MODE=rel -O o/rel/third_party/python/python.com
and it did not remove all the .py files.
> BLIS now builds under the penryn microarchitecture (aka Intel Core 2, as specified [here](https://github.com/flame/blis/blob/master/docs/HardwareSupport.md)) with Cosmopolitan Libc, but requires removal of -fno-omit-frame-pointer and -pg.
That's perfectly fine. We don't need function call tracing for performance critical code like BLIS. It's perfectly safe to disable that. Provided you don't need the rich debugging.
> My guess is that the StripComponents function is doing something weird, but I have not been able to trigger the error in a defined manner yet.
Please keep an eye on that and let me know. If you can help me understand what's going on then I'll surely fix it.
If you do:
bytearray(12**10)
I get this:
die failed while dying
where CPython would throw a MemoryError, because I assume it ran out of memory.
@Keithcat1 python.com built with MODE= from commit e4d6e263d4c2161d does throw a MemoryError for the example you mentioned:
Traceback (most recent call last):
File "sample.py", line 1, in <module>
a = bytearray(12**10)
MemoryError
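The expected behavior can be checked with an allocation too large for any 64-bit address space (a sketch; try_alloc is a made-up helper):

```python
def try_alloc(nbytes):
    """Return "ok" if the allocation succeeds, "MemoryError" if CPython
    raises cleanly instead of crashing the interpreter."""
    try:
        bytearray(nbytes)
        return "ok"
    except MemoryError:
        return "MemoryError"
```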
I tried that and now it just exits without any message whatsoever, like this:
C:\py>cd git
C:\py\git>rd /s /q cosmopolitan
C:\py\git>wsl
[sudo] password for keith:
keith@keith-pc:/mnt/c/py/git$ git clone https://github.com/jart/cosmopolitan
Cloning into 'cosmopolitan'...
remote: Enumerating objects: 76289, done.
remote: Counting objects: 100% (1502/1502), done.
remote: Compressing objects: 100% (667/667), done.
remote: Total 76289 (delta 984), reused 1126 (delta 832), pack-reused 74787
Receiving objects: 100% (76289/76289), 104.96 MiB | 2.75 MiB/s, done.
Resolving deltas: 100% (41558/41558), done.
Updating files: 100% (19501/19501), done.
keith@keith-pc:/mnt/c/py/git$ cd cosmopolitan
keith@keith-pc:/mnt/c/py/git/cosmopolitan$
keith@keith-pc:/mnt/c/py/git/cosmopolitan$ git checkout e4d6e26
Note: switching to 'e4d6e26'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at e4d6e263d Rename ParseJson() to DecodeJson() for consistency
keith@keith-pc:/mnt/c/py/git/cosmopolitan$ make -j1 MODE= -O o//net/https
...
keith@keith-pc:/mnt/c/py/git/cosmopolitan$ make -j1 MODE= -O o//third_party/python/python.com
...
keith@keith-pc:/mnt/c/py/git/cosmopolitan$ exit
logout
C:\py\git>cd cosmopolitan
C:\py\git\cosmopolitan>o\third_party\python\python.com
Python 3.6.14+ (Actually Portable Python) [GCC 9.2.0] on cosmo
Type "help", "copyright", "credits" or "license" for more information.
>>: q=bytearray(12**10)
C:\py\git\cosmopolitan>o\third_party\python\python.com
Python 3.6.14+ (Actually Portable Python) [GCC 9.2.0] on cosmo
Type "help", "copyright", "credits" or "license" for more information.
>>: bytearray(12**10)
When run under WSL, python.com does throw a MemoryError. On Windows:
Python 3.6.14+ (Actually Portable Python) [GCC 9.2.0] on cosmo
Type "help", "copyright", "credits" or "license" for more information.
>>: q=bytearray(12**10)
7000001fee50 000001791a7f OnUnrecoverableMmapError+31
7000001fee70 00000049c326 MapMemories+1030
7000001fef00 000001793ac4 Mmap+1108
7000001fef90 00000179401c mmap+76
7000001feff0 00000178c2f3 dlmalloc_requires_more_vespene_gas+67
7000001ff010 000001781242 mmap_alloc.constprop.0+98
7000001ff070 000001785f8a sys_alloc.constprop.0+474
7000001ff0d0 000001786c08 dlmalloc+504
7000001ff140 00000178c0c8 dlmemalign+40
7000001ff150 00000186ade4 __asan_allocate+100
7000001ff1a0 00000186afcb __asan_memalign+139
7000001ff210 0000010b5513 _PyMem_RawMalloc+19
7000001ff220 0000010b6e12 _PyMem_DebugRawAlloc+658
7000001ff280 0000010b74aa _PyMem_DebugRawRealloc+842
7000001ff2d0 0000010b771e _PyMem_DebugRealloc+14
7000001ff2e0 0000010b8701 PyObject_Realloc+33
7000001ff2f0 000000eefe97 PyByteArray_Resize+967
7000001ff360 000000ef13fc bytearray_init+1564
7000001ff510 000001145a01 type_call+433
7000001ff560 000000f471cb _PyObject_FastCallKeywords+731
7000001ff5b0 0000013f43cd _PyEval_EvalFrameDefault+55341
7000001ff8a0 0000013e2706 _PyEval_EvalCodeWithName+10998
7000001ff9a0 0000013e3f14 PyEval_EvalCodeEx+100
7000001ffa50 0000013e3f74 PyEval_EvalCode+36
7000001ffa90 00000153d100 run_mod+64
7000001ffac0 0000015474d1 PyRun_InteractiveOneObjectEx+1265
7000001ffb80 0000015480a4 PyRun_InteractiveLoopFlags+212
7000001ffc10 000001549563 PyRun_AnyFileExFlags+67
7000001ffc40 000000840408 run_file+504
7000001ffc90 000000842438 Py_Main+6696
7000001ffe40 0000004c2fba RunPythonModule+1306
7000001fff20 000000402d44 main+148
7000001fffd0 00000040ab26 cosmo+71
7000001fffe0 000001795585 _jmpstack+22
@Keithcat1 I'm just guessing here (this is not likely to be the cause), but does regular stderr work on Windows? Try print("hi", file=sys.stderr) in python.com -- if it doesn't print anything then it's probably at the python level and we can change that in pythonrun.c or wherever the streams are initialized.
@jart do the backtraces require the .com.dbg to be present? Perhaps this missing error log is related to that.

If you're using WSL then please confirm that this same issue happens when binfmt_misc is disabled (sudo sh -c 'echo -1 >/proc/sys/fs/binfmt_misc/status') just so we know that it isn't accidentally running in the WIN32 environment.
@jart I tried that but it didn't seem to change anything; still got a MemoryError both times. print("Hi!", file=sys.stderr) also worked fine.
Changed the name of the issue because we have 4 ports of Python to Cosmopolitan Libc. The o//third_party/python python.com is now multi-threaded :rocket:, using Cosmopolitan Libc's pthreads implementation.

@ahgamut That sounds amazing! Would it be possible for you to maybe provide a binary of one or two of them, just to make it easier for people to check it out and get excited?

(On a mostly unrelated note: the Discord link in https://ahgamut.github.io/2021/07/13/ape-python/ seems to have expired)
I support @ahgamut distributing Actually Portable Python binaries on his blog. We're already doing release binaries for Actually Portable Perl. Binary releases are hard to pull off gracefully and I think @G4Vi did a great job with that.
@stefnotch if you want an Actually Portable Python binary to hold you over in the meantime, there's a link to a python.com binary in this blog post http://justine.lol/ftrace/ which you may download. It's an authentic build of Cosmopolitan's Python 3.6 under third party.
We do have a Discord and anyone reading is welcome to join: https://discord.gg/vFdkMdQN Please note this link expires in seven days. You can email jtunney@gmail.com if you need another one.
Actually Portable Python (CPython 3.11.4) binaries are available here: https://github.com/ahgamut/superconfigure/releases/tag/z0.0.3
They do seem to work on Windows, but you might have to run them from the command line and not Explorer. Also, it erases the entire line every time I press backspace, or plays the system sound for an invalid keypress (I think that's what it's called) if there's nothing to delete.
I was able to reproduce the build, compiling python.com on my machine from the superconfigure repo.
@ahgamut Could this issue be closed now?
Very well, closing. We can re-open if there are any new major issues with building CPython.
If anyone wants to try out a CPython3.11 Actually Portable Executable, you can download one from here: https://github.com/ahgamut/superconfigure/releases/tag/z0.0.24
I am trying to compile python.com using the cpython cosmo_py311 branch:
https://github.com/ahgamut/cpython/tree/cosmo_py311
After following the instructions and running ./superconfigure i get the following error message:
checking whether we are cross compiling... configure: error: in `/home/eirik/code_dir/cpython':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
I also attach the config.log:
Do you know what might be causing this issue?
PS:
gcc --version returns
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
@EirikJaccheri I would say the cosmo_py311 branch is outdated at this point, due to all the improvements to Cosmopolitan Libc (most notably the cosmocc toolchain, which uses a patched gcc-11 binary, and the apelink program that produces fat binaries).
If you'd like to build CPython 3.11 with Cosmopolitan Libc from source, I'd recommend trying out my superconfigure repo: https://github.com/ahgamut/superconfigure. If you just want a python binary that's built with Cosmopolitan Libc, you can get it from the releases of that repo. If you're trying to build some specific Python packages, let me know what you have in mind.
Hi, thank you for your quick response :-)

The reason why I tried to use cosmo_py311 is that there seemed to be support for including C libraries in python.com (specifically I would like to include numpy, clickhouse_connect, pandas, datetime, toml, sys and time).

In the superconfigure repo I got the impression that one could only include pure Python libraries. Am I wrong? Is it possible to add these libraries? Eirik
The builds in superconfigure also provide C extensions (notably markupsafe and PyYAML). Someone experienced with setuptools/pip internals could do something wonderful at this stage.

I'm trying to figure out a nice way to package numpy; I'll post another build on superconfigure once I figure it out.
Now that Cosmopolitan supports dlopen, can Ctypes be made to work? Mostly curious.
ctypes is something I could imagine working.
Hi again,

I figured out that numpy is not a dependency of clickhouse_connect. To get clickhouse_connect to work I only need two libraries which use C extensions: zstandard and lz4 (https://pypi.org/project/zstandard/#description and https://pypi.org/project/lz4/#files).

@ahgamut Is it possible to build these packages using superconfigure? If so, how?

Eirik
ok, seems like it can be done, with the following steps:

- build xz or gzip in https://github.com/ahgamut/superconfigure/tree/main/compress
- add a Modules/Setup recipe for it, similar to yaml
@ahgamut superconfigure looks awesome! Thank you for the hard work.

Do you have any plans to add a Python single-executable cross-compiler? Something like pyinstaller or nuitka?
I'm pretty happy building python via superconfigure for now -- using cosmocc as my cross-compiler and the scripts in superconfigure get me to a single python executable for my uses.
Interesting. Are you saying you can already make a single executable of the python app + the python runtime with cosmocc?

If so, do you mind pointing me to the piece of code that does it and I can try myself?
https://github.com/ahgamut/python27
https://github.com/ahgamut/cpython/tree/cosmo_py27

The assert macro needs to be changed in cosmopolitan.h to enable compilation (see #138). Afterwards, just clone the repo and run superconfigure.

Python 2.7.18 compiled seamlessly once I figured out how autoconf worked, and what flags were being fed to the source files when running make. I'm pretty sure we can compile any C-based extensions into python.exe -- they just need to be compiled/linked with Cosmopolitan, with the necessary glue code added to the Python source. For example, I was able to compile SQLite into python.exe to enable the internal _sqlite module.

The compiled APE is about 4.1MB with MODE=tiny (without any of the standard modules, the interpreter alone is around 1.6MB). Most of the modules in the stdlib compile without error. The _socket module (required for Python's simple HTTP server) doesn't compile, as it requires the structs from netdb.h.

On Windows, the APE exits immediately because the interpreter is unable to find the platform-specific files. Module/getpath.c and Lib/site.py in the Python source try to use absolute paths from the prefixes provided during compilation; editing those files to search the right locations (possibly with some zipos magic) ought to fix this.
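The search described there could look something like this (a hypothetical sketch, not an actual patch; find_stdlib_prefix and the candidate paths are made up):

```python
import os

def find_stdlib_prefix(candidates, default_prefix):
    """Pick the first candidate directory that contains the stdlib,
    using os.py as the landmark file (similar to what CPython's getpath
    logic does); fall back to the compiled-in prefix."""
    for cand in candidates:
        if os.path.isfile(os.path.join(cand, "os.py")):
            return cand
    return default_prefix
```

An APE build might pass candidates like ["/zip/.python", os.path.dirname(sys.argv[0])] ahead of the baked-in prefix.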