Open wash-amzn opened 5 months ago
Are you using an aarch64 host, or running the aarch64 Docker image on an x86_64 machine? We did fix an issue (#4939) with GC nursery size on aarch64; perhaps the detection code still fails under emulation. Can you report the nursery size for both runs (from the comment)?
pypy3.9 -c 'import gc; print(gc.get_stats().nursery_size)'
This is on a real aarch64 machine (Graviton3). This happens with or without the improved nursery size.
Given a dummy program
There is a ~5x slowdown on x86 versus a ~250x slowdown on aarch64 when using cProfile. The numbers below are from the Python 3.10.14 / PyPy 7.3.16 Docker container.
By comparison, CPython (I used the 3.12.3 container) shows almost no slowdown at all. To be fair, its non-profiled execution time is an order of magnitude higher than PyPy's.
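For anyone who wants to reproduce the comparison, here is a minimal sketch of how the slowdown ratio could be measured. This is not the dummy program from this report: the `work`, `timed`, `profiled`, and `slowdown` helpers and the call-heavy workload are assumptions chosen for illustration, on the guess that per-call profiler overhead is what matters here.

```python
import cProfile
import time


def work(n):
    # Hypothetical workload (an assumption, not the original program):
    # many cheap Python-level calls, so per-call profiler cost dominates.
    def inc(x):
        return x + 1

    total = 0
    for _ in range(n):
        total = inc(total)
    return total


def timed(fn, *args):
    # Return (result, elapsed wall-clock seconds) for one call.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def profiled(fn, *args):
    # Run fn under a fresh cProfile.Profile and return fn's result.
    profiler = cProfile.Profile()
    return profiler.runcall(fn, *args)


def slowdown(n=200_000):
    # Ratio of profiled time to plain time for the same workload.
    _, plain = timed(work, n)
    _, prof = timed(profiled, work, n)
    return prof / plain


if __name__ == "__main__":
    print(f"cProfile slowdown: {slowdown():.1f}x")
```

Running the same script under `pypy3` and `python3` on both architectures should give ratios that can be compared directly. On PyPy a single short run also includes JIT warm-up, so a larger `n` or a few repeated runs may give steadier numbers.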
For a larger application I was attempting to profile with cProfile, I not only got the slower execution on aarch64, but I also got large differences in the relative sorting of top functions. That might have been legitimate, but I don't feel comfortable trusting the numbers given the large difference in overhead of profiling on x86 vs aarch64.
I don't expect that I will have the time to track this down, so I at least wanted to get this posted in case someone else does.