Profile Guided Optimization improvements (better training, llvm support, etc)

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

BPO	24915
Nosy	@smontanaro, @brettcannon, @gpshead, @pitrou, @scoder, @vstinner, @larryhastings, @skrah
Dependencies	bpo-25188: regrtest.py improvement for Profile Guided Optimization builds
Files	python2.7-pgo.patch: PGO patch for Python2.7 python3.6-pgo.patch: PGO patch for Python3.6 python2.7-pgo-v02.patch python3.6-pgo-v02.patch python2.7-pgo-v02-mac.patch python3.6-pgo-v02-mac.patch python2.7-pgo-v03.patch python3.6-pgo-v03.patch python2.7-pgo-v04.patch python3.6-pgo-v04.patch python2.7-pgo-v05.patch python3.6-pgo-v05.patch python2.7-pgo-v06.patch python3.6-pgo-v06.patch README.pgo README2.7-pgo-v01.patch README3.6-pgo-v01.patch README2.7-pgo-v02.patch README3.6-pgo-v02.patch python2.7-pgo-v07.patch python3.6-pgo-v07.patch issue24915-python2.7.diff: Unified patch for Python 2.7 pgo.py pgofix-cpython2.patch pgofix-cpython3.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

```python
assignee = 'https://github.com/brettcannon'
closed_at =
created_at =
labels = ['build', 'performance']
title = 'Profile Guided Optimization improvements (better training, llvm support, etc)'
updated_at =
user = 'https://bugs.python.org/alecsandrupatrascu'
```

bugs.python.org fields:

```python
activity =
actor = 'brett.cannon'
assignee = 'brett.cannon'
closed = True
closed_date =
closer = 'brett.cannon'
components = ['Build']
creation =
creator = 'alecsandru.patrascu'
dependencies = ['25188']
files = ['40231', '40232', '40236', '40237', '40238', '40239', '40254', '40255', '40267', '40268', '40273', '40274', '40288', '40289', '40297', '40299', '40300', '40389', '40390', '40391', '40392', '40441', '40522', '41916', '41917']
hgrepos = []
issue_num = 24915
keywords = ['patch']
message_count = 63.0
messages = ['248988', '248992', '249002', '249004', '249006', '249008', '249013', '249014', '249052', '249053', '249055', '249061', '249071', '249128', '249131', '249143', '249146', '249155', '249200', '249227', '249246', '249256', '249286', '249315', '249333', '249336', '249355', '250008', '250011', '250093', '250095', '250096', '250099', '250100', '250102', '250105', '250489', '251033', '251034', '251035', '251036', '251065', '251090', '251091', '251096', '251110', '251112', '251125', '251126', '251127', '251128', '251129', '252179', '252182', '252184', '259840', '259868', '259870', '259878', '259881', '259883', '260269', '260573']
nosy_count = 11.0
nosy_names = ['skip.montanaro', 'brett.cannon', 'gregory.p.smith', 'tzot', 'pitrou', 'scoder', 'vstinner', 'larry', 'skrah', 'python-dev', 'alecsandru.patrascu']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue24915'
versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']
```

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

Hi All,

This is Alecsandru from Server Scripting Languages Optimization team at Intel Corporation.

I would like to submit a request to turn on Profile Guided Optimization (PGO) as the default build option for Python (both 2.7 and 3.6), given its performance benefits on a wide variety of workloads and hardware. For instance, as shown in the attached sample performance results from the Grand Unified Python Benchmark, a speedup of more than 20% was observed. In addition, we are seeing a 2-9% performance boost on OpenStack/Swift, where more than 60% of the code is in Python 2.7. Our analysis indicates that the performance gain is mainly due to a reduction in icache misses and CPU front-end stalls.

Attached are the Makefile patches that modify the "all" build target and add a new one called "disable-profile-opt". We built and tested these patches for Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel Xeon Haswell/Broadwell with 18/8 cores). We use the "regrtest" suite for training as it provides the best performance improvement. Some of the test programs in the suite may fail, which causes the build to fail. One solution is to disable the specific failing tests using the "-x" flag (as shown in the patch).
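A minimal sketch of how a training command that excludes failing tests could be assembled. The `-x` flag is the one mentioned above; the helper name `training_command` is hypothetical, and the `-m test.regrtest` spelling is an assumption (the patch itself may invoke regrtest differently).

```python
import sys


def training_command(excluded_tests, python=sys.executable):
    """Build a regrtest command line that skips the given tests.

    With -x, regrtest treats the positional arguments as tests to exclude.
    """
    cmd = [python, "-m", "test.regrtest"]
    if excluded_tests:
        cmd.append("-x")
        cmd.extend(excluded_tests)
    return cmd
```

For example, `training_command(["test_a"])` (with `test_a` standing in for whichever test fails locally) yields a command that runs the rest of the suite as the training workload.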

Steps to apply the patch:

1. hg clone https://hg.python.org/cpython cpython
2. cd cpython
3. hg update 2.7 (needed for 2.7 only)
4. Copy *.patch to the current directory
5. patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
6. ./configure
7. make

To disable PGO:

7b. make disable-profile-opt

Below are our sample performance results from our latest Xeon machine, a Xeon Broadwell EP.
Hardware (HW): Intel XEON (Broadwell) 8 Cores

BIOS settings: Intel Turbo Boost Technology: false Hyper-Threading: false

Operating System: Ubuntu 14.04.3 LTS trusty

OS configuration:

CPU frequency fixed at 2.6GHz by:

    echo 2600000 > /sys/devices/system/cpu/cpu/cpufreq/scaling_min_freq
    echo 2600000 > /sys/devices/system/cpu/cpu/cpufreq/scaling_max_freq

Address Space Layout Randomization (ASLR) disabled (to reduce run-to-run variation) by:

    echo 0 > /proc/sys/kernel/randomize_va_space
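Before a benchmark run it can be handy to confirm that the ASLR setting is really in effect. The following is only a sketch: the function names are hypothetical, and the only detail taken from the configuration above is the `/proc/sys/kernel/randomize_va_space` path and the value `0`.

```python
ASLR_SETTING = "/proc/sys/kernel/randomize_va_space"


def aslr_is_off(setting_text):
    """Return True if the setting's content is 0, i.e. ASLR is disabled."""
    return setting_text.strip() == "0"


def read_aslr_setting(path=ASLR_SETTING):
    """Read the raw kernel setting (Linux only)."""
    with open(path) as setting_file:
        return setting_file.read()
```

On a Linux machine, `aslr_is_off(read_aslr_setting())` reports whether the benchmark host matches the configuration described here.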

GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark: Grand Unified Python Benchmark (GUPB)
GUPB source: https://hg.python.org/benchmarks/

Python2.7 results:

Python source: hg clone https://hg.python.org/cpython cpython
Python source: hg update 2.7
hg id: 0511b1165bb6 (2.7)
hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5

    Benchmarks          Speedup(%)
    simple_logging      20
    raytrace            20
    silent_logging      19
    richards            19
    chaos               16
    formatted_logging   16
    json_dump           15
    hexiom2             13
    pidigits            12
    slowunpickle        12
    django_v2           12
    unpack_sequence     11
    float               11
    mako                11
    slowpickle          11
    fastpickle          11
    django              11
    go                  10
    json_dump_v2        10
    pathlib             10
    regex_compile       10
    pybench             9.9
    etree_process       9
    regex_v8            8
    bzr_startup         8
    2to3                8
    slowspitfire        8
    telco               8
    pickle_list         8
    fannkuch            8
    etree_iterparse     8
    nqueens             8
    mako_v2             8
    etree_generate      8
    call_method_slots   7
    html5lib_warmup     7
    html5lib            7
    nbody               7
    spectral_norm       7
    spambayes           7
    fastunpickle        6
    meteor_contest      6
    chameleon           6
    rietveld            6
    tornado_http        5
    unpickle_list       5
    pickle_dict         4
    regex_effbot        3
    normal_startup      3
    startup_nosite      3
    etree_parse         2
    call_method_unknown 2
    call_simple         1
    json_load           1
    call_method         1

Python3.6 results:

Python source: hg clone https://hg.python.org/cpython cpython
hg id: 96d016f78726 tip
hg id -r 'ancestors(.) and tag()': 1a58b1227501 (3.5) v3.5.0rc1
hg --debug id -i: 96d016f78726afbf66d396f084b291ea43792af1

    Benchmark           Speedup(%)
    fastunpickle        22.94
    fastpickle          21.67
    json_load           17.64
    simple_logging      17.49
    meteor_contest      16.67
    formatted_logging   15.33
    etree_process       14.61
    raytrace            13.57
    etree_generate      13.56
    chaos               12.09
    hexiom2             12
    nbody               11.88
    json_dump_v2        11.24
    richards            11.02
    nqueens             10.96
    fannkuch            10.79
    go                  10.77
    float               10.26
    regex_compile       9.8
    silent_logging      9.63
    pidigits            9.58
    etree_iterparse     9.48
    2to3                8.44
    regex_v8            8.09
    regex_effbot        7.88
    call_simple         7.63
    tornado_http        7.38
    etree_parse         4.92
    spectral_norm       4.72
    normal_startup      4.39
    telco               3.88
    startup_nosite      3.7
    call_method         3.63
    unpack_sequence     3.6
    call_method_slots   2.91
    call_method_unknown 2.59
    iterative_count     0.45
    threaded_count      -2.79

Thank you, Alecsandru

scoder commented 9 years ago

Please upload your patches as separate, uncompressed files for review.

PGO was already proposed here, but nothing came out of it:

https://bugs.python.org/issue17781

I suggest rejecting that old ticket and sticking with this one since it has an actual patch.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I added the patches as individual files and removed the zip file.

smontanaro commented 9 years ago

Is this supposed to work on Macs using Apple's version of gcc? I've got the latest version of Yosemite and XCode, and am getting these warnings when trying to build 2.7:

clang: warning: argument unused during compilation: '-fprofile-generate'

Should this be enabled using a configure check? Perhaps gcc/clang supports this but spells the feature differently. gcc --help tells me:

    % gcc --help | egrep profile
      -fprofile-instr-generate
      -fprofile-instr-use=<value>
                              Use instrumentation data for profile-guided optimization
      -fprofile-sample-use=<value>
                              Enable sample-based profile guided optimizations

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

The patches are tested on Linux machines, with GNU GCC >4.8.3. From your output I see that you are using the CLANG compiler. CLANG uses a different set of flags for PGO that are not compatible with GCC's, therefore the compilation will fail. Can you please use the GNU GCC compiler to test the patches?

smontanaro commented 9 years ago

It is executed using the gcc command:

    % gcc -c -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fprofile-generate -I. -IInclude -I./Include -DPy_BUILD_CORE -o Modules/gcmodule.o Modules/gcmodule.c
    clang: warning: argument unused during compilation: '-fprofile-generate'
    % type gcc
    gcc is /usr/bin/gcc

I have no idea if you can even use something other than clang on Macs now. In any case, the default compiler should work to build Python out of the box, if necessary, by checking things during configure.
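The kind of check being asked for can be sketched as follows. This is not what configure does; it is a rough illustration that guesses the compiler family from the `--version` banner and picks the matching PGO flags. The function names and the `code.profdata` file name are hypothetical; the flags themselves are the GCC and Clang spellings quoted in this thread.

```python
import subprocess


def is_clang(version_text):
    """Guess from `cc --version` output whether the compiler is clang."""
    return "clang" in version_text.lower()


def pgo_flags(version_text, profile_file="code.profdata"):
    """Return (generate_flags, use_flags) for the detected compiler family."""
    if is_clang(version_text):
        return (["-fprofile-instr-generate"],
                ["-fprofile-instr-use=" + profile_file])
    return (["-fprofile-generate"], ["-fprofile-use"])


def compiler_version(cc="gcc"):
    """Run the compiler and capture its version banner."""
    result = subprocess.run([cc, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.stdout
```

Because the banner is inspected rather than the command name, a `gcc` stub that forwards to clang is classified as clang.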

brettcannon commented 9 years ago

I did an initial code review on the 3.6 patch.

What would it take to add clang support for PGO? Is it simply using different flags that configure can set in the generated Makefile? Or is it more involved and would require maintaining two separate compile lines in the Makefile?

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I received the review and will post new patch versions as soon as I update them.

Regarding PGO on clang, I will need a bit more time to edit the Makefile and will post it just for clang, to be easier for us to see the differences.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I modified the patches after the review made by Brett (python2.7-pgo-v02.patch and python3.6-pgo-v02.patch):

- removed the call to pybench
- left the PGO steps as optional; to use them, run "make profile-opt"
- in the initial patches, I left out test_hashlib because in our benchmarks we did not see any gain from applying PGO to the hash functions. It is not harmful, so we can either leave it in or skip it. Nevertheless, to avoid any confusion about it, I removed that parameter from the patch.

I also added the patches for Mac exclusively (python2.7-pgo-v02-mac.patch and python3.6-pgo-v02-mac.patch). You must have llvm-profdata installed and in your path (in /Library/Developer/CommandLineTools/usr/bin/) to use it. I tested on Yosemite and it compiles fine with clang. I am working on a generic version (configure and Makefile patches) to merge all these platforms and will post them as soon as it is done.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 9 years ago

My initial reaction is that the default should be optimized for short build times. I would not want to type "disable-profile-opt" every time I'm running the tests.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 9 years ago

I see that your latest patch leaves PGO as an option, so please ignore my previous comment.

brettcannon commented 9 years ago

the v02 patches LGTM. I'm fine with seeing those committed as-is knowing Alecsandru is actively working towards Clang support.

gpshead commented 9 years ago

i'm updating the title to be more accurate.

turning it on by default is likely not desirable as the makefile is primarily used by developers who are iterating on changes.

but having it use a good workload (regrtest) and work with llvm and os x are good. :)

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I modified the patches to be compatible with both environments. The new versions also modify the configure.ac file, so you will need to run "autoconf" by hand. Also, on Mac OS you will need to have llvm-profdata installed and in your path.

I kept the expanded form of regrtest (/Lib/test/regrtest.py) because this way it is clearer to the user what is the main file that runs the training workload.

The "|| true" is also necessary, due to the nature of regrtest. This test suite is designed to return a failure code if any test is not OK, even for tests that fail only because certain dependencies are missing (for example, when the user has not installed optional libraries).

brettcannon commented 9 years ago

Any specific reason the v3 patch, Alecsandru, is listed as against 3.5 in the filename? Or is that just a typo?

P.S.: I did another review asking about explicit Clang support and also supporting Greg's request to use -m test instead of the explicit file execution.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

Sorry, it was a typo. I made a correction to it. I will also modify to -m flag, instead of the explicit file execution.

Regarding the clang/gcc support: in the v03 patches, GCC is supported. On Linux this is straightforward. On Mac, I see that the default development environment also has the "gcc" command, but it is a binary stub that calls clang in the backend, so the flags are adjusted for clang-in-gcc-clothing. Are you asking to support clang explicitly as a compiler in 2.7 and 3.6?

brettcannon commented 9 years ago

I'm asking if that's possible. For instance I set $CC to clang explicitly on OS X as I install the latest version of LLVM through Homebrew to get better compiler warnings for Python. It would be great if we could avoid leaving all clang users out unless they happen to use the stock install on OS X (e.g. cover Clang users on Linux).

Basically it would be nice if this is not exclusive to gcc if Clang also supports PGO.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

Thank you for the clarifications! Your point makes sense; we don't want to exclude clang environments. I will analyze this and post some patches once I'm done with it.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I modified the patches with clang support.

Also, I added an important check for the architecture on which PGO is running. Our proposal targets x86 platforms, since our measurements are made only on x86 hardware.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I fixed the files after the review. Regarding the PROFILE_OPT_OLD line, I think it is better to also keep the old task used for PGO until there is clear evidence, backed by measurements, that regrtest performs better on other architectures.

pitrou commented 9 years ago

Can you explain what the profile-merging thing is achieving?

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

The profile merging is necessary in case you want to use a pure clang compiler or you use GCC in OSX. For example, a general profiling action using clang will result in at least one binary profile. For our case, when using regrtest, we will have multiple profiles as the test is a multi-process one. The application llvm-profdata has the ability to merge the information collected from multiple processes, thus having a more precise map of what is executed from the profiled application.

This step is mandatory even if we train on a single threaded or single process workload and have just one profile. More information about the entire process can be found here: http://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
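A rough sketch of the merge step described above, assuming the raw profiles have already been collected. The helper name and the file names are hypothetical; `llvm-profdata merge -output=...` is the documented form of the command.

```python
def merge_command(raw_profiles, output="code.profdata"):
    """Build the llvm-profdata command that merges raw profiles into one file.

    Merging is required even for a single raw profile, because the compiler
    consumes the merged (indexed) format, not the raw one.
    """
    raw_profiles = list(raw_profiles)
    if not raw_profiles:
        raise ValueError("no raw profiles to merge")
    return ["llvm-profdata", "merge", "-output=" + output] + raw_profiles
```

The resulting file is then what gets passed to clang through `-fprofile-instr-use=`.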

brettcannon commented 9 years ago

I did another round of review. I noticed that the configure part of the patch is missing and that .hgignore and .gitignore should get updated to ignore the profile files. Otherwise the only other comment was making an echoed comment a bit clearer.

And in case anyone else is on OS X Yosemite and gets an error about llvm-profdata missing, make sure that /Library/Developer/CommandLineTools/usr/bin is on your $PATH.
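A quick way to check for the tool before starting a long build, as a sketch only. The function name is hypothetical; the extra directory is the Command Line Tools path mentioned above.

```python
import os
import shutil

CLT_BIN = "/Library/Developer/CommandLineTools/usr/bin"


def find_tool(name="llvm-profdata", extra_dirs=(CLT_BIN,)):
    """Look for a tool on $PATH, then in a few extra directories.

    Returns the full path of the tool, or None if it cannot be found.
    """
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    return shutil.which(name, path=os.pathsep.join(dirs))
```

If `find_tool()` returns a path under the extra directory, adding that directory to `$PATH` should be enough for the build to find the tool.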

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I've updated the patches after review and implemented the check for llvm-profdata for both Linux and OSX.

smontanaro commented 9 years ago

The latest patch worked fine for me (Mac OS X Yosemite). I've only tried with 2.7 so far. The only thing that was a bit mystifying were the errors during the initial profile run. There is so much that floats by in the terminal window that I completely missed the warnings about errors during the test run not being anything to worry about. I only noticed the messages when I took a look at the patch more closely.

Perhaps it would be worthwhile to add a short bit about the profile-opt target and its requirements to the README file.

smontanaro commented 9 years ago

Not knowing a darn thing about this, I went ahead and made a provisional change to the README file.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

That's a good point Skip. I added another set of patches, just for the README files, explaining the entire procedure, so now anyone reading it will see that PGO is available, what are the steps involved and a brief comment about the warning.

scoder commented 9 years ago

> The only thing that was a bit mystifying were the errors during the initial profile run. There is so much that floats by in the terminal window that I completely missed the warnings about errors during the test run not being anything to worry about.

Then wouldn't it be better to suppress (or at least reduce) the output of the test runs in this case?

brettcannon commented 9 years ago

I guess the test output -- both stdout and stderr -- could be redirected to /dev/null as simply using -q with regrtest will still lead to failures being emitted and random output which no one cares about except people inspecting the test output. Just need to make sure to mention that all output is suppressed so people don't think the process is hanging.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I've updated the patches to redirect the output to /dev/null, as this makes our intent clearer to users without requiring them to read the regrtest documentation. I've also added a warning message regarding the output, and ported all these changes to 3.6 and to the README files.

pitrou commented 9 years ago

Please don't call it "PROFILE_TASK_X86" - the architecture should have nothing to do with it. Actually, there shouldn't be any architecture-specific check at all.

pitrou commented 9 years ago

As for the dual 2.7/3.6 aspect: I don't really understand it. If this is committed to 2.7 it should also be committed to 3.5. It doesn't threaten the stability of the interpreter in any way, given it does not affect the default build path. There's no reason why packagers of Python 3.5 should have to separately maintain a patch to have access to this improvement.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I named this task PROFILE_TASK_X86 because it is rigorously tested and we have shown that regrtest performs better on this architecture. Until there is clear evidence, backed by solid measurements, that regrtest performs better on other architectures, I'd keep it this way.

Even though this does not threaten the stability of the interpreter in any way, the dual aspect you mentioned appears because CPython 2 and 3 have slightly different Makefile rule formats. Creating a common patch that works across versions would result in a very tangled Makefile. If you all agree that having a unified patch for both versions is acceptable, I will work on that.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 9 years ago

I don't think we should provide any performance guarantees in the Makefile. +1 for not special-casing x86 (does it include amd64?).

As I understood, Antoine was not talking about a unified patch but about applying the 3.6 patch to 3.5 right away.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

If you are talking just about the 3.6 patch, it is named this way to emphasize the fact that it is intended for the development branch. It is perfectly compatible with 3.5, so packagers do not need to maintain two distinct versions. I've tested with: hg update 3.5 ; hg import --no-commit python3.6-pgo-v07.patch ; ./configure ; make profile-opt

I also renamed the profile task in the Makefile.

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 9 years ago

Just (hopefully) for extra clarity: As you mentioned, the 3.6 patch is perfect for 3.5, too. The reason why 3.5 was brought up is to ask Larry, our release manager, to allow it already for 3.5.

Technically it's an enhancement/new feature, but practically it is zero risk and for PR reasons we should probably not "make 2.7 faster" before 3.5.

brettcannon commented 9 years ago

Attached is what I plan to commit to Python 2.7 assuming everyone is happy with the outcome. I tweaked the echoed messages from Alecsandru's patch, pulled in the README changes, and dropped the x86 checks as Antoine and Stefan requested.

Assuming people are happy with the patch I will also apply it to Python 3.5 with the appropriate tweaks.

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

New changeset 0f4e6c303531 by Brett Cannon in branch '2.7': Issue bpo-24915: Make PGO builds support Clang and use the test suite for https://hg.python.org/cpython/rev/0f4e6c303531

New changeset f211c8f554f9 by Brett Cannon in branch '2.7': Give proper credit for issue bpo-24915 https://hg.python.org/cpython/rev/f211c8f554f9

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

New changeset 7fcff838d09e by Brett Cannon in branch '3.5': Issue bpo-24915: Add Clang support to PGO builds and use the test suite https://hg.python.org/cpython/rev/7fcff838d09e

New changeset 7749fc0a5ea6 by Brett Cannon in branch 'default': Merge for issue bpo-24915 https://hg.python.org/cpython/rev/7749fc0a5ea6

brettcannon commented 9 years ago

Thanks to Alecsandru and Intel for the patches!

pitrou commented 9 years ago

Thank you Brett for committing this.

vstinner commented 9 years ago

Hum, the change 7fcff838d09e broke the buildbot "AMD64 Debian PGO 3.5". It would be nice to add Clang support without losing GCC support :-D

http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%203.5/builds/274

brettcannon commented 9 years ago

It didn't break gcc, the buildbot simply wasn't patient enough for the PGO run of the test suite to complete: http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%203.5/builds/274/steps/compile/logs/stdio . It takes a good amount of time to run the test suite serially with an instrumented interpreter and 20 minutes is not enough time. And I don't want to add output back simply to appease the buildbot as the output means nothing to a user who is doing the build themselves.

So either that buildbot needs to allow for a longer time without output, someone needs to come up with a way to simply emit some output that simply shows stuff is running (but without letting error condition stuff show up), or the buildbot just won't work with PGO.

pitrou commented 9 years ago

Le 19/09/2015 20:18, Brett Cannon a écrit :

> And I don't want to add output back simply to appease the buildbot as the output means nothing to a user who is doing the build themselves.

The output is actually a good indication of progress, so I don't think it's as silly to add it back as you seem to think it is :-)

brettcannon commented 9 years ago

The problem with the output is that error cases are unimportant and yet it fooled Skip into temporarily caring until he finally noticed the warning message. So my worry is that someone doesn't notice the "NOTE: ignore errors as they don't affect anything" and then glances at the output to notice an error and then worries that their PGO run failed.

If people really want to add output back in, though, they will need to patch both the Makefile to have a big NOTE in it as well as the README to say that any errors during the test suite run are unimportant and do not affect the outcome of the profile-guided optimizations.

smontanaro commented 9 years ago

Would it be possible to grep out the warning messages, but let everything else through?
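A minimal sketch of the filtering idea: drop lines that look like failures and pass everything else through. The patterns are hypothetical and would need tuning against real regrtest output.

```python
import re

# Hypothetical markers of lines that should be hidden from the user.
NOISE = re.compile(r"\b(ERROR|FAIL|Traceback)\b")


def drop_noise(lines, pattern=NOISE):
    """Return only the lines that do not match the noise pattern."""
    return [line for line in lines if not pattern.search(line)]
```

A line-by-line filter like this cannot hide the body of a multi-line traceback, which is one reason plain grep is a blunt tool here.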

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

Thank you for upstreaming this in both branches of Python! Do you think that a different version of regrtest.py, used only for PGO training, would be better in this case? By implementing a custom version, I think we could better control the output and errors shown on screen.

brettcannon commented 9 years ago

I gave the custom test runner a try using unittest's discovery facility, but it started to execute the whole test suite again, so it's a bit more complicated than you might think (I guess it imported regrtest or something?).

pitrou commented 9 years ago

Instead of writing a custom test runner from scratch, I would suggest adding a hidden --option to regrtest that would disable reporting errors.

ae6878ec-d873-4297-8b64-d05d471e780d commented 9 years ago

I can work on modifying the existing regrtest and adding a distinct flag, --pgo for example, as Antoine suggested. Indeed, it will not be trivial as regrtest has a dual approach (single process and multi process), but I will give it a try and post a patch as soon as possible.
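The shape of such a flag can be sketched with argparse. This is not regrtest's code: the parser, the `result_line` helper, and the exact behaviour are hypothetical, and only the `--pgo` name comes from the proposal above.

```python
import argparse


def build_parser():
    """Create a tiny command-line parser with a --pgo switch."""
    parser = argparse.ArgumentParser(prog="regrtest-sketch")
    parser.add_argument("--pgo", action="store_true",
                        help="training run: show progress, hide failures")
    parser.add_argument("tests", nargs="*", help="tests to run")
    return parser


def result_line(test_name, passed, pgo=False):
    """Return the line to print for one finished test.

    In PGO mode a failed test is reported like any other, so the output
    shows progress without suggesting that the build went wrong.
    """
    if passed or pgo:
        return test_name
    return test_name + " failed"
```

The multi-process mode would need the same switch passed down to the worker processes, which is the part described above as not trivial.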

I also suggest that I open a new issue for this, as it is a somewhat distinct piece of work from PGO itself and there will definitely be some iterations on regrtest.py for both versions of Python until we reach common ground. Is that OK with everyone?

python / cpython

Profile Guided Optimization improvements (better training, llvm support, etc) #69103