python / cpython

The Python programming language
https://www.python.org
Other
63.32k stars 30.31k forks source link

Support for weak references in frames #102960

Open ambv opened 1 year ago

ambv commented 1 year ago

It's occasionally useful to be able to link information to active frames. Two recent examples:

Currently, since frames don't support being weakly referenced, custom lifecycle management is required in this case. That's unfortunate because errors in cleaning out the references will lead to memory leaks of dead frames and their respective locals.

The straightforward solution is to add weakref support to frame objects. The performance hit is limited to setting NULL on the f_weakreflist field on frame object creation, and later a NULL check (for whether there are any weakrefs) at the frame object's deallocation. This is what the related PR does.

I ran the PR and the base main branch at the time against pyperformance 1.0.6. The raw results are in benchmarks.zip but the gist is that there are no significant differences between the two branches in any of the benchmarks executed.

Linked PRs

markshannon commented 1 year ago

Do you have a link to the results in a legible form?

ambv commented 1 year ago

I'll get that for you tomorrow. In the mean time you can:

$ pip install pyperformance
$ unzip benchmarks.zip
$ pyperformance compare py312*
ambv commented 1 year ago
py312-4075fe1.json
==================

Performance version: 1.0.6
Report on macOS-12.6.3-arm64-arm-64bit
Number of logical CPUs: 10
Start date: 2023-03-23 16:41:05.587781
End date: 2023-03-23 17:54:09.388857

py312+weakref.json
==================

Performance version: 1.0.6
Report on macOS-12.6.3-arm64-arm-64bit
Number of logical CPUs: 10
Start date: 2023-03-23 15:18:16.273352
End date: 2023-03-23 16:31:20.587873

### 2to3 ###
Mean +- std dev: 700 ms +- 1 ms -> 691 ms +- 4 ms: 1.01x faster
Not significant

### async_generators ###
Mean +- std dev: 1.41 sec +- 0.00 sec -> 1.39 sec +- 0.01 sec: 1.02x faster
Not significant

### async_tree_cpu_io_mixed ###
Mean +- std dev: 1.80 sec +- 0.01 sec -> 1.79 sec +- 0.01 sec: 1.00x faster
Not significant

### async_tree_io ###
Mean +- std dev: 2.42 sec +- 0.03 sec -> 2.41 sec +- 0.03 sec: 1.00x faster
Not significant

### async_tree_memoization ###
Mean +- std dev: 1.27 sec +- 0.01 sec -> 1.27 sec +- 0.01 sec: 1.00x faster
Not significant

### async_tree_none ###
Mean +- std dev: 1.09 sec +- 0.01 sec -> 1.09 sec +- 0.01 sec: 1.00x faster
Not significant

### bench_mp_pool ###
Mean +- std dev: 92.8 ms +- 0.3 ms -> 92.6 ms +- 0.3 ms: 1.00x faster
Not significant

### bench_thread_pool ###
Mean +- std dev: 1.54 ms +- 0.00 ms -> 1.53 ms +- 0.00 ms: 1.00x faster
Not significant

### chameleon ###
Mean +- std dev: 22.6 ms +- 0.2 ms -> 22.6 ms +- 0.2 ms: 1.00x faster
Not significant

### chaos ###
Mean +- std dev: 253 ms +- 1 ms -> 252 ms +- 1 ms: 1.00x faster
Not significant

### coroutines ###
Mean +- std dev: 97.1 ms +- 0.2 ms -> 96.2 ms +- 0.3 ms: 1.01x faster
Not significant

### coverage ###
Mean +- std dev: 267 ms +- 2 ms -> 267 ms +- 3 ms: 1.00x faster
Not significant

### crypto_pyaes ###
Mean +- std dev: 266 ms +- 1 ms -> 265 ms +- 1 ms: 1.00x faster
Not significant

### deepcopy ###
Mean +- std dev: 1.17 ms +- 0.01 ms -> 1.16 ms +- 0.00 ms: 1.00x faster
Not significant

### deepcopy_memo ###
Mean +- std dev: 124 us +- 1 us -> 124 us +- 0 us: 1.00x slower
Not significant

### deepcopy_reduce ###
Mean +- std dev: 10.6 us +- 0.0 us -> 10.6 us +- 0.0 us: 1.00x faster
Not significant

### deltablue ###
Mean +- std dev: 12.0 ms +- 0.1 ms -> 12.1 ms +- 0.1 ms: 1.01x slower
Not significant

### docutils ###
Mean +- std dev: 5.77 sec +- 0.05 sec -> 5.81 sec +- 0.02 sec: 1.01x slower
Not significant

### dulwich_log ###
Mean +- std dev: 104 ms +- 0 ms -> 104 ms +- 0 ms: 1.00x slower
Not significant

### fannkuch ###
Mean +- std dev: 1.40 sec +- 0.00 sec -> 1.39 sec +- 0.01 sec: 1.01x faster
Not significant

### float ###
Mean +- std dev: 253 ms +- 1 ms -> 253 ms +- 2 ms: 1.00x slower
Not significant

### generators ###
Mean +- std dev: 101 ms +- 1 ms -> 101 ms +- 1 ms: 1.00x slower
Not significant

### genshi_text ###
Mean +- std dev: 73.6 ms +- 0.5 ms -> 73.7 ms +- 0.8 ms: 1.00x slower
Not significant

### genshi_xml ###
Mean +- std dev: 153 ms +- 1 ms -> 152 ms +- 1 ms: 1.00x faster
Not significant

### go ###
Mean +- std dev: 455 ms +- 3 ms -> 453 ms +- 2 ms: 1.00x faster
Not significant

### hexiom ###
Mean +- std dev: 21.7 ms +- 0.0 ms -> 21.7 ms +- 0.0 ms: 1.00x slower
Not significant

### html5lib ###
Mean +- std dev: 135 ms +- 0 ms -> 135 ms +- 0 ms: 1.00x faster
Not significant

### json_dumps ###
Mean +- std dev: 31.6 ms +- 0.1 ms -> 31.5 ms +- 0.1 ms: 1.00x faster
Not significant

### json_loads ###
Mean +- std dev: 83.7 us +- 0.7 us -> 83.3 us +- 0.8 us: 1.00x faster
Not significant

### logging_format ###
Mean +- std dev: 19.9 us +- 0.1 us -> 19.8 us +- 0.1 us: 1.01x faster
Not significant

### logging_silent ###
Mean +- std dev: 344 ns +- 1 ns -> 343 ns +- 1 ns: 1.00x faster
Not significant

### logging_simple ###
Mean +- std dev: 18.6 us +- 0.1 us -> 18.4 us +- 0.1 us: 1.01x faster
Not significant

### mako ###
Mean +- std dev: 33.9 ms +- 0.1 ms -> 33.8 ms +- 0.1 ms: 1.00x faster
Not significant

### mdp ###
Mean +- std dev: 7.92 sec +- 0.03 sec -> 7.94 sec +- 0.17 sec: 1.00x slower
Not significant

### meteor_contest ###
Mean +- std dev: 280 ms +- 0 ms -> 280 ms +- 0 ms: 1.00x slower
Not significant

### nbody ###
Mean +- std dev: 337 ms +- 2 ms -> 337 ms +- 2 ms: 1.00x faster
Not significant

### nqueens ###
Mean +- std dev: 302 ms +- 1 ms -> 302 ms +- 1 ms: 1.00x faster
Not significant

### pathlib ###
Mean +- std dev: 47.2 ms +- 0.4 ms -> 47.0 ms +- 0.2 ms: 1.00x faster
Not significant

### pickle ###
Mean +- std dev: 30.6 us +- 0.3 us -> 30.7 us +- 0.3 us: 1.00x slower
Not significant

### pickle_dict ###
Mean +- std dev: 69.9 us +- 0.3 us -> 69.9 us +- 0.2 us: 1.00x faster
Not significant

### pickle_list ###
Mean +- std dev: 9.89 us +- 0.11 us -> 9.89 us +- 0.12 us: 1.00x faster
Not significant

### pickle_pure_python ###
Mean +- std dev: 914 us +- 2 us -> 911 us +- 4 us: 1.00x faster
Not significant

### pidigits ###
Mean +- std dev: 633 ms +- 4 ms -> 631 ms +- 3 ms: 1.00x faster
Not significant

### pprint_pformat ###
Mean +- std dev: 5.19 sec +- 0.02 sec -> 5.20 sec +- 0.02 sec: 1.00x slower
Not significant

### pprint_safe_repr ###
Mean +- std dev: 2.56 sec +- 0.01 sec -> 2.56 sec +- 0.01 sec: 1.00x faster
Not significant

### pyflate ###
Mean +- std dev: 1.46 sec +- 0.01 sec -> 1.46 sec +- 0.00 sec: 1.00x faster
Not significant

### python_startup ###
Mean +- std dev: 18.7 ms +- 0.1 ms -> 18.7 ms +- 0.0 ms: 1.00x slower
Not significant

### python_startup_no_site ###
Mean +- std dev: 14.8 ms +- 0.0 ms -> 14.9 ms +- 0.1 ms: 1.00x slower
Not significant

### raytrace ###
Mean +- std dev: 1.09 sec +- 0.00 sec -> 1.08 sec +- 0.00 sec: 1.00x faster
Not significant

### regex_compile ###
Mean +- std dev: 399 ms +- 1 ms -> 398 ms +- 1 ms: 1.00x faster
Not significant

### regex_dna ###
Mean +- std dev: 238 ms +- 0 ms -> 238 ms +- 0 ms: 1.00x faster
Not significant

### regex_effbot ###
Mean +- std dev: 6.61 ms +- 0.05 ms -> 6.59 ms +- 0.04 ms: 1.00x faster
Not significant

### regex_v8 ###
Mean +- std dev: 41.9 ms +- 0.1 ms -> 41.7 ms +- 0.1 ms: 1.00x faster
Not significant

### richards ###
Mean +- std dev: 146 ms +- 1 ms -> 144 ms +- 1 ms: 1.02x faster
Not significant

### scimark_fft ###
Mean +- std dev: 1.17 sec +- 0.02 sec -> 1.17 sec +- 0.01 sec: 1.01x faster
Not significant

### scimark_lu ###
Mean +- std dev: 374 ms +- 2 ms -> 372 ms +- 1 ms: 1.00x faster
Not significant

### scimark_monte_carlo ###
Mean +- std dev: 234 ms +- 2 ms -> 233 ms +- 1 ms: 1.00x faster
Not significant

### scimark_sor ###
Mean +- std dev: 434 ms +- 3 ms -> 437 ms +- 3 ms: 1.01x slower
Not significant

### scimark_sparse_mat_mult ###
Mean +- std dev: 17.1 ms +- 0.1 ms -> 17.2 ms +- 0.1 ms: 1.01x slower
Not significant

### spectral_norm ###
Mean +- std dev: 428 ms +- 2 ms -> 430 ms +- 1 ms: 1.01x slower
Not significant

### sqlglot_normalize ###
Mean +- std dev: 388 ms +- 1 ms -> 388 ms +- 1 ms: 1.00x faster
Not significant

### sqlglot_optimize ###
Mean +- std dev: 181 ms +- 0 ms -> 181 ms +- 1 ms: 1.00x slower
Not significant

### sqlglot_parse ###
Mean +- std dev: 4.67 ms +- 0.02 ms -> 4.69 ms +- 0.03 ms: 1.00x slower
Not significant

### sqlglot_transpile ###
Mean +- std dev: 5.42 ms +- 0.03 ms -> 5.43 ms +- 0.03 ms: 1.00x slower
Not significant

### sqlite_synth ###
Mean +- std dev: 4.45 us +- 0.02 us -> 4.44 us +- 0.02 us: 1.00x faster
Not significant

### telco ###
Mean +- std dev: 19.5 ms +- 0.1 ms -> 19.4 ms +- 0.1 ms: 1.00x faster
Not significant

### tornado_http ###
Mean +- std dev: 196 ms +- 6 ms -> 192 ms +- 6 ms: 1.02x faster
Significant (t=3.52)

### unpack_sequence ###
Mean +- std dev: 161 ns +- 6 ns -> 161 ns +- 6 ns: 1.00x faster
Not significant

### unpickle ###
Mean +- std dev: 51.4 us +- 0.5 us -> 51.4 us +- 0.6 us: 1.00x slower
Not significant

### unpickle_list ###
Mean +- std dev: 17.1 us +- 0.1 us -> 17.0 us +- 0.0 us: 1.00x faster
Not significant

### unpickle_pure_python ###
Mean +- std dev: 700 us +- 1 us -> 708 us +- 1 us: 1.01x slower
Not significant

### xml_etree_generate ###
Mean +- std dev: 284 ms +- 2 ms -> 285 ms +- 2 ms: 1.00x slower
Not significant

### xml_etree_iterparse ###
Mean +- std dev: 299 ms +- 2 ms -> 299 ms +- 2 ms: 1.00x slower
Not significant

### xml_etree_parse ###
Mean +- std dev: 366 ms +- 2 ms -> 366 ms +- 2 ms: 1.00x slower
Not significant

### xml_etree_process ###
Mean +- std dev: 195 ms +- 1 ms -> 196 ms +- 1 ms: 1.00x slower
Not significant

The only "Significant" one is tornado_http and this one claims it's in favor of the weakreflist introduction so clearly noise.

glyph commented 1 year ago

There's some Deferred debugging stuff that has caused significant angst with hard references to frame objects. Weakrefs here would be hugely useful. I'm not sure exactly how we'd use them (the cycles and attendant leaks were such a problem that we have mostly given up on this sort of thing and I'd need to load the whole feature back into my head) but I definitely want this if it is a possibility!

tacaswell commented 1 year ago

We have an open issue on Matplotlib due to frame related reference cycles https://github.com/matplotlib/matplotlib/issues/25406

markshannon commented 1 year ago

This is not the way to fix memory leaks.

If you really must have cycles involving frames, then you can break the cycle like this: class SupportsWeakRefs: pass frame -> weakref() -> SupportsWeakRefs() -> frame.

Although I have to wonder why plt.savefig needs to do strange things to the stack.

tacaswell commented 1 year ago

plt.savefig is how the use found it but the cycles are from calls much deeper in the code (we were caching an Exception object 🐑 , and it looks like some versions of pillow and pyparsing also are doing something with cached exceptions). I do not think there is a compelling reason we (Matplotlib) needs this, but we were bit by frame cycles someplace.