python-pillow / Pillow

Python Imaging Library (Fork)
https://python-pillow.org
Other
12.35k stars 2.24k forks source link

Use Capsule for WebP saving #8386

Closed homm closed 2 months ago

homm commented 2 months ago

Adoption of #8341 ideas

Improvements

Yay295 commented 2 months ago

You're essentially using _VALID_WEBP_MODES in two locations, but you removed that constant. I assume you inlined them when _VALID_WEBP_MODES and _VALID_WEBP_LEGACY_MODES were each only used in one place, but it seems like _VALID_WEBP_MODES could be kept. Also it could just be a Set instead of a Dict with True values.

homm commented 2 months ago

@Yay295 I've removed this constants primary since it's mistyping. Now I'm added separate _convert_frame routine for this.

radarhere commented 2 months ago

I expected these changes to improve speed, but I'm not finding a definitive difference when I test. Does that mirror your experience?

Are the improvements here more about theory and tidying up?

Yay295 commented 2 months ago

I don't know how much of an effect it has, but there is some overhead when using Capsules because they use strings as identifiers. So there's some added string comparisons behind the scenes.

homm commented 2 months ago

@radarhere

Are the improvements here more about theory and tidying up?

All improvements are listed in the PR description. While elimination of .tobytes() doesn't affect execution time much, it helps to avid extra image copy, which reduces memory footprint. See "Maximum resident set size" below. Also, ImagingSectionEnter/Leave in WebPAnimEncoder unlocks GIL in the current thread, which improves IO in other threads.

from io import BytesIO
from PIL import Image
for i in range(4):
    with BytesIO() as f:
        Image.new('RGB', (8192, 8192)).save(f, 'webp', method=0, quality=1)
# AMD Cpu

## Main branch
$ /usr/bin/time -v python test.py
  Command being timed: "python test.py"
  User time (seconds): 19.39
  System time (seconds): 1.16
  Percent of CPU this job got: 99%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 0:20.63
  Maximum resident set size (kbytes): 1296972
  Major (requiring I/O) page faults: 18
  Minor (reclaiming a frame) page faults: 1484790
  Voluntary context switches: 19
  Involuntary context switches: 719

## webp-capsule branch
$ /usr/bin/time -v python test.py 
  Command being timed: "python test.py"
  User time (seconds): 19.46
  System time (seconds): 0.78
  Percent of CPU this job got: 99%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 0:20.27
  Maximum resident set size (kbytes): 1100660
  Major (requiring I/O) page faults: 0
  Minor (reclaiming a frame) page faults: 1090899
  Voluntary context switches: 1
  Involuntary context switches: 1180

# Local Docker (M1 Pro)

## Main branch
$ /usr/bin/time -v python test.py
    Command being timed: "python test.py"
    User time (seconds): 4.51
    System time (seconds): 0.35
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.89
    Maximum resident set size (kbytes): 851160
    Major (requiring I/O) page faults: 9
    Minor (reclaiming a frame) page faults: 674262
    Voluntary context switches: 495
    Involuntary context switches: 88

## webp-capsule branch
$ /usr/bin/time -v python test.py
    Command being timed: "python test.py"
    User time (seconds): 4.18
    System time (seconds): 0.24
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.44
    Maximum resident set size (kbytes): 648432
    Major (requiring I/O) page faults: 9
    Minor (reclaiming a frame) page faults: 286806
    Voluntary context switches: 390
    Involuntary context switches: 10
homm commented 2 months ago

@Yay295

some overhead when using Capsules because they use strings as identifiers

It's literally nothing comparing to the copying megabytes of image data to the byte string. But as you can see above, even the copying doesn't affect encoding time much.