Closed: FredHappyface closed this issue 7 months ago
How did you test the two libraries? pyrlottie is slower on Windows and Linux in my testing...
Testing using this file: https://github.com/laggykiller/rlottie-python/blob/master/example/sample.tgs
test.py
from rlottie_python import LottieAnimation
anim = LottieAnimation.from_tgs("sample.tgs")
anim.save_animation("test.gif")
test-pyrlottie.py
import pyrlottie
pyrlottie.run(pyrlottie.convSingleLottie(pyrlottie.LottieFile("sample.tgs"), ["test.gif"]))
Testing on Windows: Measure-Command { python .\test.py }
vs Measure-Command { python .\test-pyrlottie.py }
Testing on Linux: time python ./test.py
vs time python ./test-pyrlottie.py
It would be nice if you could provide the code and commands you used for profiling both libraries. I recommend py-spy for investigating the culprit.
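For reference, the kind of commands I mean (the record invocation is the one used later in this thread; py-spy top is just the live sampling view):
py-spy record -o result.svg -- python test.py
py-spy top -- python test.py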
With rlottie-python, I'm using concurrent.futures.ProcessPoolExecutor(max_workers=threads) as the executor.
Are you saving frames from the tgs to image files using multiprocessing? Perhaps this is slow because spawning a new Python process is slow? Besides the cost of spawning a new Python process for each task, each new process has to load its own copy of the file being converted and of the rlottie library (memory is not shared between Python processes), which creates overhead. Maybe the task of saving one frame of the animation is too small to benefit from multiprocessing (https://stackoverflow.com/questions/68892839/how-to-overcome-overhead-in-python-multiprocessing)? Also, please experiment with different numbers of threads (number of threads = number of CPU cores might not yield the best result).
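As a rough sketch of the kind of experiment I mean (a module-level convert function so that ProcessPoolExecutor can pickle it; the file names, task count and worker counts are only illustrative):

import concurrent.futures
import multiprocessing
import time

from rlottie_python import LottieAnimation

def convert(i: int) -> None:
    # Each process loads the file and the rlottie library again, which is part of the overhead
    anim = LottieAnimation.from_tgs("sample.tgs")
    anim.save_animation(f"out_{i}.gif")
    anim.lottie_animation_destroy()

if __name__ == "__main__":
    for workers in (1, 2, 4, multiprocessing.cpu_count(), multiprocessing.cpu_count() * 2):
        start = time.time()
        with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
            list(executor.map(convert, range(100)))  # wait for all 100 conversions
        print(f"{workers} workers: {time.time() - start:.2f} seconds")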
Also, from running py-spy record -o result.svg -- python test.py I noticed that rlottie-python spent much of its time saving the rendered frames with Pillow. Perhaps saving rendered frames with Pillow is slower than gif2webp. Unfortunately the rlottie library itself does not provide any functions that allow me to save frames to file on the C side, so I am forced to use Pillow for this task. I could rewrite the whole project to use another binding library (such as nanobind) and write binding functions that save rendered frames in C++ instead of Python, but that is too much work for me and the benefit is too small; plus, you cannot manipulate the frames before saving with that method.
two libraries that do the same thing
A user may want to manipulate the frames read from a lottie file with Python before saving. If the user uses rlottie-python, they eventually need to run the 'slow' process of saving rendered frames with Pillow anyway. If the user uses pyrlottie, there is no way to read frames from lottie files directly; the user has to first save a file of rendered frames (e.g. file.gif) and then read that file back with Python, which is a large performance penalty. Hence, pyrlottie is better when just converting a lottie file to a raster image (e.g. a gif file), but worse for loading a lottie file into Python to manipulate frames.
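To illustrate the rlottie-python side, a minimal sketch of reading one rendered frame into Pillow for manipulation and lossless saving (assuming the raw buffer is BGRA, the same layout used with pyav later in this thread):

from PIL import Image
from rlottie_python import LottieAnimation

anim = LottieAnimation.from_tgs("sample.tgs")
width, height = anim.lottie_animation_get_size()
buffer = anim.lottie_animation_render(0)  # raw BGRA bytes for frame 0
# Wrap the buffer in a Pillow image, manipulate it, then keep it lossless as PNG
frame = Image.frombuffer("RGBA", (width, height), buffer, "raw", "BGRA", 0, 1)
frame.rotate(90).save("frame0.png")
anim.lottie_animation_destroy()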
Also, pyrlottie uses lottie2gif, and gif is a lossy format, meaning it is not possible to get lossless rendered frames using pyrlottie.
more portable than pyrlottie
Speaking of portability, lottie (https://pypi.org/project/lottie/) is the winner, as it is a pure Python implementation (though this probably also means it is slower) and has more functionality, such as supporting many input/output formats and vector graphics, but the rendered frames are sometimes buggy. I really hope that project can fix the problem of rendering buggy frames...
btw1 I am curious why a macOS build of pyrlottie is not available. lottie2gif is an executable built by compiling rlottie, and gif2webp already has a macOS build. Is it because you don't have a Mac machine available? If that is the case, I could help you compile it (though this is not sustainable and you cannot trust me). However, the best way is to use a GitHub Action to compile rlottie when building the wheel (btw, storing precompiled binaries in a git repo is a bad idea; it is better to compile rlottie when building the wheel).
btw2 The execution bit is lost not just in WSL, but also on Linux in general. Maybe update the README about this? Or, even better, check for permission and run chmod +x in your library.
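A minimal sketch of the kind of check I mean (the binary path is illustrative):

import os
import stat

def ensure_executable(path: str) -> None:
    # Restore the execution bit if it was lost, e.g. after installing on Linux or WSL
    if not os.access(path, os.X_OK):
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

ensure_executable("path/to/lottie2gif")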
Hi, so sorry it's taken so long to write a response. This one kinda fell down the back of the sofa on my todo list.
So I've run the following test:
import concurrent.futures
import multiprocessing
import time
from typing import Callable
from pyrlottie import FileMap, LottieFile, convMultLottie, convSingleLottie, run
from rlottie_python import LottieAnimation
def rlottie_py():
anim = LottieAnimation.from_tgs("sample.tgs")
anim.save_animation("test.gif")
anim.lottie_animation_destroy()
def py_rlottie():
run(convSingleLottie(LottieFile("sample.tgs"), {"test.gif"}))
def rlottie_py_mult():
def convert_single_tgs():
anim = LottieAnimation.from_tgs("sample.tgs")
anim.save_animation("test.gif")
anim.lottie_animation_destroy()
with concurrent.futures.ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
# Using list comprehension to submit tasks to the executor
future_to_variable = {executor.submit(convert_single_tgs) for x in range(100)}
# Wait for all tasks to complete and retrieve results
for future in concurrent.futures.as_completed(future_to_variable):
variable = future.result()
def py_rlottie_mult():
run(
convMultLottie(
filemaps=[
FileMap(
LottieFile("sample.tgs"),
{
"test.gif",
},
)
for x in range(100)
]
)
)
def timeit(fn: Callable):
start_time = time.time()
fn()
end_time = time.time()
print(f"{fn.__name__} time: {end_time - start_time} seconds")
timeit(rlottie_py)
timeit(py_rlottie)
timeit(rlottie_py_mult)
timeit(py_rlottie_mult)
The results over a few runs are:
poetry run py perf.py
rlottie_py time: 2.2334213256835938 seconds
py_rlottie time: 4.818990230560303 seconds
rlottie_py_mult time: 183.9050681591034 seconds
py_rlottie_mult time: 101.32033944129944 seconds
---
rlottie_py time: 3.886620283126831 seconds
py_rlottie time: 6.14373254776001 seconds
rlottie_py_mult time: 159.29676604270935 seconds
py_rlottie_mult time: 114.90005588531494 seconds
---
rlottie_py time: 2.1622235774993896 seconds
py_rlottie time: 4.655423879623413 seconds
rlottie_py_mult time: 150.51082158088684 seconds
py_rlottie_mult time: 116.90487170219421 seconds
Turns out the pickle library throws a temper tantrum when I use the ProcessPoolExecutor. I'll do some further experimentation with different numbers of threads, as I agree that this could have a significant impact on the results.
I noticed rlottie-python spent much of the time on [saving the rendered frames with Pillow]
My understanding was (perhaps mistakenly) that Pillow was pretty optimised when it comes to reading and writing images. I'll definitely do some more research into this too.
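For context, the Pillow route being discussed is roughly the usual save_all/append_images call; this is only a sketch of that pattern, not rlottie-python's exact internals:

from PIL import Image

def save_gif(frames: list[Image.Image], path: str, fps: float) -> None:
    # Pillow quantises each frame to a 256-colour palette and writes the GIF in one pass
    frames[0].save(
        path,
        save_all=True,
        append_images=frames[1:],
        duration=int(1000 / fps),  # per-frame duration in milliseconds
        loop=0,
    )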
User may want to manipulate the frames read from lottie file with python before saving.
Ultimately I feel that the rlottie-python library provides many benefits over the pyrlottie lib, so if I can get the performance closer for my use case, and hopefully contribute that knowledge back here, then that'd be awesome. Having said that, giving users options is never a bad thing, so I'd certainly never just drop support for a lib without a good migration period.
why macOS build of pyrlottie is not available
Ultimately this is because I do not have a Mac and attempts at running cmake haven't been particularly successful, so I've just been using the prebuilt binaries. If you'd be happy to contribute these then that'd be enormously appreciated.
btw execution bit is lost not just in WSL, but also Linux in general.
Thanks for the heads up! I thought I'd squashed this bug!
Three problems with the testing script:
1. Pass rlottie_py to the executor instead of the nested convert_single_tgs (a locally defined function cannot be pickled, which is why ProcessPoolExecutor complained).
2. Running rlottie_py and then rlottie_py_mult causes a deadlock. I need to investigate further... EDIT: I was testing on Linux; fixed by multiprocessing.set_start_method('spawn').
3. Every task writes to the same test.gif; each task should write to its own output file.
This is better:
import concurrent.futures
import multiprocessing
import os
import time
import timeit # type: ignore
from typing import Any, Callable
from pyrlottie import FileMap, LottieFile, convMultLottie, convSingleLottie, run # type: ignore
from rlottie_python import LottieAnimation
os.makedirs("test_rlottie_python", exist_ok=True)
os.makedirs("test_pyrlottie", exist_ok=True)
def rlottie_py(fname: str = "test"):
anim = LottieAnimation.from_tgs("sample.tgs")
anim.save_animation(f"test_rlottie_python/{fname}.gif")
anim.lottie_animation_destroy()
def py_rlottie():
run(convSingleLottie(LottieFile("sample.tgs"), {"test_pyrlottie/test.gif"}))
def rlottie_py_mult():
with concurrent.futures.ProcessPoolExecutor(max_workers=int(multiprocessing.cpu_count())) as executor:
# Using list comprehension to submit tasks to the executor
future_to_variable = {executor.submit(rlottie_py, str(i)) for i in range(100)}
# Wait for all tasks to complete and retrieve results
for future in concurrent.futures.as_completed(future_to_variable):
variable = future.result()
def py_rlottie_mult():
run(
convMultLottie(
filemaps=[
FileMap(
LottieFile("sample.tgs"),
{
f"test_pyrlottie/{i}.gif",
},
)
for i in range(100)
]
)
)
def timeit(fn: Callable[..., Any]):
start_time = time.time()
fn()
end_time = time.time()
print(f"{fn.__name__} time: {end_time - start_time} seconds")
if __name__ == "__main__":
multiprocessing.set_start_method("spawn")
timeit(rlottie_py)
timeit(py_rlottie)
timeit(rlottie_py_mult)
timeit(py_rlottie_mult)
The result without modification:
# Arch Linux, 16 core Gen 8 Intel desktop
rlottie_py_mult time: 32.663668155670166 seconds
py_rlottie_mult time: 5.823666095733643 seconds
# Windows 11, 8 core Gen 8 Intel mobile
rlottie_py_mult time: 42.006444215774536 seconds
py_rlottie_mult time: 18.870269536972046 seconds
The result of using processes instead of threads:
# Arch Linux, 16 core Gen 8 Intel desktop
rlottie_py_mult time: 11.285428285598755 seconds
py_rlottie_mult time: 5.824993133544922 seconds
# Windows 11, 8 core Gen 8 Intel mobile
rlottie_py_mult time: 25.225178003311157 seconds
py_rlottie_mult time: 19.338806629180908 seconds
rlottie-python still loses, but it is already much faster.
Here is the result of running py-spy record -o result.svg -- python test.py (see my previous comment for the content of test.py).
Interactive version (Download and unzip for the svg, open it in browser): result.zip
As mentioned, Pillow takes a long time to save the GIF.
What if we save with pyav?
import av
from av.video.stream import VideoStream
import numpy as np
from typing import cast
...
def rlottie_py(fname: str = "test"):
    anim = LottieAnimation.from_tgs("sample.tgs")
    frames = anim.lottie_animation_get_totalframe()
    fps = anim.lottie_animation_get_framerate()
    width, height = anim.lottie_animation_get_size()
    options = {
        "loop": "0"
    }
    with av.open(f"test_rlottie_python/{fname}.gif", "w", format="gif") as output:
        out_stream = output.add_stream("gif", rate=fps, options=options)
        out_stream = cast(VideoStream, out_stream)
        out_stream.pix_fmt = "rgb8"
        for i in range(frames):
            buffer = anim.lottie_animation_render(i)
            # The rendered buffer is row-major BGRA, so rows (height) come first when reshaping
            frame = np.frombuffer(buffer, dtype=np.uint8).reshape((height, width, 4))
            av_frame = av.VideoFrame.from_ndarray(frame, format="bgra")
            output.mux(out_stream.encode(av_frame))
        output.mux(out_stream.encode())
    anim.lottie_animation_destroy()
The result:
# Arch Linux, 16 core Gen 8 Intel desktop
rlottie_py_mult time: 6.875451564788818 seconds
py_rlottie_mult time: 5.936105728149414 seconds
# Windows 11, 8 core Gen 8 Intel mobile
rlottie_py_mult time: 17.615680694580078 seconds
py_rlottie_mult time: 17.997378826141357 seconds
Now rlottie-python is just off by about 1 second on Arch Linux, and even wins by a small margin on Windows!
btw the times when using multiprocessing.cpu_count() and int(multiprocessing.cpu_count() / 2) are similar.
Thank you so much for your help on this. With these optimisations I'm struggling to see what place pyrlottie really has; one option might be to provide a few helper methods for users who just want to blindly convert from a source format like tgs to a destination format such as webp.
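For example, a hypothetical helper along these lines (convert_tgs is an illustrative name; it assumes save_animation, which saves via Pillow per this thread, accepts any extension Pillow can write, such as .webp):

from rlottie_python import LottieAnimation

def convert_tgs(src: str, dst: str) -> None:
    # Blindly convert a tgs file to whatever raster format the destination extension implies
    anim = LottieAnimation.from_tgs(src)
    anim.save_animation(dst)
    anim.lottie_animation_destroy()

convert_tgs("sample.tgs", "sample.webp")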
With a 512x512 source image, using ProcessPoolExecutor, I got:
rlottie_py time: 1.117741584777832 seconds
py_rlottie time: 4.003070592880249 seconds
rlottie_py_mult time: 52.18350124359131 seconds
py_rlottie_mult time: 202.2332363128662 seconds
Note: no idea why the py_rlottie_mult time was so terrible for this run. Possibly Windows Defender not really liking me spawning random exes?
Using ProcessPoolExecutor and pyav, I got:
rlottie_py time: 0.986565351486206 seconds
py_rlottie time: 4.540838956832886 seconds
rlottie_py_mult time: 45.67705249786377 seconds
Just a note: there was a minor bug with the pyav code; out_stream.width and out_stream.height need to be set explicitly, it seems.
with av.open(f"test_rlottie_python/{fname}.gif", "w", format="gif") as output:
out_stream = output.add_stream("gif", rate=fps, options=options)
out_stream = cast(VideoStream, out_stream)
out_stream.width = width
out_stream.height = height
Despite the better performance, I am not planning to add pyav-related code to rlottie-python: I think using Pillow to save the animation is good enough for this project. Providing a function for saving to file is a small bonus feature; save_animation() aims to be a minimalistic function for saving the animation that 'just works' and is 'good enough' for most use cases without intervention. If users want more performance, better quality, or to save in another file format, they are free to choose their own method of saving in their own code, or even ditch Python and use the C/C++ interface of rlottie directly.
Makes perfect sense tbh. Plus I've just learned that pyav doesn't support webp, which caught me out somewhat.
Thanks for your time on this! 🙂
pyav doesn't support webp
It supports encoding webp but not decoding webp.
If you'd be happy to contribute these then that'd be enormously appreciated
Opened a PR: https://github.com/FHPythonUtils/PyRlottie/pull/5
Since the performance issue is on Pillow, not on binding code to rlottie, I am closing this.
Hi, thank you very much for creating this library. Having something more portable than pyrlottie is awesome, as that's something I was kind of struggling with.
I've noticed that while there's a massive win in terms of cross-platform use, it seems to come at a slight performance cost of about 5%.
Giving myself a refresher on what pyrlottie does versus how I'm using rlottie-python:
In pyrlottie, I'm using asyncio.create_subprocess_shell with asyncio.Semaphore(multiprocessing.cpu_count()) to call the binaries directly.
With rlottie-python, I'm using concurrent.futures.ProcessPoolExecutor(max_workers=threads) as the executor.
To be honest, I'm not sure if this is a Windows issue. I'm curious if you've noticed any difference in performance between the libraries?
I think if we are in a place where performance is almost equal, then I'd like to start pointing people to this library and offer any dev support, rather than both of us maintaining two libraries that do the same thing.
I realize this is a bit of a brain dump, but I'd be really keen on hearing your thoughts. :)
Thank you