jeertmans / manim-slides

Tool for live presentations using manim
https://manim-slides.eertmans.be
MIT License
410 stars, 44 forks

feat(lib): smarter files reversing #439

Open jeertmans opened 1 month ago

jeertmans commented 1 month ago

Implement a smarter generation of reversed files by splitting the video into smaller segments.
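
For illustration, the idea of reversing a video segment by segment can be sketched on a plain list of frames: cut the sequence into short chunks, reverse each chunk on its own, then concatenate the chunks in reverse order. This is a minimal sketch under stated assumptions, not the code from this PR; the names `split_into_segments` and `reverse_by_segments` are hypothetical, and list items stand in for decoded video frames.

```python
from typing import List, Sequence, Tuple


def split_into_segments(
    duration: float, max_segment_duration: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (start, end) time ranges that cover [0, duration]."""
    if max_segment_duration <= 0:
        raise ValueError("max_segment_duration must be positive")
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + max_segment_duration, duration)
        segments.append((start, end))
        start = end
    return segments


def reverse_by_segments(frames: Sequence, segment_length: int) -> list:
    """Reverse `frames` while only ever reversing one short chunk at a time."""
    chunks = [
        frames[i : i + segment_length]
        for i in range(0, len(frames), segment_length)
    ]
    # Each chunk is reversed independently, so only one chunk needs to be
    # held in memory at once in a real video pipeline.
    reversed_chunks = [list(reversed(chunk)) for chunk in chunks]
    # Concatenating the reversed chunks in reverse order yields the same
    # result as reversing the whole sequence in one go.
    return [frame for chunk in reversed(reversed_chunks) for frame in chunk]
```

The benefit of this layout is that memory use depends on the segment length rather than on the length of the whole video.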

Closes #434

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.47%. Comparing base (964de66) to head (582f43f).

| Files | Patch % | Lines |
|---|---|---|
| manim_slides/utils.py | 96.42% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #439      +/-   ##
==========================================
+ Coverage   79.24%   79.47%   +0.23%
==========================================
  Files          22       22
  Lines        1816     1842      +26
==========================================
+ Hits         1439     1464      +25
- Misses        377      378       +1
```


jungerm2 commented 1 month ago

I have to hand it to you, you are very efficient! I didn't think this would get addressed so fast, so thanks a ton!

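A few follow-up questions:

  • The max_segment_duration argument as written cannot easily be changed by the user, so, while I think 1 second is pretty conservative and likely to work well in most cases, a user with a potato machine might want to lower it, and a user with a powerful workstation may want to increase it to render faster. Maybe this can be set the same way wait_time_between_slides is set, as an instance variable?

  • I'm a bit concerned about the multiprocessing part. Presumably, ffmpeg under the hood is multithreaded already, so does this give a noticeable boost? Alternatively, a user might want to limit the number of processes so that it doesn't bog down their system...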

To be clear, I think this PR more than covers the initial bug, but the above might be worth considering.

jeertmans commented 1 month ago

Thanks for your comment! This PR is not ready yet (IMO there is room for improvement).

> I have to hand it to you, you are very efficient! I didn't think this would get addressed so fast, so thanks a ton!
>
> A few follow-up questions:
>
> • The max_segment_duration argument as written cannot easily be changed by the user, so, while I think 1 second is pretty conservative and likely to work well in most cases, a user with a potato machine might want to lower it, and a user with a powerful workstation may want to increase it to render faster. Maybe this can be set the same way wait_time_between_slides is set, as an instance variable?

I tried increasing and reducing it, but did not notice any meaningful change. Reversing files seems pretty slow in general...

But indeed, my plan is to provide configuration of all those parameters through class config. I need to rework this, see #441.
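
As a rough illustration of what configuration through class attributes could look like, here is a minimal sketch. It is not the design planned in #441: the class names, the `processes` attribute, and the `reversal_options` helper are all hypothetical, and the 1-second default is only assumed from the value mentioned in this thread.

```python
from typing import Optional


class ReversalConfigSketch:
    """Hypothetical base class; this is not the manim-slides API."""

    # Assumed default: the thread describes the current value as 1 second.
    max_segment_duration: float = 1.0
    # None means "let the pool decide", as with multiprocessing.Pool.
    processes: Optional[int] = None

    def reversal_options(self) -> dict:
        """Collect the options a reversing routine would read."""
        return {
            "max_segment_duration": self.max_segment_duration,
            "processes": self.processes,
        }


class LowMemorySlide(ReversalConfigSketch):
    # A user on a constrained machine overrides the defaults per class.
    max_segment_duration = 0.5
    processes = 2
```

With this layout, a subclass changes the behaviour by overriding an attribute, without touching any function signature.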

> • I'm a bit concerned about the multiprocessing part. Presumably, ffmpeg under the hood is multithreaded already, so does this give a noticeable boost? Alternatively, a user might want to limit the number of processes so that it doesn't bog down their system...

FFmpeg may be multithreaded, but PyAV isn't (at least not by default, to my knowledge, or only to some extent; see https://pyav.org/docs/develop/cookbook/basics.html#threading). Using processes=None results in all my CPUs running at 100%, whereas processes=1 has no such effect. On the other hand, more processes take more memory, hence the trade-off between the segment duration and the number of processes.

Still, the speed-up is modest (roughly 2x to 4x with 16 threads), and I am open to help if you want to try finding a better combination :-)

Using thread_type = "AUTO" did not seem to affect performance. Maybe the operations are also I/O bound, I don't know.
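
The processes trade-off discussed in this comment can be sketched with the standard library's `multiprocessing.Pool`, where `processes=None` lets the pool use the number of CPUs reported by the system and `processes=1` runs without any parallelism. This is a minimal sketch, not the code from this PR: `reverse_chunk` and `reverse_in_parallel` are hypothetical names, and list chunks stand in for video segments.

```python
import multiprocessing
from typing import List, Optional, Sequence


def reverse_chunk(chunk: Sequence) -> list:
    """Stand-in for reversing one video segment."""
    return list(reversed(chunk))


def reverse_in_parallel(
    chunks: List[Sequence], processes: Optional[int] = None
) -> list:
    """Reverse every chunk, optionally in parallel, and join the results."""
    if processes == 1:
        # Serial path: no worker processes are started at all.
        reversed_chunks = [reverse_chunk(chunk) for chunk in chunks]
    else:
        # processes=None makes the pool pick a worker count from the CPU count.
        with multiprocessing.Pool(processes=processes) as pool:
            reversed_chunks = pool.map(reverse_chunk, chunks)
    # Reversed chunks are concatenated in reverse order.
    return [item for chunk in reversed(reversed_chunks) for item in chunk]
```

On platforms that start workers by spawning a fresh interpreter, the parallel path should be called from code guarded by `if __name__ == "__main__":`. Each worker holds its own chunk in memory, which is where the trade-off between segment duration and process count comes from.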

jungerm2 commented 1 month ago

I think you are right, this is likely I/O bound anyway... However, I know that ffmpeg threads are low priority and designed not to bog down a system (i.e., they show up in purple in htop). I don't think Python multiprocessing behaves like this, so having it spin up one process per core might be problematic in that sense.

Ultimately, there's no single right way to do this; having options via #441 seems like the best path forward.
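
One possible way to address the priority concern, shown here only as a hedged sketch, is to raise the niceness of each worker process through the pool's `initializer` argument. Nothing in this thread says the PR does this; `lower_priority` and `make_low_priority_pool` are hypothetical names, the increment of 10 is an arbitrary assumption, and `os.nice` is only available on Unix-like systems.

```python
import multiprocessing
import os
from typing import Optional


def lower_priority(increment: int = 10) -> int:
    """Raise this process's niceness (Unix only) and return the new value."""
    if hasattr(os, "nice"):
        return os.nice(increment)
    # os.nice is unavailable (for example on Windows): leave priority as is.
    return 0


def make_low_priority_pool(processes: Optional[int] = None, increment: int = 10):
    """Create a pool whose workers lower their own priority at start-up."""
    return multiprocessing.Pool(
        processes=processes,
        initializer=lower_priority,
        initargs=(increment,),
    )
```

The initializer runs once in every worker, so the parent process keeps its original priority while the workers yield to other tasks on a busy system.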

jeertmans commented 1 month ago

> I think you are right, this is likely I/O bound anyway... However, I know that ffmpeg threads are low priority and designed not to bog down a system (i.e., they show up in purple in htop). I don't think Python multiprocessing behaves like this, so having it spin up one process per core might be problematic in that sense.
>
> Ultimately, there's no single right way to do this; having options via #441 seems like the best path forward.

I'll put this PR on hold until I can implement #441. Maybe Python's multiprocessing isn't the right solution, or the most beneficial one (the speed-up is very low relative to the number of processes used).