mltframework / mlt

MLT Multimedia Framework
https://www.mltframework.org
GNU Lesser General Public License v2.1
1.47k stars 313 forks source link

Huge memory consumption of mlt-7.14.0 runs oom with 08:40 minutes project on 64+8 GB laptop #899

Closed Yamakuzure closed 1 year ago

Yamakuzure commented 1 year ago

System : Gentoo Linux CPU : Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz GPU : Quadro T2000 RAM : 64GB + 8GB swap Driver : 515.105.01 CUDA : 11.7 Qt : 5.15.8 MLT : 7.14.0 Shotcut: 22.12.21 ffmpeg : 5.1.2

Main MLT file: (The project) playlists : 8 producers : 21 chains : 458 tractors : 5

Render MLT file: (To render the project as a track for additional filters) playlists : 3 producers : 3 chains : 0 tractors : 1


Symptoms in Shotcut-22.12.21 with MLT-7.14.0:

top reported at about 62%:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
14968 sed       23   3  100,8g  50,2g 722056 S  1088  80,5 134:36.87 melt-7

At that time shotcut had spawned 5,258 threads and melt-7 had spawned 1,436 threads


Behavior after downgrading to shotcut-22.09.23 with MLT-7.12.0:

top reported at about 62%:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
23266 sed       23   3   38,0g   9,0g 833456 S  1138  14,4 123:38.82 melt-7

At that time shotcut had spawned 714 threads and melt-7 had spawned 523 threads.


Yesterday I highjacked https://github.com/mltframework/shotcut/issues/1391 because I thought it was shotcut's fault due to the long loading times and sluggish behavior. But after I checked top it more likely seems to be an mlt issue.

I have not yet tried shotcut-22.12.21 with mlt-7.12.0, though.


Edit : I now have tested Shotcut-22.12.21 with MLT-7.12.0: (I made a mistake and had to render again anyway)

Loading time of both main and render project is around 4-5 seconds.

Thread counts at 60% of the rendering: shotcut: 952 melt-7: 523

melt-7 memory consumption:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
16757 sed       23   3   38.3g  9.2g 833996 S 976.5  14.0  48:56.84 melt-7

Sorry again for the noise in the shotcut issues section, this seems to be a pure mlt-7.14.0 matter.

ddennedy commented 1 year ago

It wold be nice if you can test version 7.14 without using the render MLT file. I suspect the handling around that. If that works, instead of using render file, you can make another test by copying the fillers from the render MLT track and paste them to Output > Filters in the main MLT.

ddennedy commented 1 year ago

Also, do the video modes (profile in XML) between these two projects match?

ddennedy commented 1 year ago

Today, I tested this using a medium size/complexity project both alone and as a sub-project, but I did not reproduce it. I was using MLT from git on April 3 as appears in the Shotcut 23.04 beta. There might be something in your project that triggers it, but the above requested tests will help maybe a little. Do you happen to be using much interlaced video sources?

@bmatherly Considering the lack of a global or shared thread pool for FFmpeg maybe we should not make the libavfilter-based deinterlacers the default. These links will not release their resources (slice threads) as producers get flushed from the MLT service cache because it is not the MLT service that actually gets flushed - rather only the private guts of them. Of course, there is little we can do about someone adding something like avfilter.lut3d to every clip in their project, but upon review most of the avfilter-based Shotcut filters are very unlikely to be used everywhere. OTOH, someone could very well end up including many short shots of interlaced video (e.g. removing all speaking pauses).

bmatherly commented 1 year ago

Here is a suggestion to handle all filters and links in a general way: We could keep a separate cache for filters and links to use for their private state. However, instead of using the same heuristic as producers (2 x number of tracks), we would have a heuristic to detect when the service is not being used. For example, the heuristic could be that the cache item is deleted if it has not been used for 1000 frames.

I am willing to implement this if you are interested. But maybe it would be too risky to add for this very next Shotcut release.

Yamakuzure commented 1 year ago

It wold be nice if you can test version 7.14 without using the render MLT file. I suspect the handling around that. If that works, instead of using render file, you can make another test by copying the fillers from the render MLT track and paste them to Output > Filters in the main MLT.

I will try this out, but I have done it like that for years, so...

I am using Deband, Reduce Noise: HQDN3D and 2-pass-filter. The latter is not available as an Output filter, which is the only reason I use a render project in the first place.

Also, do the video modes (profile in XML) between these two projects match?

They are both from the same template, yes. With template I mean an empty MLT-file I copy and rename.

Today, I tested this using a medium size/complexity project both alone and as a sub-project, but I did not reproduce it. I was using MLT from git on April 3 as appears in the Shotcut 23.04 beta. There might be something in your project that triggers it, but the above requested tests will help maybe a little. D

I have used mlt-7.14.0 built from the release tag. Maybe it is a bug that got already fixed? It might be more worthwhile to test a build from git instead to see if the issue persists.

Do you happen to be using much interlaced video sources?

I never use interlaced videos. All clips have the same format because that is what my video processing tool outputs. Stream #0:0: Video: h264 (High 4:4:4 Intra), yuvj444p(pc, bt709, progressive), 2560x1440 [SAR 1:1 DAR 16:9], 60 fps, 60 tbr, 1k tbn Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp

Please find attached the two MLT files: S03E06_PotC.zip

The resulting video is here: https://youtu.be/6Eq1HxuoV7s

Yamakuzure commented 1 year ago

Update: I have tried with MLT master commit 50930bbd7a9c0b13df82f1f92ace2ae48b206f8c :

The render project took ages to load, so I closed it and did not try again. Whatever causes this is not fixed.

However, as suggested above, I tried the regular project file. (I did not add filters to the output.)

So, here are the stats at 62%: (I had to kill melt-7 again, or it would have oom'd my laptop.)

Shotcut: 2,894 threads melt-7 : 2,698 threads

 ~ # top -b -n 1 | head -n 10
top - 17:17:50 up  8:45,  8 users,  load average: 4,05, 5,14, 3,44
Tasks: 588 total,   1 running, 587 sleeping,   0 stopped,   0 zombie
%CPU(s):  7,5 us,  0,7 sy, 25,1 ni, 63,7 id,  3,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Spch:  63810,5 total,   2066,2 free,  59799,8 used,   1944,5 buff/cache
MiB Swap:  24177,0 total,  24177,0 free,      0,0 used.    671,9 avail Spch

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     ZEIT+ BEFEHL
 5147 sed       23   3   94,0g  44,6g 351092 S 500,0  71,6  39:36.15 melt-7

@ddennedy

It would be nice if you can test version 7.14 without using the render MLT file. I suspect the handling around that.

Loading is fixed, but rendering is not.

Maybe I can test melt-7 built with asan or tsan...

ddennedy commented 1 year ago

I was able to reproduce this by changing the project to use my media. The problem does not occur in our Shotcut release build 22.12.21, but it does in daily build 23.01.14. I tried reverting 453745be (requires manual revert), but that did not fix it. A lot merged in that time, but not the new avformat producer thread. @bmatherly Using git master HEAD, I commented out the line "resampler=swresample,resample" in $prefix/share/mlt-7/chain_normalizers.ini and the problem went away. Probably, the audiochannels link is not used, but some of the image links surely are. Also, choosing the resample link instead of swresample does not fix it.

bmatherly commented 1 year ago

That is very good isolation. It is possible that the bug is in both swresample and resample links because they both use the same frame caching technique. This caching is necessary because avformat is not sample accurate when seeking and it does not cache audio. (maybe we need to resurrect this PR https://github.com/mltframework/mlt/pull/744)

I tried for a while to reproduce the problem. But I have not managed to reproduce it yet. Tomorrow I will make a deeper inspection of the frame caching and maybe try mlt_frame_cache().

ddennedy commented 1 year ago

The "it" that I am mainly reproducing or monitoring is the thread count while running melt against a reduced version of the project where I stripped out the contents of the "main_bin" playlist and its producers/chains. I do not have enough memory on my 32 GB system to load the project into Shotcut - even versions from 22. To do that I am using commands similar to his comment in https://github.com/mltframework/shotcut/issues/1391 The melt command I am running is like

~/shotcut/Shotcut-22.12.21/Shotcut.app/melt -progress -consumer null terminate_on_pause=1 real_time=-4 '/home/ddennedy/net/media/unsorted/mlt_bug/899/S03E06_People_Of_The_Citadel - No Playlist.mlt'

where I vary the melt build and check the thread count at around 20% progress. I just made another test monitoring memory, and it was about 2x in the current version at this point. I do not think you need to use this project; just any project with >20 short chains ought to work, but you might need to run that longer than 20% to see it.

I am mainly trying to understand the thread count problem. It seems the main source of them would be coming from the avformat producer. So, it would seem somehow the frame_cache in these links is defeating the avformat producer cache. But I have not yet figured out how. The avformat cache is not elastic; it is managed by mlt_multitrack (see resize_service_caches()).

ddennedy commented 1 year ago

link_close() is not closing all the frames in its frame_cache, but this is never reached. It appears chains are not getting closed automatically as melt progresses even though I added autoclose="1" to all the <playlist in the XML. I think the chains are not closing because the frames inside frame_cache are holding references to them; so their reference count never reaches 0.

Yamakuzure commented 1 year ago

First run with thread sanitizer (only up to 5% rendering) resulted in 141 warnings. 139 data races (*) and 2 potential deadlocks. I will now do a second run with some deeper information options. After that I will do a quick lsan run, and if that does not yield anything, the bigger asan. (*) I had not used a suppression file, yet, so most will likely be Qt5Core and Qt5GUI.

Update : Most was libavcodec and libavutil. Only 2 data races and one lock order cycle are relevant. I'll fix them and create a PR.

Yamakuzure commented 1 year ago

I have tried Leak, Undefined and Address Sanitizer. The latter reported ~1.6 MB in direct and indirect leaks, but nothing really interesting. (I can add a log if you'd like to have a look)

Much more interesting is, that I had to work through two NULL-Pointer access which caused segmentation faults. Which is odd, because I only had those with Address Sanitizer. Anyway, the crashes happened here and here. I could not find any place where vcodec_ctx->codec could have been set, and it was NULL under asan. Could it be that it should be c->codec_id instead?

Anyway, as the sanitizers did not provide any grand revelation, the only thing that is left is to bisect from 7.12.0 to HEAD. This could take a while, so I have to ask for some patience.

ddennedy commented 1 year ago

I do not recommend bisecting. That will be a waste of time. We already have a good idea using good old fashioned mental analysis. The reason the thread counts go high is due to the avformat producer cached private parts not closing. This is due to a garbage collection thing in mlt_cache that defers closing cached items until references (frames here) are released.

Yamakuzure commented 1 year ago

I do not recommend bisecting. That will be a waste of time.

Well, already done. :wink:

72e00e67753a21b6f4d55016eeaa0268157c3631 is the first bad commit
commit 72e00e67753a21b6f4d55016eeaa0268157c3631
Author: Brian Matherly <code@brianmatherly.com>
Date:   Wed Nov 23 22:14:34 2022 -0600

    Add chain normalizer links

    Right now swresample is the only normalizer link.

 src/framework/mlt.vers         |  5 +++++
 src/framework/mlt_chain.c      | 27 +++++++++++++++++++++++++++
 src/framework/mlt_chain.h      |  1 +
 src/mlt++/MltChain.cpp         | 19 ++++++++++++++++---
 src/modules/xml/consumer_xml.c |  2 +-
 src/modules/xml/producer_xml.c |  2 ++
 6 files changed, 52 insertions(+), 4 deletions(-)

Commit 065528c556a98aa9a81abd26f15125da9174ca0d : (rendering at 38%, thread count at 251)

 $ top -b -n 1 | head -n 8
top - 19:00:38 up 1 day, 10:28,  7 users,  load average: 6,32, 5,96, 5,69
Tasks: 595 total,   1 running, 594 sleeping,   0 stopped,   0 zombie
%CPU(s):  5,4 us,  1,9 sy,  7,3 ni, 85,4 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Spch:  63810,5 total,  24273,5 free,  31209,7 used,   8327,3 buff/cache
MiB Swap:   8192,0 total,   8192,0 free,      0,0 used.  24559,0 avail Spch

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     ZEIT+ BEFEHL
26956 sed       23   3   35,0g   8,1g 483812 S 100,0  12,9  22:15.95 melt-7

Commit 72e00e67753a21b6f4d55016eeaa0268157c3631 (rendering at 22%, thread count at 1,291)

 $ top -b -n 1 | head -n 8
top - 18:49:15 up 1 day, 10:17,  7 users,  load average: 8,03, 6,18, 5,35
Tasks: 599 total,   1 running, 598 sleeping,   0 stopped,   0 zombie
%CPU(s):  7,6 us,  2,3 sy, 14,4 ni, 75,7 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Spch:  63810,5 total,  10029,2 free,  45426,6 used,   8354,8 buff/cache
MiB Swap:   8192,0 total,   8192,0 free,      0,0 used.  10279,4 avail Spch

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     ZEIT+ BEFEHL
10304 sed       23   3   55,7g  20,9g 484004 S 100,0  33,6  33:10.43 melt-7

I hope this helps.

Edit : To make a test whether this is really true, I built from my fork at https://github.com/Yamakuzure/mlt/commit/46cd3064749ff4f53da99afadcc329ad32845f5d

==> Issue persists

Then I did this:

diff --git a/src/framework/mlt_chain.c b/src/framework/mlt_chain.c
index f8569327..24078b81 100644
--- a/src/framework/mlt_chain.c
+++ b/src/framework/mlt_chain.c
@@ -408,8 +408,12 @@ void mlt_chain_close(mlt_chain self)

 extern void mlt_chain_attach_normalizers(mlt_chain self)
 {
+#if 0 /// just a little experiment...
     if (!self)
         return;
+#else
+    return;
+#endif // 0

     if (mlt_chain_link_count(self)
         && mlt_properties_get_int(MLT_LINK_PROPERTIES(mlt_chain_link(self, 0)), "_loader")) {

And the issue is gone. So whatever these "normalizers" are, they are causing this. (Or something around them.) (Note: I do not use anything listed in chain_normalizers.ini...)

bmatherly commented 1 year ago

I posted a fix for the swresample link that passes my test. Can you test it in your test cases as well? I need to work on a fix for the resample link. It will need to work a little differently.

bmatherly commented 1 year ago

Upon further testing, I hear some audio distortion with audio formats that do not seek well (mp3). I am looking into that.

bmatherly commented 1 year ago

This PR solves the memory/thread leak and the audio distortion for me: https://github.com/mltframework/mlt/pull/903 I will do more testing tomorrow. But it would be great if others could test as well.

Yamakuzure commented 1 year ago

I will add the PR commits to my fork and have a go on my two project files. (The regular and the "render" project.)

Update : Both projects load in 3-5 seconds now. Rendering is fine, no extraordinary thread count or memory consumption any more.

Only difference I spotted to 7.12.0 melt-7 is the long preparation time. The previous melt-7 started almost right away with the rendering, the current one needed 14 and a half minutes (!) before the rendering started ("render" mlt). The rendering itself then took about 23 minutes.

Before the output starts, the log fills up with this:

   1 [producer avformat] audio: total_streams 1 max_stream 0 total_channels 2 max_channels 2
   2 [AVIOContext @ 0x5616f9169600] Statistics: 262144 bytes read, 2 seeks
   3 [h264 @ 0x5616f9457b00] Reinit context to 2560x1440, pix_fmt: yuvj444p
   4 [h264 @ 0x5616f955b940] Reinit context to 2560x1440, pix_fmt: yuvj444p
   5 [h264 @ 0x5616f9670240] Reinit context to 2560x1440, pix_fmt: yuvj444p
   6 [producer avformat] audio: total_streams 1 max_stream 1 total_channels 2 max_channels 2
   7 [AVIOContext @ 0x5616f945d580] Statistics: 2746437 bytes read, 3 seeks
   8 [h264 @ 0x5616faf44580] Reinit context to 2560x1440, pix_fmt: yuvj444p
(...)
6134 [AVIOContext @ 0x561705751600] Statistics: 4257823 bytes read, 0 seeks
6135 [h264 @ 0x56173a470740] Reinit context to 2560x1440, pix_fmt: yuvj444p
6136 [h264 @ 0x561737d6d900] Reinit context to 2560x1440, pix_fmt: yuvj444p
6137 [h264 @ 0x56170633b6c0] Reinit context to 2560x1440, pix_fmt: yuvj444p
6138 [producer avformat] audio: total_streams 1 max_stream 1 total_channels 2 max_channels 2
6139 [AVIOContext @ 0x56173fdcb8c0] Statistics: 3700506 bytes read, 3 seeks
(...)
6193 [AVIOContext @ 0x5616fbcf2640] Statistics: 3654839 bytes read, 0 seeks
6194 [producer avformat] audio: total_streams 1 max_stream 0 total_channels 2 max_channels 2
6195 [AVIOContext @ 0x56170dc91f40] Statistics: 196608 bytes read, 2 seeks
6196 [AVIOContext @ 0x56170e3b5ac0] Statistics: 131072 bytes read, 0 seeks
6197 [producer avformat] audio: total_streams 1 max_stream 0 total_channels 2 max_channels 2
6198 [AVIOContext @ 0x561713d6e900] Statistics: 182390 bytes read, 2 seeks
6199 [h264_nvenc @ 0x7f348c202000] Loaded Nvenc version 11.1
6200 [h264_nvenc @ 0x7f348c202000] Nvenc initialized successfully
6201 [h264_nvenc @ 0x7f348c202000] 1 CUDA capable devices found
6202 [h264_nvenc @ 0x7f348c202000] [ GPU #0 - < Quadro T2000 > has Compute SM 7.5 ]
6203 [h264_nvenc @ 0x7f348c202000] supports NVENC
6204 [h264_nvenc @ 0x7f348c202000] Using global_quality with nvenc is deprecated. Use qp instead.
6205 [AVIOContext @ 0x561710116b00] Statistics: 98304 bytes read, 0 seeks
6206 [h264 @ 0x7f348d39a800] Reinit context to 2560x1440, pix_fmt: yuvj444p
6207 [h264 @ 0x7f348d559c00] Reinit context to 2560x1440, pix_fmt: yuvj444p
6208 [h264 @ 0x7f348e19cdc0] Reinit context to 2560x1440, pix_fmt: yuvj444p
6209 [producer avformat] audio: total_streams 1 max_stream 1 total_channels 2 max_channels 2
6210 [AVIOContext @ 0x7f348d3a03c0] Statistics: 2746437 bytes read, 3 seeks
(...)

This is repeating over and over for over six thousand lines. All my source clips are "2560x1440, pix_fmt: yuvj444p", so I don't know what to "reinit" there. And all but two clips at the beginning have only one filter: "mute", so why the audio stream load?

Anyway, rendering works again, I am very very happy. Thanks you very much!

bmatherly commented 1 year ago

I finished my testing. https://github.com/mltframework/mlt/pull/903 is good to go from my perspective.