roman380 / gdcl.co.uk-mpeg4

DirectShow MPEG-4 Part 14 (.MP4) Multiplexer and Demultiplexer Filters
http://alax.info/blog/1542

DTStoSample takes several secs for some large files #53

Open rjappleton opened 7 months ago

rjappleton commented 7 months ago

I've encountered a 25fps file approx 30 mins in length that has a CTTS table.

I found that seeking near the start of the file was fine, but the further into the file you seek, the longer the seek takes to complete - e.g. about three-quarters of the way through the file took about 10 seconds on a fairly fast laptop.

I found that all of the time was being consumed inside DTStoSample. As an experiment I removed the call to CTSOffset (on line 387) and the seek was then near instant.

CTSOffset always scans from the start of the CTTS table until it finds the required sample, but it is being called for every sample in DTStoSample's for loop - that seems to be what is taking so many secs to complete.
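To illustrate the cost described above, here is a minimal sketch and not the actual GDCL code. It contrasts a lookup that rescans the table on every call with one that remembers its position between calls. `CttsEntry`, `OffsetByScan` and `CttsCursor` are hypothetical names made up for this example; the real `CTSOffset` may store the table differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of one CTTS run: `count` consecutive
// samples share the same composition offset.
struct CttsEntry {
    uint32_t count;
    int32_t  offset;
};

// Rescans from the first entry on every call. Calling this once per
// sample in a loop costs time proportional to (samples x table entries).
int32_t OffsetByScan(const std::vector<CttsEntry>& table, uint32_t sample) {
    uint32_t first = 0;  // index of the first sample covered by the current entry
    for (const CttsEntry& e : table) {
        if (sample - first < e.count) return e.offset;
        first += e.count;
    }
    return 0;  // sample lies past the end of the table
}

// Remembers the entry reached by the previous call, so a run of
// increasing sample numbers walks the table only once in total.
class CttsCursor {
public:
    explicit CttsCursor(const std::vector<CttsEntry>& table) : m_table(table) {}

    int32_t Offset(uint32_t sample) {
        if (sample < m_first) {  // caller moved backwards: restart from the top
            m_index = 0;
            m_first = 0;
        }
        assert(sample >= m_first);
        while (m_index < m_table.size()) {
            const CttsEntry& e = m_table[m_index];
            if (sample - m_first < e.count) return e.offset;
            m_first += e.count;
            ++m_index;
        }
        return 0;  // sample lies past the end of the table
    }

private:
    const std::vector<CttsEntry>& m_table;  // must outlive the cursor
    std::size_t m_index = 0;                // current entry
    uint32_t    m_first = 0;                // first sample covered by the current entry
};
```

Both functions return the same offsets; only the amount of rescanning differs. Whether a cursor like this fits the demux depends on how `CTSOffset` is called elsewhere, which this sketch does not take into account.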

On the assumption that CTTS values are probably going to be only 2 or 3 frames duration maximum, I tried only doing the CTSOffset call when the loop gets 'fairly close' to the specified tStart value - I just chose an arbitrary 0.5 sec for 'fairly close'. This seemed to do the trick:

LONGLONG tLimit = m_tAtBase + TrackToReftime(nEntries * nDuration);

// Only do the CTTS lookup when within 0.5 sec (5000000 x 100 ns) of tStart
if (tStart < tLimit + 5000000)
{
    tLimit += CTSOffset(m_nBaseSample + nEntries);
}

I'm not sure whether some mp4 files CTTS tables might break this, so rather than using an arbitrary 0.5 sec value for 'fairly close', perhaps a better approach would be, when the CTTS is read-in do a quick scan to find the biggest positive value; then use that for the 'fairly close' value?
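A rough sketch of that idea follows, under stated assumptions. `CttsRun`, `MaxPositiveOffset`, `ToReferenceTime` and `NeedsCtsOffset` are hypothetical helpers written for this example and are not part of the demux. Offsets are assumed to be in track timescale units, and the result is converted to 100 ns units so it can replace the hard-coded 5000000.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of one CTTS run.
struct CttsRun {
    uint32_t count;   // number of consecutive samples in the run
    int32_t  offset;  // composition offset in track timescale units
};

// Largest positive composition offset in the table, in track timescale
// units. Returns 0 when the table is empty or has no positive offsets.
int64_t MaxPositiveOffset(const std::vector<CttsRun>& table) {
    int64_t maxOffset = 0;
    for (const CttsRun& run : table) {
        maxOffset = std::max<int64_t>(maxOffset, run.offset);
    }
    return maxOffset;
}

// Converts track timescale units to 100 ns units (the unit of the
// 5000000 constant in the workaround above).
int64_t ToReferenceTime(int64_t trackUnits, uint32_t timescale) {
    return trackUnits * 10000000LL / timescale;
}

// Same test as the workaround, with the fixed 0.5 sec replaced by a
// margin derived from the table.
bool NeedsCtsOffset(int64_t tStart, int64_t tLimit, int64_t margin) {
    return tStart < tLimit + margin;
}
```

The scan runs once, when the table is read in, so its cost does not depend on how often seeking happens afterwards.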

roman380 commented 7 months ago

Do you think the time goes into the seek itself, rather than into what follows it, for example a seek to a position that requires going back to the previous keyframe and decoding from there?

That distinction is the most important thing for identifying the path for improvement.

  1. If frame-accurate seek is slow because of the need to go back to an IDR, then it's specific to the file and potentially you could just seek to that IDR directly (I think the demux has this; either it already had it or I added it in the past)
  2. There might be a problem in the code itself, in CTTS processing for example, and this can be improved
  3. There might be another problem, such as incorrect processing of the layout, so that the demux simply seeks incorrectly with excessive preroll as the consequence

rjappleton commented 7 months ago

Thanks for the quick reply Roman.

The demux I'm using was derived from an earlier version of the GDCL, so there are some differences, but the DTSToSample and CTSOffset are functionally identical to the current version in gdcl.co.uk-mpeg4

The seeking at the start of ThreadProc is different - yours calls CheckInSegment while in mine there are calls to DTSToSample followed by SyncFor. It has always worked well for me, but it is this call to DTSToSample that takes ages to return for the large file. Without looking closely at your CheckInSegment function (which my version does not have), I can't tell whether your code would be similarly affected to mine, although a few lines later you do have a call to DTSToSample(tStop) which might be affected.

It's not so much a report about slow seeking, rather a report that for large files that have a CTTS table, any call to DTSToSample() may take a long time to return, and it gets worse the closer the specified DTS is to the end of the file.

In my own version I'm going to scan the CTTS as it is read-in for the largest positive value, and use that in my suggested code in the original post.

roman380 commented 7 months ago

I have long files in one of the projects, multi-hour recordings. I don't remember complaints about slow seeking. It still might depend on certain internal specifics: how large those boxes are and how efficiently the demux goes over them. If you share a sample file, I could take a look and check how seeking looks for me with my build.

rjappleton commented 7 months ago

Aha ... now that's interesting. I have been working with a Debug build, and I have just tried a Release build.

Without my workaround, for this MP4 file, seeking to near the end of the file was taking about 8 seconds (inside DTSToSample) in the Debug build. But the Release build was taking almost 1 second - only slightly longer than seeking close to the start of the file - noticeably laggy but pretty much acceptable.

I guess the release build must be doing some clever optimisation of DTSToSample and/or CTSOffset.

rjappleton commented 7 months ago

I just put my workaround back in place, and in the Release build the slight lag near the end of the file has gone. It is pretty much instant near the start or near the end, so I think I'll keep the workaround. :)