matham / ffpyplayer

A Cython implementation of an FFmpeg-based player.
GNU Lesser General Public License v3.0
134 stars, 37 forks

MediaWriter pts calculation assumes timebase denominator is 1 #129

Status: Open. dcnieho opened this issue 2 years ago.

dcnieho commented 2 years ago

Line 481 of /ffpyplayer/writer.pyx is as follows: `rounded_pts = <int64_t>floor(pts / av_q2d(s.codec_ctx.time_base) + 0.5)`

This works correctly if `time_base` is (X, 1) for any X, but not for time bases whose denominator is not 1. Specifically, I want to write a file at a frame rate of 24.9069 Hz. In the opts dict for the stream I therefore set `frame_rate: (244212, 9805)`, a close approximation with a reasonable denominator. If I feed exactly `frame_number/fps` as pts into `write_frame`, things work, but as soon as my frame timestamps have some jitter, writing fails: the line above can give two consecutive frames the same `rounded_pts`, which causes `Error writing packet: Invalid argument` when writing h264 into an mp4 container.
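A small Python sketch of this failure mode (Python standing in for the Cython expression). It assumes the codec time base is (9805, 244212), the inverse of the configured frame rate, so one tick spans one nominal frame period of about 40 ms. The 25 ms jitter value and the `to_ticks` and `has_non_increasing_pts` helpers are hypothetical, chosen only to show how rounding to whole frame periods can give two consecutive frames the same pts.

```python
from fractions import Fraction
from math import floor

# Assumption: codec time_base = 1 / frame_rate, i.e. one tick per nominal
# frame period (about 40.15 ms).
TIME_BASE = Fraction(9805, 244212)
FRAME_PERIOD = float(TIME_BASE)

def to_ticks(pts_seconds):
    """Same arithmetic as writer.pyx: seconds / av_q2d(time_base), rounded half up."""
    return floor(pts_seconds / FRAME_PERIOD + 0.5)

def has_non_increasing_pts(ticks):
    """Hypothetical helper: True if some frame fails to advance past its predecessor."""
    return any(later <= earlier for earlier, later in zip(ticks, ticks[1:]))

# Frames exactly on the nominal grid, then the same frames with illustrative
# jitter: frame 3 arrives 25 ms early (a made-up value, not from the issue).
nominal = [n * FRAME_PERIOD for n in range(5)]
jitter = [0.0, 0.0, 0.0, -0.025, 0.0]
jittered = [t + dt for t, dt in zip(nominal, jitter)]

nominal_ticks = [to_ticks(t) for t in nominal]
jittered_ticks = [to_ticks(t) for t in jittered]

print(nominal_ticks, has_non_increasing_pts(nominal_ticks))    # [0, 1, 2, 3, 4] False
print(jittered_ticks, has_non_increasing_pts(jittered_ticks))  # [0, 1, 2, 2, 4] True
```

Frame 3 lands at about 2.38 ticks and rounds down to 2, the same tick as frame 2.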

More generally, an AVFrame's pts should be in time_base units (the FFmpeg source documents `pts` as "Presentation timestamp in time_base units (time when frame should be shown to user)"), and the code above only achieves this when the time base has a denominator of 1. I think the line should be `rounded_pts = <int64_t>floor(pts*s.codec_ctx.time_base.den + 0.5)` (equivalent to `rounded_pts = <int64_t>floor(pts / av_q2d(s.codec_ctx.time_base)*s.codec_ctx.time_base.num + 0.5)`), but I am not able to test this myself.
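A Python sketch (standing in for the Cython expression) of the proposed line and its quoted equivalent form, using the time base (9805, 244212). The function names `proposed_pts` and `proposed_pts_alt` are hypothetical. One point worth checking when applying it: both forms count ticks of 1/den seconds. That count equals a pts in time_base units only when the time base's numerator is 1, for example a time base of (1, 244212). With the (9805, 244212) time base, one time_base unit is 9805 of these ticks.

```python
from fractions import Fraction
from math import floor

def proposed_pts(pts_seconds, num, den):
    # Proposed form: floor(pts * time_base.den + 0.5)
    return floor(pts_seconds * den + 0.5)

def proposed_pts_alt(pts_seconds, num, den):
    # Quoted equivalent: floor(pts / av_q2d(time_base) * time_base.num + 0.5)
    return floor(pts_seconds / (num / den) * num + 0.5)

NUM, DEN = 9805, 244212  # time base used in this issue

# The two forms agree because (den / num) * num == den
for p in (0.0, 0.5, 1.0, 2.25):
    assert proposed_pts(p, NUM, DEN) == proposed_pts_alt(p, NUM, DEN)

# The result counts ticks of 1/DEN seconds, so converting back to seconds
# multiplies by Fraction(1, DEN), not by the (NUM, DEN) time base itself.
ticks = proposed_pts(1.0, NUM, DEN)
print(ticks)                     # 244212
print(ticks * Fraction(1, DEN))  # 1
```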

Calculation example: I have frames with the following pts (in seconds): [0., 0.04014954222907545, 0.0802990844581509, 0.10583208445814671, 0.12895308445813133, 0.14478608445817373]. That is VFR (variable frame rate). With my time base of num 9805, den 244212 (i.e. `av_q2d` returns 0.040149542201038), the code above gives these frame pts:

| pts (s) | without rounding | rounded_pts |
|---|---|---|
| 0.0 | 0 | 0 |
| 0.04014954222907545 | 1.000000000698315 | 1 |
| 0.0802990844581509 | 2.000000001396629 | 2 |
| 0.10583208445814671 | 2.635947476766234 | 3 |
| 0.12895308445813133 | 3.211819547342088 | 3 |
| 0.14478608445817373 | 3.606170245558340 | 4 |

where they should be:

    0
 9805
19610
25845
31492
35358

which is what `floor(pts*s.codec_ctx.time_base.den + 0.5)` yields.
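Both sets of numbers can be reproduced in plain Python from the timestamps above. This sketch only checks the arithmetic; it does not call ffpyplayer or FFmpeg, and `strictly_increasing` is a hypothetical helper. It also confirms that the current formula repeats a value while the proposed one does not.

```python
from math import floor

# Frame timestamps in seconds (variable frame rate), copied from the issue
pts_seconds = [0., 0.04014954222907545, 0.0802990844581509,
               0.10583208445814671, 0.12895308445813133, 0.14478608445817373]
TB_NUM, TB_DEN = 9805, 244212  # codec time_base from the issue

q2d = TB_NUM / TB_DEN  # the value av_q2d(time_base) would return

# Current writer.pyx formula: seconds / av_q2d(time_base), rounded half up
unrounded = [p / q2d for p in pts_seconds]
current = [floor(u + 0.5) for u in unrounded]

# Proposed formula: seconds * time_base.den, rounded half up
proposed = [floor(p * TB_DEN + 0.5) for p in pts_seconds]

def strictly_increasing(values):
    """Hypothetical helper: True if every value is larger than the previous one."""
    return all(b > a for a, b in zip(values, values[1:]))

print(current, strictly_increasing(current))    # [0, 1, 2, 3, 3, 4] False
print(proposed, strictly_increasing(proposed))  # [0, 9805, 19610, 25845, 31492, 35358] True
```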