zmwangx / rust-ffmpeg

Safe FFmpeg wrapper.
Do What The F*ck You Want To Public License

seek has bugs #203

Open Mon-ius opened 1 month ago

Mon-ius commented 1 month ago

The current seek function always falls back to the initial frames.

E.g.

        for i in frame_sample {
            let seek_idx = (i as f64 / frame_rate) / time_base;
            ictx.seek((seek_idx * time_base) as i64, ..((seek_idx * time_base) as i64 + 1))?;

            for (stream, packet) in ictx.packets() {
                if stream.index() == idx {
                    decoder.send_packet(&packet)?;
                    let mut decoded_frame = frame::video::Video::empty();
                    while decoder.receive_frame(&mut decoded_frame).is_ok() {
                        let mut frame = frame::video::Video::empty();
                        scaler.run(&decoded_frame, &mut frame)?;
                        frames.push(frame);
                        break;
                    }
                    break;
                }
            }
        }
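
Note on the arithmetic above: `seek_idx * time_base` undoes the earlier division by `time_base`, so the value passed to `seek` is the target time in whole seconds, truncated. A minimal sketch with plain numbers, assuming a hypothetical stream at 30 fps with a 1/90000 time base (neither value comes from this report, and `seek_argument` is an illustrative name):

```rust
/// Mirrors the arithmetic in the snippet above using plain numbers.
/// `time_base` is the stream time base expressed as seconds per tick.
fn seek_argument(frame: u64, frame_rate: f64, time_base: f64) -> i64 {
    // Frame index -> seconds -> ticks of the stream time base.
    let seek_idx = (frame as f64 / frame_rate) / time_base;
    // Multiplying by `time_base` again undoes the division above,
    // so the result is whole seconds, truncated toward zero.
    (seek_idx * time_base) as i64
}

fn main() {
    // Hypothetical stream: 30 fps, time base 1/90000.
    let (frame_rate, time_base) = (30.0, 1.0 / 90_000.0);
    for frame in [0u64, 10, 20, 29, 45] {
        let arg = seek_argument(frame, frame_rate, time_base);
        println!("frame {:>3} -> seek argument {}", frame, arg);
    }
}
```

Under these assumptions every frame in the first second maps to a seek argument of 0, which may be part of why the decoder keeps returning the initial frames.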
Mon-ius commented 1 month ago

Consider the following:

pub async fn get_info(&self) -> Result<(), Box<dyn std::error::Error>> {
        let mut container = format::input(&self.file).unwrap();
        let stream = container.streams().best(media::Type::Video).unwrap();
        let mut decoder = ffmpeg_next::codec::context::Context::from_parameters(stream.parameters())?.decoder().video()?;
        let idx = stream.index();
        let num = stream.frames();
        let fps: f64 = stream.avg_frame_rate().into();
        let time_base = stream.time_base();
        let start_time = stream.start_time();

        println!("\t num: {}", num);
        println!("\t fps: {}", fps);
        println!("\t time_base: {}", time_base);
        println!("\t start_time: {}", start_time);

        let target = (num as f64 / 2.0).ceil() as i64;
        let target_sec = target as f64  / fps ;
        let target_timestamp = (target_sec / f64::from(time_base)) as i64 + start_time;

        println!("\t target: {}", target);
        println!("\t target_sec: {}", target_sec);
        println!("\t target_timestamp: {}", target_timestamp);

        container.seek(target_timestamp*15, target_timestamp*15..).unwrap();

        let mut pp = Vec::new();

        for (stream, packet) in container.packets() {
            if stream.index() == idx {
                decoder.send_packet(&packet)?;
                let mut decoded = frame::video::Video::empty();
                while decoder.receive_frame(&mut decoded).is_ok() {
                    let pts = decoded.pts().unwrap();
                    pp.push(pts);
                }
            }
        }

        if let Some(p) = pp.first() {
            println!("{:#?}", p);
        }
        decoder.send_eof()?;
        Ok(())
    }

Why do we need to multiply the timestamp by 15 to get the correct seek position?
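
One possible explanation, not confirmed in this thread: FFmpeg's `avformat_seek_file` interprets timestamps in `AV_TIME_BASE` units (microseconds) when no stream index is given, and `seek` here takes no stream argument. If that is what happens, a timestamp computed in the stream's time base has to be rescaled before seeking, and the scale factor depends on the file. The sketch below shows the rescaling with plain integers. The helper `to_av_time_base` and the 1/60000 time base are hypothetical, since the actual time base of the file is not shown in this issue.

```rust
/// FFmpeg's AV_TIME_BASE: ticks per second used when no stream is selected.
const AV_TIME_BASE: i128 = 1_000_000;

/// Rescale `ts` from a stream time base `tb_num / tb_den` (seconds per tick)
/// to AV_TIME_BASE units. Hypothetical helper, not part of any crate.
fn to_av_time_base(ts: i64, tb_num: i32, tb_den: i32) -> i64 {
    // Use i128 for the intermediate product to avoid overflow.
    (ts as i128 * tb_num as i128 * AV_TIME_BASE / tb_den as i128) as i64
}

fn main() {
    // Hypothetical timestamp in a 1/60000 stream time base.
    let ts = 600_000;
    let us = to_av_time_base(ts, 1, 60_000);
    println!("{} ticks -> {} us (factor {:.2})", ts, us, us as f64 / ts as f64);
}
```

With that hypothetical time base the factor is 1000000 / 60000, roughly 16.7, which is in the same range as the empirical 15. Since seeking usually lands on a nearby keyframe, a slightly wrong factor could still appear to work.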

Mon-ius commented 1 month ago

Compare this with the PyAV equivalent:

import av
from PIL import Image

def timestamp_to_frame(timestamp, stream):
    fps = stream.average_rate
    time_base = stream.time_base
    start_time = stream.start_time
    frame = (timestamp - start_time) * float(time_base) * float(fps)
    return frame

frame_skip: int = 10
frames = []

container = av.open("1.mp4")
video_stream = next(s for s in container.streams if s.type == "video")

num_frames = 0
for packet in container.demux(video_stream):
    for frame in packet.decode():
        num_frames += 1

fps = video_stream.average_rate
time_base = video_stream.time_base
start_time = video_stream.start_time

print(num_frames)
print(fps)
print(time_base)
print(start_time)

target_frame = int(num_frames / 2.0)
target_sec = float(target_frame * 1 / fps)
target_timestamp = int(target_sec / time_base) + video_stream.start_time

print(target_frame)
print(target_sec)
print(target_timestamp)

i = 0
x = []

container.seek(target_timestamp, stream=video_stream)
for packet in container.demux(video_stream):
    for frame in packet.decode():
        x.append(frame.pts)

# 900900
# 180180
# print(i)
# img_array = frame.to_ndarray(format="rgb24")
# im = Image.fromarray(img_array)
# im.save("frame-1.png")

print(x[0])
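
Note that PyAV's `container.seek` interprets the offset in the stream's time base when `stream=` is passed, so the script above seeks in stream ticks. For reference, the frame/timestamp conversion it uses can be written in Rust with exact integer arithmetic. This is a minimal sketch: the function names mirror the Python helper, and the 60000/1001 fps rate with a 1/60000 time base are hypothetical values, not read from the attached file.

```rust
/// Frame index -> stream timestamp.
/// `fps` and `tb` are rationals given as (numerator, denominator).
fn frame_to_timestamp(frame: i64, fps: (i64, i64), tb: (i64, i64), start: i64) -> i64 {
    // seconds = frame / fps; ticks = seconds / time_base
    let num = frame as i128 * fps.1 as i128 * tb.1 as i128;
    let den = fps.0 as i128 * tb.0 as i128;
    (num / den) as i64 + start
}

/// Stream timestamp -> frame index (inverse of the function above).
fn timestamp_to_frame(ts: i64, fps: (i64, i64), tb: (i64, i64), start: i64) -> i64 {
    // seconds = (ts - start) * time_base; frame = seconds * fps
    let num = (ts - start) as i128 * tb.0 as i128 * fps.0 as i128;
    let den = tb.1 as i128 * fps.1 as i128;
    (num / den) as i64
}

fn main() {
    // Hypothetical stream parameters.
    let fps = (60_000, 1_001);
    let tb = (1, 60_000);
    let ts = frame_to_timestamp(900, fps, tb, 0);
    println!("frame 900 -> timestamp {}", ts);
    println!("timestamp {} -> frame {}", ts, timestamp_to_frame(ts, fps, tb, 0));
}
```

Keeping the rate and time base as integer pairs avoids the rounding that `f64` conversions can introduce near frame boundaries.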
Mon-ius commented 1 month ago

Besides, a video in YUV420P format with HD (1-1-1) cannot be decoded properly. The file is attached.

https://github.com/user-attachments/assets/6795e31d-89b3-419f-a3dd-846b883d573e