twelvelabs-io / tl-jockey

Jockey is a conversational video agent.

Validate aspect ratios and resolutions of clips before concatenating. #31

Open TravisCouture opened 5 months ago

TravisCouture commented 5 months ago

There are some edge cases where aspect ratios (SAR/PAR and DAR) are incompatible across streams. In such cases, concatenating the video streams will fail. We need to check for this manually and handle it when found.

  1. Validate SAR/PAR for all video input streams in combine_clips()
  2. Handle setting the SAR/PAR if required:

    # Excerpt from the body of combine_clips(). Assumes `import os` and
    # `import ffmpeg` (ffmpeg-python), with `clips`, `index_id`, and
    # `output_filename` supplied by the enclosing scope.
    input_streams = []
    # Placeholder -- we can decide based off of resolution of actual clips
    target_resolution = (1280, 720)

    for clip in clips:
        video_id = clip['video_id']
        start = clip['start']
        end = clip['end']
        video_filepath = os.path.join(os.environ["HOST_PUBLIC_DIR"], index_id, f"{video_id}_{start}_{end}.mp4")

        # Download the clip only if it is not already cached on disk.
        if not os.path.isfile(video_filepath):
            try:
                download_video(video_id=video_id, index_id=index_id, start=start, end=end)
            except AssertionError as error:
                error_response = {
                    "message": f"There was an error retrieving the video metadata for Video ID: {video_id} in Index ID: {index_id}. "
                               "Double check that the Video ID and Index ID are valid and correct.",
                    "error": str(error)
                }
                return error_response

        # Open the clip once and take its video and audio streams.
        clip_input = ffmpeg.input(filename=video_filepath, loglevel="error")
        # Normalize resolution and SAR so every video stream matches before concat.
        video_input_stream = (
            clip_input.video
            .filter("scale", target_resolution[0], target_resolution[1])
            .filter("setsar", "1/1")
            .filter("setpts", "PTS-STARTPTS")
        )
        audio_input_stream = clip_input.audio.filter("asetpts", "PTS-STARTPTS")

        # concat expects streams interleaved per segment: video, then audio.
        input_streams.append(video_input_stream)
        input_streams.append(audio_input_stream)

    output_filepath = os.path.join(os.environ["HOST_PUBLIC_DIR"], index_id, output_filename)
    (
        ffmpeg.concat(*input_streams, v=1, a=1)
        .output(output_filepath, vcodec="libx264", acodec="libmp3lame", loglevel="verbose")
        .overwrite_output()
        .run()
    )
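
For step 1, and for the placeholder `target_resolution`, here is a rough sketch of how the validation could look. It is not what Jockey currently does. All helper names (`probe_video_stream`, `parse_ratio`, `stream_geometry`, `find_mismatched_clips`, `pick_target_resolution`) are hypothetical, it shells out to `ffprobe` directly, and it assumes an unknown or missing SAR can be treated as 1:1.

```python
import json
import subprocess
from collections import Counter
from fractions import Fraction


def probe_video_stream(filepath):
    """Return the first video stream's metadata as reported by ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries",
         "stream=width,height,sample_aspect_ratio,display_aspect_ratio",
         "-of", "json", filepath],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["streams"][0]


def parse_ratio(value):
    """Parse an ffprobe ratio such as '16:9'; return None if it is unknown."""
    if not value or value in ("N/A", "0:1"):
        return None
    num, den = value.split(":")
    if int(den) == 0:
        return None
    return Fraction(int(num), int(den))


def stream_geometry(stream):
    """Return (width, height, SAR); assumption: unknown SAR means 1:1."""
    sar = parse_ratio(stream.get("sample_aspect_ratio")) or Fraction(1, 1)
    return (stream["width"], stream["height"], sar)


def find_mismatched_clips(streams):
    """Return indices of streams whose geometry differs from the first one."""
    if not streams:
        return []
    reference = stream_geometry(streams[0])
    return [i for i, s in enumerate(streams) if stream_geometry(s) != reference]


def pick_target_resolution(streams, default=(1280, 720)):
    """Pick the most common (width, height) among the clips, else the default."""
    if not streams:
        return default
    counts = Counter((s["width"], s["height"]) for s in streams)
    return counts.most_common(1)[0][0]
```

In `combine_clips()` this could be called as `streams = [probe_video_stream(path) for path in clip_paths]` (where `clip_paths` is a hypothetical list of the downloaded clip files), applying the `scale` and `setsar` filters only when `find_mismatched_clips(streams)` is non-empty.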

DmitriiTsy commented 1 month ago

@TravisCouture hey, do you remember the prompt you were using to identify the problem with the ratios? I'm testing with a variety of clips and can't find a failing case to test my implementation against.