talmolab / dreem

DREEM Relates Every Entities' Motion (DREEM). Global Tracking Transformers for biological multi-object tracking.
https://dreem.sleap.ai
BSD 3-Clause "New" or "Revised" License

Change video backend from `imageio`, `PIL.Image`, and `skimage.io` to `sleap_io.Video`? #56

Open aaprasad opened 5 months ago

aaprasad commented 5 months ago

Currently we use imageio and skimage.io for video reading. However, sleap-io now supports video reading, so we should look into swapping out these backends and relying only on sleap-io where possible. We can definitely do this for SleapDataset, but we need to check whether sleap-io supports both multi-image and single-image .tif files.

aaprasad commented 5 months ago

Alternatively, we should stick to imageio only and remove the usage of PIL.Image and skimage.io, since imageio seems to support .tif files. We should also stop opening files at every __getitem__ call: store the readers in __init__ and, if necessary, add a __del__ method to handle file closing.
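
A minimal sketch of the "open once, close on teardown" pattern suggested above. `FrameDataset` and the `open_reader` callable are hypothetical names used for illustration, not part of dreem or imageio; the reader is only assumed to support integer indexing and a `close()` method.

```python
from typing import Any, Callable, Sequence, Tuple


class FrameDataset:
    """Hypothetical dataset that opens one reader per video at construction."""

    def __init__(self, paths: Sequence[str], open_reader: Callable[[str], Any]):
        # Open every reader once here instead of on each __getitem__ call.
        self.paths = list(paths)
        self.readers = [open_reader(p) for p in self.paths]

    def __getitem__(self, index: Tuple[int, int]) -> Any:
        # index is (video index, frame index); the stored reader is reused.
        video_idx, frame_idx = index
        return self.readers[video_idx][frame_idx]

    def close(self) -> None:
        # Close all readers and forget them, so repeated calls are harmless.
        for reader in self.readers:
            reader.close()
        self.readers = []

    def __del__(self):
        # Best-effort cleanup only; an explicit close() is more predictable.
        try:
            self.close()
        except Exception:
            pass
```

Keeping an explicit `close()` alongside `__del__` is a deliberate choice here: Python does not guarantee when (or whether) `__del__` runs, so callers that care about file handles can release them deterministically.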

aaprasad commented 5 months ago

@talmo thoughts?

talmo commented 5 months ago

Multiple images are now supported as of https://github.com/talmolab/sleap-io/pull/88, so you can do:

import sleap_io as sio

video = sio.load_video(["img1.jpg", "img2.jpg", "img3.jpg"])

The supported extensions are manually whitelisted by backend type (and are used to route the high-level API to the appropriate backend):

MediaVideo: ("mp4", "avi", "mov", "mj2", "mkv")

HDF5Video: ("h5", "hdf5", "slp")

ImageVideo: ("png", "jpg", "jpeg", "tif", "tiff", "bmp")
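
To illustrate the kind of extension-based routing described here, a rough sketch follows. It is not sleap-io's actual implementation: `BACKEND_EXTENSIONS` and `guess_backend` are hypothetical names, and only the extension tuples are taken from the list above.

```python
from pathlib import Path

# Extension whitelists copied from the list above.
BACKEND_EXTENSIONS = {
    "MediaVideo": ("mp4", "avi", "mov", "mj2", "mkv"),
    "HDF5Video": ("h5", "hdf5", "slp"),
    "ImageVideo": ("png", "jpg", "jpeg", "tif", "tiff", "bmp"),
}


def guess_backend(filename):
    """Return the backend name whose whitelist contains the file's extension.

    Accepts a single path or a list/tuple of paths (judged by its first entry).
    Raises ValueError for extensions that are not whitelisted.
    """
    if isinstance(filename, (list, tuple)):
        filename = filename[0]
    # Normalize ".MP4" -> "mp4" before the lookup.
    ext = Path(filename).suffix.lower().lstrip(".")
    for backend, extensions in BACKEND_EXTENSIONS.items():
        if ext in extensions:
            return backend
    raise ValueError(f"Unsupported extension: {ext!r}")
```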

ImageVideo currently does not support TIFF stacks, which is a bit tricky since single-image TIFFs also exist. I think the MediaVideo reader probably makes more sense for TIFF stacks when using imageio as the backend, but it might get hairy with the other backends (it falls back to cv2 or pyav if available).

If you'd like to implement a TIFF stack video backend (or add it to one of the others), this would be a good addition to sleap-io.

Everything else should be supported though.

talmo commented 3 months ago

Also relevant: https://github.com/talmolab/dreem/pull/64#issuecomment-2244185625

If you want to consider delegating video I/O to sleap-io, here are some hints:

import sleap_io as sio

video = sio.load_video("video.mp4")  # opens video handle and keeps it open
video = sio.load_video("video.mp4", keep_open=False)  # will close the reader after each call to read a frame
video = sio.load_video("video.mp4", backend="opencv")  # see note below on backends

# Explicitly control reader state:
video.close()  # closes backend but remembers state of the settings (hopefully)
video.open()  # opens backend reader again, remembering state
video.is_open  # checks the state if you want to have custom logic (calling open on already open videos is a noop anyway though)
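
One way to use these state controls, sketched under the assumption that the object exposes exactly the `open()`, `close()`, and `is_open` members shown above. The `opened` helper is hypothetical and not part of sleap-io.

```python
from contextlib import contextmanager


@contextmanager
def opened(video):
    """Hypothetical helper: make sure `video` is open inside the block.

    The previous state is restored on exit, so a video that was closed
    beforehand is closed again, while an already-open video stays open.
    """
    was_open = video.is_open
    if not was_open:
        video.open()
    try:
        yield video
    finally:
        if not was_open:
            video.close()
```

Frames would be read inside a `with opened(video):` block; restoring the prior state means the helper can be nested or used on long-lived handles without closing them unexpectedly.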

On backend options:

  • "opencv": Fastest video reader, but only supports a limited number of codecs and may not be able to read some videos. It requires opencv-python to be installed. It is the fastest because it uses the OpenCV C++ library to read videos, but is limited by the version of FFMPEG that was linked into it at build time as well as the OpenCV version used.
  • "FFMPEG": Slowest, but most reliable. This is the default backend. It requires imageio-ffmpeg and an ffmpeg executable on the system path (which can be installed via conda). The imageio plugin for FFMPEG reads frames into raw bytes which are communicated to Python through STDOUT on a subprocess pipe, which can be slow. However, it is the most reliable and feature-complete. If you install the conda-forge version of ffmpeg, it will be compiled with support for many codecs, including GPU-accelerated codecs like NVDEC for H264 and others.
  • "pyav": Supports most codecs that FFMPEG does, but its implementation in imageio is not as complete or reliable as the FFMPEG one for some video types. It is faster than FFMPEG because it uses the av package to read frames directly into numpy arrays in memory without the need for a subprocess pipe. The av package provides Python bindings for the libav* C libraries, which are the same libraries that FFMPEG uses under the hood.
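
Given the trade-offs above, one option is to pick the fastest backend whose Python package is installed. The sketch below is only an illustration: `choose_backend` and `BACKEND_MODULES` are hypothetical names, and the mapping from backend name to importable module (`cv2`, `av`, `imageio_ffmpeg`) is an assumption based on the packages mentioned in the list.

```python
import importlib.util

# Backend name -> module it needs. Module names are an assumption based on
# the packages named above (opencv-python, av, imageio-ffmpeg).
BACKEND_MODULES = {
    "opencv": "cv2",
    "pyav": "av",
    "FFMPEG": "imageio_ffmpeg",
}


def module_available(module_name):
    """Return True if the top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def choose_backend(preference=("opencv", "pyav", "FFMPEG"),
                   is_available=module_available):
    """Return the first backend in `preference` whose module is installed.

    Returns None if none of the candidate modules are available.
    """
    for backend in preference:
        if is_available(BACKEND_MODULES[backend]):
            return backend
    return None
```

This only checks that the Python package is present. It does not verify that an ffmpeg executable is on the path or that a given codec is supported, so a video may still fail to decode with the chosen backend.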

talmo commented 3 months ago

One more factor: there's WIP over in sleap-nn on a performant video reader for inference, which we'll upstream to sleap-io at some point to be reused here.

aaprasad commented 3 months ago

Another thought is to switch over to decord + kornia, since they read directly into tensors: https://github.com/talmolab/dreem/issues/72