Closed JuanFMontesinos closed 4 years ago
Hi @JuanFMontesinos , thanks for raising this!
Are you using the video_reader backend or the pyav backend for this? If it's not that much of a bother, could you try using the former?
From what I see, it seems like libav's probing is for one reason or another missing the timestamps of these frames: it sees 3537 frames, so it returns only that. Both pyav and CV2 use libav's probing for optimization purposes, so that might be the root of the issue, and I believe we have some fixes in place with the video_reader backend.
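For reference, one way to see where the probing loses track (a sketch; the helper name, `PATH` usage, and the 1.5x tolerance are mine, and it assumes pyav is installed) is to collect the per-frame pts and flag gaps larger than one nominal frame interval:

```python
def find_pts_gaps(pts_list, time_base, fps, tol=1.5):
    """Return (index, gap_seconds) wherever consecutive pts are more than
    tol * (1/fps) seconds apart -- i.e. where frames are likely missing."""
    expected = 1.0 / fps
    gaps = []
    for i in range(1, len(pts_list)):
        gap = (pts_list[i] - pts_list[i - 1]) * time_base
        if gap > tol * expected:
            gaps.append((i, float(gap)))
    return gaps

# Usage with pyav (hypothetical; PATH points at the problem video):
# import av
# container = av.open(PATH)
# stream = container.streams.video[0]
# pts = [f.pts for f in container.decode(stream)]
# print(find_pts_gaps(pts, float(stream.time_base), float(stream.average_rate)))
```

If the gap list is non-empty, the missing 151 frames should show up as jumps in the decoded timestamps rather than as a short tail at the end.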
Hi, BTW I liked your paper "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization".
So, to be a bit hands-on: according to this post by fmassa, https://github.com/pytorch/vision/issues/2216, the video_reader backend requires compiling from source. There is more info about video_reader here: https://github.com/pytorch/vision/releases/tag/v0.4.2
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DWITH_CUDA -I/home/jfm/vision/torchvision/csrc -I/home/jfm/.local/lib/python3.6/site-packages/torch/include -I/home/jfm/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/jfm/.local/lib/python3.6/site-packages/torch/include/TH -I/home/jfm/.local/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c /home/jfm/vision/torchvision/csrc/vision.cpp -o build/temp.linux-x86_64-3.6/home/jfm/vision/torchvision/csrc/vision.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
In file included from /home/jfm/vision/torchvision/csrc/vision.cpp:14:0:
/home/jfm/vision/torchvision/csrc/ROIAlign.h: In function ‘at::Tensor roi_align(const at::Tensor&, const at::Tensor&, double, int64_t, int64_t, int64_t, bool)’:
/home/jfm/vision/torchvision/csrc/ROIAlign.h:28:25: error: ‘class c10::Dispatcher’ has no member named ‘findSchemaOrThrow’; did you mean ‘findSchema’?
.findSchemaOrThrow("torchvision::roi_align", "")
^~~~~~~~~~~~~~~~~
findSchema
/home/jfm/vision/torchvision/csrc/ROIAlign.h:29:31: error: expected primary-expression before ‘decltype’
.typed<decltype(roi_align)>();
^~~~~~~~
/home/jfm/vision/torchvision/csrc/ROIAlign.h: In function ‘at::Tensor ROIAlign_autocast(const at::Tensor&, const at::Tensor&, double, int64_t, int64_t, int64_t, bool)’:
/home/jfm/vision/torchvision/csrc/ROIAlign.h:49:14: error: ‘ExcludeDispatchKeyGuard’ is not a member of ‘c10::impl’
c10::impl::ExcludeDispatchKeyGuard no_autocast(c10::DispatchKey::Autocast);
^~~~~~~~~~~~~~~~~~~~~~~
/home/jfm/vision/torchvision/csrc/ROIAlign.h:49:14: note: suggested alternative: ‘ExcludeTensorTypeIdGuard’
c10::impl::ExcludeDispatchKeyGuard no_autocast(c10::DispatchKey::Autocast);
^~~~~~~~~~~~~~~~~~~~~~~
ExcludeTensorTypeIdGuard
/home/jfm/vision/torchvision/csrc/ROIAlign.h: In function ‘at::Tensor _roi_align_backward(const at::Tensor&, const at::Tensor&, double, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, bool)’:
/home/jfm/vision/torchvision/csrc/ROIAlign.h:76:12: error: ‘class c10::Dispatcher’ has no member named ‘findSchemaOrThrow’; did you mean ‘findSchema’?
.findSchemaOrThrow("torchvision::_roi_align_backward", "")
^~~~~~~~~~~~~~~~~
findSchema
/home/jfm/vision/torchvision/csrc/ROIAlign.h:77:18: error: expected primary-expression before ‘decltype’
.typed<decltype(_roi_align_backward)>();
^~~~~~~~
In file included from /home/jfm/vision/torchvision/csrc/vision.cpp:17:0:
/home/jfm/vision/torchvision/csrc/nms.h: In function ‘at::Tensor nms(const at::Tensor&, const at::Tensor&, double)’:
/home/jfm/vision/torchvision/csrc/nms.h:18:25: error: ‘class c10::Dispatcher’ has no member named ‘findSchemaOrThrow’; did you mean ‘findSchema’?
.findSchemaOrThrow("torchvision::nms", "")
^~~~~~~~~~~~~~~~~
findSchema
/home/jfm/vision/torchvision/csrc/nms.h:19:31: error: expected primary-expression before ‘decltype’
.typed<decltype(nms)>();
^~~~~~~~
/home/jfm/vision/torchvision/csrc/nms.h: In function ‘at::Tensor nms_autocast(const at::Tensor&, const at::Tensor&, double)’:
/home/jfm/vision/torchvision/csrc/nms.h:28:14: error: ‘ExcludeDispatchKeyGuard’ is not a member of ‘c10::impl’
c10::impl::ExcludeDispatchKeyGuard no_autocast(c10::DispatchKey::Autocast);
^~~~~~~~~~~~~~~~~~~~~~~
/home/jfm/vision/torchvision/csrc/nms.h:28:14: note: suggested alternative: ‘ExcludeTensorTypeIdGuard’
c10::impl::ExcludeDispatchKeyGuard no_autocast(c10::DispatchKey::Autocast);
^~~~~~~~~~~~~~~~~~~~~~~
ExcludeTensorTypeIdGuard
/home/jfm/vision/torchvision/csrc/vision.cpp: At global scope:
/home/jfm/vision/torchvision/csrc/vision.cpp:45:14: error: expected constructor, destructor, or type conversion before ‘(’ token
TORCH_LIBRARY(torchvision, m) {
^
/home/jfm/vision/torchvision/csrc/vision.cpp:59:19: error: expected constructor, destructor, or type conversion before ‘(’ token
TORCH_LIBRARY_IMPL(torchvision, CPU, m) {
^
/home/jfm/vision/torchvision/csrc/vision.cpp:67:19: error: expected constructor, destructor, or type conversion before ‘(’ token
TORCH_LIBRARY_IMPL(torchvision, CUDA, m) {
^
/home/jfm/vision/torchvision/csrc/vision.cpp:76:19: error: expected constructor, destructor, or type conversion before ‘(’ token
TORCH_LIBRARY_IMPL(torchvision, Autocast, m) {
^
/home/jfm/vision/torchvision/csrc/vision.cpp:82:19: error: expected constructor, destructor, or type conversion before ‘(’ token
TORCH_LIBRARY_IMPL(torchvision, Autograd, m) {
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
I've got that wonderful error. So I don't feel qualified to debug that 😞 Let me know if you discover anything.
Gotcha, yeah, I agree the compilation is annoying to get right, especially outside of a clean env. I really hope the fix for that comes soon :)
The error itself is not necessarily helpful; it's more of a pointer that something has gone wrong. If you have conda installed, can you build this from scratch in a clean env? Just a simple build; the following worked for me:
conda create --name repro python=3.7
conda activate repro
# install prereqs from forge
conda install -y av -c conda-forge
conda install -y pytorch torchvision -c pytorch
# TODO: install torchvision from source to support video reader
### first remove the one installed by conda (DUMB and hacky way, but conda installs all the binaries, which is convenient)
pip uninstall -y torchvision
### then install it from scratch
mkdir -p ~/bin && cd ~/bin
git clone git@github.com:pytorch/vision.git
cd vision
python setup.py install
In the meantime, I'll take a look at your video as well to see what I get.
OK, so I've looked into this, and it seems like it's a problem with your video, not a reader problem. Every way I check, there are only 3537 frames registered; specifically:
FFPROBE:
(tv08) bjuncek@qgpu:~/work/issue_repro$ ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of default=nokey=1:noprint_wrappers=1 data/1u3yHICR_BU.mkv
3537
FFMPEG (see the last line):
(tv08) bjuncek@qgpu:~/work/issue_repro$ ffmpeg -i data/1u3yHICR_BU.mkv -map 0:v:0 -c copy -f null -
Input #0, matroska,webm, from 'data/1u3yHICR_BU.mkv':
Metadata:
MINOR_VERSION : 0
COMPATIBLE_BRANDS: iso6avc1mp41
MAJOR_BRAND : dash
ENCODER : Lavf57.83.100
Duration: 00:02:27.54, start: 0.000000, bitrate: 2607 kb/s
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 1k tbn, 50 tbc (default)
Metadata:
HANDLER_NAME : VideoHandler
ENCODER : Lavc57.107.100 libx264
DURATION : 00:02:27.523000000
Stream #0:1(eng): Audio: vorbis, 48000 Hz, stereo, fltp (default)
Metadata:
ENCODER : Lavc57.107.100 libvorbis
DURATION : 00:02:27.538000000
Output #0, null, to 'pipe:':
Metadata:
MINOR_VERSION : 0
COMPATIBLE_BRANDS: iso6avc1mp41
MAJOR_BRAND : dash
encoder : Lavf58.29.100
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 25 fps, 25 tbr, 1k tbn, 1k tbc (default)
Metadata:
HANDLER_NAME : VideoHandler
ENCODER : Lavc57.107.100 libx264
DURATION : 00:02:27.523000000
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
frame= 3537 fps=0.0 q=-1.0 Lsize=N/A time=00:02:27.40 bitrate=N/A speed=5.13e+03x
PYAV and CV2:
import av

PATH = 'data/1u3yHICR_BU.mkv'
images_av = []
container = av.open(PATH)
# container.streams.video[0].thread_type = "NONE"  # force single-threaded decoding
for frame in container.decode(video=0):
    images_av.append(frame.to_rgb().to_ndarray())
len(images_av)
# 3537
import cv2

cap = cv2.VideoCapture(PATH)
images_cv2 = []
while cap.isOpened():
    ret, frame = cap.read()
    if ret:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        images_cv2.append(rgb)
    else:
        break
cap.release()
len(images_cv2)
# 3537
Finally, TV with the video_reader backend:
import torchvision
from torchvision.io import read_video
PATH = 'data/1u3yHICR_BU.mkv'
torchvision.set_video_backend("video_reader")
torchvision_video, torchvision_audio, info = read_video(PATH)
print("TV version", torchvision.__version__)
print("video fps by TV", info['video_fps'])
print('Frames obtained by torchvision: %d '%torchvision_video.shape[0])
ImageIO:
Fails to read this video with your code, raising:
OSError: [Errno 12] Cannot allocate memory
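If memory is the blocker, a workaround sketch (the helper and the chunk size are mine, not imageio API) is to iterate the reader in bounded chunks instead of materializing every frame at once:

```python
def iter_chunks(frames, chunk_size):
    """Yield lists of at most chunk_size items from any frame iterator,
    so peak memory stays bounded instead of holding the whole video."""
    chunk = []
    for frame in frames:
        chunk.append(frame)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# With imageio (assumed installed; PATH is the video file):
# from imageio import get_reader
# reader = get_reader(PATH)
# total = 0
# for batch in iter_chunks(reader, 256):
#     total += len(batch)   # process the batch here, then let it be freed
# print(total)
```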
I'm going to close this, as it seems to be a video-specific thing, and FFMPEG and FFPROBE show the same number of frames as returned by the video_reader backend.
Hi, I think that doesn't prove my point. This is a webm video stream (so, bad quality) downloaded from YouTube (that's why it ended up as mkv) and resampled by ffmpeg. This could be a typical DL pipeline. Of course I imagine there is some "issue" with the video coming from either the downloader, the container, or YouTube itself. However, I understand that the original aim of torchvision's reader was to provide a robust reader. Therefore it should be able to deal with poor-quality videos (variable frame rate, weird frame rates like 18.53), and I assume that's why the source code checks timestamps rather than computing duration * FPS in both the audio and video streams. If the idea is to force the user to discard a sample, the user will simply look for a workaround (in my case, using Nvidia DALI or imageio). That's why I took my time to report this issue and to provide an example, because it's tricky and requires more expertise than I can provide.
I already mentioned that the readers count 3537 frames and that this is consistent across ffmpeg, ffprobe, opencv, scikit-video, and torchvision. Even imageio "detects"/counts that amount. You would need between 16 and 32 GB of RAM to run the full code (which loads the video twice).
What I wanted to highlight again is that 2 min 27.54 sec at 25 FPS = (2 * 60 + 27.54) * 25 = 3688 frames, yet there are only 3537.
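The arithmetic as a quick sanity check (the variable names are mine):

```python
duration_s = 2 * 60 + 27.54   # container duration: 00:02:27.54
fps = 25.0                    # nominal frame rate reported by ffprobe
expected = int(duration_s * fps)
missing = expected - 3537     # frames actually decoded by every reader
print(expected, missing)      # -> 3688 151
```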
I propose this workaround, which calls raw ffmpeg in a subprocess, in case you don't have access to enough RAM. Feel free to run it if you find the time.
Or you can run ffmpeg directly from the Unix command line:
ffmpeg -i /media/jfm/Slave/SkDataset/videos/cello/1u3yHICR_BU.mkv %05d.bmp
ffmpeg invocation from https://stackoverflow.com/questions/10957412/fastest-way-to-extract-frames-using-ffmpeg
from imageio import get_reader
from torchvision.io import read_video
import torchvision
import subprocess
import os
import shutil

torchvision.set_video_backend('video_reader')
PATH = '/media/jfm/Slave/SkDataset/videos/cello/1u3yHICR_BU.mkv'
torchvision_video, torchvision_audio, info = read_video(PATH, pts_unit='sec')
# Expected duration, derived from the audio stream
dur = torchvision_audio.shape[1] / info['audio_fps']
minutes = dur // 60
seconds = dur % 60
print('Backend: %s' % torchvision.get_video_backend())
print('Expected duration: %d min, %d sec' % (minutes, seconds))
print('Expected amount of frames: %d' % int(dur * 25))
reader = get_reader(PATH)
print('Expected frames by different readers: %d' % reader.count_frames())
print('Frames obtained by torchvision: %d' % torchvision_video.shape[0])
# Dump every frame to disk with raw ffmpeg and count the files
os.makedirs('./bmp_files', exist_ok=True)
dst = os.path.join(os.path.abspath('./bmp_files'), '%05d.bmp')
print('Writing frames at %s' % dst)
print('Executing Popen: ffmpeg -i %s %s' % (PATH, dst))
result = subprocess.Popen(["ffmpeg", "-i", PATH, dst],
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output = [str(x) for x in result.stdout.readlines()]
result.wait()
for line in output:
    print(line)
print('Frames obtained by ffmpeg: %d' % len(os.listdir('./bmp_files')))
shutil.rmtree('./bmp_files')  # clean up the extracted frames (added for tidiness)
Regards
I understand that the original aim of torchvision's reader was providing a robust reader.
The idea of the torchvision video reader is to be robust and flexible: it doesn't make assumptions, and it reads what can be read from the format if it's supported by ffmpeg (since it uses ffmpeg in its underlying implementation). In the case of a video which is in one way or another corrupted (like the one you have here), it won't break or ask you to re-encode the video in a particular way; it will read whatever is salvageable from the video and not fail.
What I wanted to highlight again is that 2 min 27.54 sec at 25 FPS = (2 * 60 + 27.54) * 25 = 3688 frames, yet there are only 3537.
I understand that; that's why I'm saying that it's likely an issue of re-encoding and packaging the video rather than of decoding itself. If ffmpeg itself cannot see more frames, the issue stems from there, whether the headers are missing or the packets are corrupted. All the implementations you have mentioned (ffmpeg, ffprobe, opencv, scikit-video, and torchvision) call the C implementation of ffmpeg under the hood.
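One way to narrow this down (a sketch; the synthetic clip below is just a stand-in, and for the real diagnosis you would point ffprobe at data/1u3yHICR_BU.mkv) is to compare demuxed packets against decoded frames with ffprobe's -count_packets. If the packet count also stops at 3537, the missing frames were never muxed into the container; if packets exceed frames, they are present but undecodable.

```shell
# Make a 1-second, 25 fps synthetic clip so the commands are reproducible
# (requires ffmpeg built with lavfi; swap in the real video for diagnosis)
ffmpeg -v error -f lavfi -i testsrc=duration=1:rate=25 -y /tmp/clip.mkv

# Decoded frames vs demuxed packets for the first video stream
ffprobe -v error -count_frames -count_packets -select_streams v:0 \
        -show_entries stream=nb_read_frames,nb_read_packets \
        -of default=noprint_wrappers=1 /tmp/clip.mkv
```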
the user will simply look for a workaround (in my case using Nvidia DALI or imageio)
(I would also add decord.) These are all viable alternatives which have their strengths and weaknesses, but are ultimately just as good as ours. Note that DALI and decord make almost exactly the same ffmpeg calls as torchvision/cv2/pyav (but make some approximations as a trade-off for speed, so they repeat or skip some frames and ignore additional streams), so I'm not sure how different the results you get would be, but they are well worth looking into.
Also, please note that this issue is not written off; it will be revisited once we better understand what broke during the re-encoding of the video.
I see, thanks for the clarification. I was just worried about why plain ffmpeg frame extraction can see the 3688 frames while the backend ffmpeg used by all these libraries reads fewer. But I imagine, as you said, it's a problem of headers and metadata. BTW, I didn't know decord. Thanks for pointing it out.
Thank you very much for your time. Juan
@bjuncek I'm using the commands you shared in https://github.com/pytorch/vision/issues/2490#issuecomment-664674758 to build, but I'm getting the following error. Do you know what could be wrong?
gcc -pthread -B /home/fernando/.conda/envs/repro/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/fernando/bin/vision/torchvision/csrc -I/home/fernando/.conda/envs/repro/lib/python3.7/site-packages/torch/include -I/home/fernando/.conda/envs/repro/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/fernando/.conda/envs/repro/lib/python3.7/site-packages/torch/include/TH -I/home/fernando/.conda/envs/repro/lib/python3.7/site-packages/torch/include/THC -I/home/fernando/.conda/envs/repro/include/python3.7m -c /home/fernando/bin/vision/torchvision/csrc/vision.cpp -o build/temp.linux-x86_64-3.7/home/fernando/bin/vision/torchvision/csrc/vision.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/fernando/bin/vision/torchvision/csrc/vision.cpp:14:0:
/home/fernando/bin/vision/torchvision/csrc/ROIAlign.h: In function ‘at::Tensor roi_align(const at::Tensor&, const at::Tensor&, double, int64_t, int64_t, int64_t, bool)’:
/home/fernando/bin/vision/torchvision/csrc/ROIAlign.h:28:25: error: ‘class c10::Dispatcher’ has no member named ‘findSchemaOrThrow’; did you mean ‘findSchema’?
.findSchemaOrThrow("torchvision::roi_align", "")
^~~~~~~~~~~~~~~~~
findSchema
/home/fernando/bin/vision/torchvision/csrc/ROIAlign.h:29:31: error: expected primary-expression before ‘decltype’
.typed<decltype(roi_align)>();
^~~~~~~~
/home/fernando/bin/vision/torchvision/csrc/ROIAlign.h: In function ‘at::Tensor _roi_align_backward(const at::Tensor&, const at::Tensor&, double, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, bool)’:
/home/fernando/bin/vision/torchvision/csrc/ROIAlign.h:76:12: error: ‘class c10::Dispatcher’ has no member named ‘findSchemaOrThrow’; did you mean ‘findSchema’?
.findSchemaOrThrow("torchvision::_roi_align_backward", "")
^~~~~~~~~~~~~~~~~
findSchema
/home/fernando/bin/vision/torchvision/csrc/ROIAlign.h:77:18: error: expected primary-expression before ‘decltype’
.typed<decltype(_roi_align_backward)>();
^~~~~~~~
In file included from /home/fernando/bin/vision/torchvision/csrc/vision.cpp:17:0:
/home/fernando/bin/vision/torchvision/csrc/nms.h: In function ‘at::Tensor nms(const at::Tensor&, const at::Tensor&, double)’:
/home/fernando/bin/vision/torchvision/csrc/nms.h:18:25: error: ‘class c10::Dispatcher’ has no member named ‘findSchemaOrThrow’; did you mean ‘findSchema’?
.findSchemaOrThrow("torchvision::nms", "")
^~~~~~~~~~~~~~~~~
findSchema
/home/fernando/bin/vision/torchvision/csrc/nms.h:19:31: error: expected primary-expression before ‘decltype’
.typed<decltype(nms)>();
^~~~~~~~
/home/fernando/bin/vision/torchvision/csrc/vision.cpp: At global scope:
/home/fernando/bin/vision/torchvision/csrc/vision.cpp:45:14: error: expected constructor, destructor, or type conversion before ‘(’ token
TORCH_LIBRARY(torchvision, m) {
^
/home/fernando/bin/vision/torchvision/csrc/vision.cpp:59:19: error: expected constructor, destructor, or type conversion before ‘(’ token
TORCH_LIBRARY_IMPL(torchvision, CPU, m) {
^
/home/fernando/bin/vision/torchvision/csrc/vision.cpp:82:19: error: expected constructor, destructor, or type conversion before ‘(’ token
TORCH_LIBRARY_IMPL(torchvision, Autograd, m) {
^
error: command 'gcc' failed with exit status 1
@fepegar I believe you need to update your PyTorch version and recompile torchvision again.
Thanks. For some reason, conda was installing 1.4, so I had to explicitly ask for pytorch=1.6.
🐛 Bug
Hi, I've realized that torchvision, as well as other libraries such as skvideo and opencv, retrieves fewer frames than ffmpeg. I found this happens only for some videos.
Context: I have a re-encoded dataset of videos which are 25.0 FPS. The re-encoding was done via ffmpeg.
The recording (.mkv) contains an audio stream and a video stream. Both streams have the same duration (according to metadata info from ffprobe), and the audio stream's duration matches the one stated by the metadata.
Extracting frames via the Unix command line with ffmpeg yields the proper amount of frames (3688 in the case of the given video example):
ffmpeg -i /media/jfm/Slave/SkDataset/videos/cello/1u3yHICR_BU.mkv %05d.bmp
Extracting frames with other libraries such as skvideo or opencv obtains only 3537 frames. My knowledge of the internals of these libraries is limited. I verified that the torchvision reader is not discarding frames with negative timestamps (that seems not to be the case).
I found a library which captures the proper amount of frames: imageio. However, its reader only counts 3537 frames (but reads 3688).
To Reproduce
Video example to reproduce the issue: https://drive.google.com/file/d/1DIRsDf1SrLOTGbVejoL-PEIlxDPP0LMC/view?usp=sharing
Environment
Torchvision version: 0.5.0
Imageio version: 2.5.0