nschlia / ffmpegfs

FUSE-based transcoding filesystem with video support from many formats to FLAC, MP4, TS, WebM, OGG, MP3, HLS, and others.
https://nschlia.github.io/ffmpegfs/
GNU General Public License v3.0

is it possible to mount mov/mp4 as a folder with pngs/jpgs? #26

Closed. zhuker closed this issue 5 years ago.

zhuker commented 5 years ago

what would it take to implement it?

like so:

# ls /storage/videos
video1.mp4
video2.mov

# ffmpegfs /storage/videos /mnt/ffmpegfs
# find /mnt/ffmpegfs
/mnt/ffmpegfs/video1.mp4/00001.png
/mnt/ffmpegfs/video1.mp4/00002.png
...
/mnt/ffmpegfs/video2.mov/00001.png
/mnt/ffmpegfs/video2.mov/00002.png

nschlia commented 5 years ago

Do you mean one image per frame? That'll be thousands of virtual files. This may be possible, although I am not sure if FUSE will support it (there is a size limit on the buffer that gets filled with the directory entries). What's the idea behind that?

For a 120-second file you could get some 3,500 images. I suppose that could be more than FUSE supports.

zhuker commented 5 years ago

we have a video processing pipeline which accepts a bunch of png/jpg files. what we do now:

  1. decode input video to pngs (store 100 thousand pngs in a folder)
  2. process video via our neural nets

we waste multiple GBs of storage (pngs need to stay on storage after processing to be able to get back to them)

what i want to achieve:

  1. mount video with ffmpegfs, get a folder with virtual pngs (no storage wasted)
  2. process as usual

our videos are mostly i-frame only (e.g. prores) so it shouldn't be too processor intensive

nschlia commented 5 years ago

Now I understand. There are two problems, but maybe we can find a solution:

First, it won't save you any disk space. MP4/MOV must be decoded sequentially, so if you want to access, say, frame number 1234, it would require decoding frames 1 to 1233 first. Of course it is possible to seek to an arbitrary position, at least approximately, and start decoding from there; that's required anyway to have at least one i-frame to generate a full picture. But I am not sure if I could locate exactly frame number 1234 that way; there would be some inaccuracy. So mostly it would mean that the whole file needs to be decoded and stored, either on disk or in memory, waiting for someone to access the frame images.
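
For illustration, approximate seeking with the FFmpeg API looks roughly like this (a minimal sketch, not the actual ffmpegfs code; fmt_ctx, dec_ctx, video_stream_idx, target_pts and frame, a pre-allocated AVFrame, are illustrative names):

// Sketch: seek to the keyframe at or before target_pts, then decode
// forward until the wanted frame appears. AVSEEK_FLAG_BACKWARD makes
// av_seek_frame land on a keyframe so decoding can start cleanly.
av_seek_frame(fmt_ctx, video_stream_idx, target_pts, AVSEEK_FLAG_BACKWARD);
avcodec_flush_buffers(dec_ctx);

AVPacket pkt;
while (av_read_frame(fmt_ctx, &pkt) >= 0)
{
    int have_frame = (pkt.stream_index == video_stream_idx &&
                      avcodec_send_packet(dec_ctx, &pkt) == 0 &&
                      avcodec_receive_frame(dec_ctx, frame) == 0);
    av_packet_unref(&pkt);
    if (have_frame && frame->pts >= target_pts)
        break;  // frame now holds the image at (or just after) target_pts
}

The loop can only stop at the timestamps the demuxer reports, which is exactly why locating "frame number 1234" is not guaranteed to be accurate.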

The second problem: as we do not have a single file that grows, how can you make sure that your neural net does not hiccup when it tries to access pngs that are not yet ready? Currently, if an app tries to access a part of the file that has not been decoded, the read blocks until that chunk is ready. This could be done with file opens as well, but that means that the accessing software may have to wait even for minutes to be able to get the image, depending on the size of the video...

A solution for problem 1 could be to disable the cache, but that means that once an image has been read it will be discarded. Accessing it again means decoding it again. That's a processing time hog.

Problem 2 depends on your software. If it tolerates that, it would work.

So some questions:

Will your software read the pngs sequentially and only once? If so, it could be done. Images must still be cached, but only until read; then they could be discarded.

If it tries to read faster than the images can be made available, will it accept blocked reads, even for several seconds? This will happen especially if it skips images, for example, reading only one image every 5 seconds or so.

Possible trouble:

I'll have to try how FUSE handles 1000+ virtual files. Maybe it cannot handle that and then it'll be a no-go :(

Suggestion:

The cache-and-discard-after-reading function would require some significant changes, but I could implement the video-to-image feature with normal caching first, so you could try out whether it works for you at all.

zhuker commented 5 years ago

first of all - thanks for giving it a thought and for your time

First, it won't save you any disk space. MP4/MOV must be decoded sequentially, so if you want to access, say, frame number 1234, it would require decoding frames 1 to 1233 first.

In the general case, you are right. But that's not really a problem for us, since we are mostly using i-frame-only video formats (e.g. ProRes+mov and JPEG2000+mxf), which are easily seekable.

But I am not sure if I could locate exactly frame number 1234 that way.

again, in the general case you are right. but for our combination of codec+container, frames are quite easy to locate

The second problem: as we do not have a single file that grows, how can you make sure that your neural net does not hiccup when it tries to access pngs that are not yet ready?

That's fine, it'll block and wait until the file is ready

wait even for minutes to be able to get the image, depending on the size of the video...

waiting for minutes isn't great, but it's way better than the several hours we wait before we start processing today

A solution for problem 1 could be to disable the cache, but that means that once an image has been read it will be discarded. Accessing it again means decoding it again. That's a processing time hog.

our videos are normally 4K HDR videos (16bit per pixel), so a 20MB per-frame file size is not rare. i would very much like to keep caching, even if it means storing some of the pngs on actual disk (some LRU policy?)

Will your software read the pngs sequentially and only once?

the processing pipeline is sequential, but when it fails we want to analyze why something happened, and then out of these 100k images we want to randomly take a look at 100-200. Some kind of cache is still needed, methinks.

If it tries to read faster than the images can be made available, will it accept blocked reads, even for several seconds? This will happen especially if it skips images, for example, reading only one image every 5 seconds or so.

we do process every single frame (no skips), but blocking reads are totally cool (even for seconds), we don't need to be realtime or anything like that

I'll have to try how FUSE handles 1000+ virtual files. Maybe it cannot handle that and then it'll be a no-go :(

please let me know.

i have no idea how FUSE works internally, but shouldn't it be:

  1. multiple readdir-s to list entries
  2. open or stat a specific file

let's assume the following:

for readdir you could keep some kind of simple cache

for open you quickly seek (because we only support seekable video) and decode the file

i have a problem understanding how stat would work. but for simplicity's sake let's say images are going to be .bmp (or any other uncompressed format), not .png, so we know ahead of time what the file size would be
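
For example, an uncompressed 24-bit BMP's size follows from the dimensions alone (a sketch; assumes the standard 54-byte header and rows padded to 4 bytes):

// Sketch: predicted size of an uncompressed 24-bit BMP.
size_t bmp_size(size_t width, size_t height)
{
    size_t row = (width * 3 + 3) & ~(size_t)3;  // bytes per padded row
    return 54 + row * height;                   // headers + pixel data
}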

can you educate me why, given these assumptions, you would need to take care of 1000+ fuse virtual files?

The cache-and-discard-after-reading function would require some significant changes, but I could implement the video-to-image feature with normal caching first, so you could try out whether it works for you at all.

that would be AMAZING

nschlia commented 5 years ago

let's assume the following:

  • we only support easily seekable formats. on mount, ffmpegfs will check if the video is of a supported type and refuse to mount otherwise
  • fs will be mounted read-only

for readdir you could keep some kind of simple cache

for open you quickly seek (because we only support seekable video) and decode the file

i have a problem understanding how stat would work. but for simplicity's sake let's say images are going to be .bmp (or any other uncompressed format), not .png, so we know ahead of time what the file size would be

Given all these assumptions it should be feasible. For stat there may be a problem with formats like png/jpg, because I cannot provide a predicted size until the image is actually ready. So the size will definitely be wrong until the image is available, unless I use a fixed size format like BMP which will be a major waste of disk space :)

Your software that consumes the files must cope with files that change size after opening. They may be smaller or larger.

This is probably a big issue when you access the files via Samba or NFS. Samba fills files smaller than predicted with zeros up to that size, or cuts them down to it (discards the end) if they are larger. NFS also fills smaller files with zeros; larger files will either be cut or correct (it depends; you can try it as many times as you want, the outcome is not deterministic).

This is why I add 2.5% to the size prediction, to raise the chance of the file ending up a bit smaller and avoid the cut-off... So we could set this to 5 or more per cent to avoid lost parts. On the other hand, this would only mean that a very small portion at the bottom of the image is missing and probably nothing important.
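
In code, the prediction with a safety margin is trivial (a sketch, not the actual ffmpegfs implementation; the 2.5% is the figure mentioned above):

// Sketch: pad the predicted size so the real file more likely ends up
// smaller than predicted; Samba/NFS then zero-fill the tail instead of
// cutting data off.
size_t predict_size(size_t estimated)
{
    return estimated + estimated * 25 / 1000;  // +2.5% safety margin
}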

can you educate me why, given these assumptions, you would need to take care of 1000+ fuse virtual files?

Once someone does an "ls" on a virtual directory, FUSE issues a readdir to me and I have to present the whole directory. So for a 10,000-frame video I would have to present 10,000 virtual images. There seem to be ways to do that with FUSE, but I need to try out what happens...
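
For a rough picture, a readdir handler for such a virtual directory would look like this against the libfuse high-level API (a sketch only; ffmpegfs_readdir and frame_count are illustrative names, not the real code):

// Sketch: present one virtual image entry per video frame.
// filler() copies entries into a kernel-provided buffer and returns 1
// when that buffer is full - the size limit mentioned earlier.
static int ffmpegfs_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                            off_t offset, struct fuse_file_info *fi)
{
    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);

    for (int n = 1; n <= frame_count; n++)
    {
        char name[32];
        snprintf(name, sizeof(name), "%05d.png", n);
        if (filler(buf, name, NULL, 0))
            break;  // buffer full
    }
    return 0;
}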

PS:

I am currently preparing release 1.7 (and actually 1.8, which only adds Doxygen documentation), so I currently have a feature freeze. I cannot add new features at the moment, but I could create a branch, implement your features there, and merge it back later. The functionality sounds interesting, I would like to have it in FFmpegfs. I also don't know how long it will take to realise it, but probably a week or two.

zhuker commented 5 years ago

format like BMP which will be a major waste of disk space :)

you mean for cache? i am not copying the files anywhere, just reading them from ffmpegfs

Your software that consumes the files must cope with files that change size after opening. They may be smaller or larger.

i vote for BMP or any other uncompressed format (e.g. PGM, PNM, PPM, uncompressed TIF) so we know the size ahead of time. or at least there should be an option (compressed/uncompressed)

we could set this to 5 or more per cent to avoid lost parts

thats cool

very small portion at the bottom of the image is missing and probably nothing important.

neural nets normally give unpredictable results given damaged/incomplete input, so it's important to preserve the entire image

FUSE issues a readdir to me and I have to present the whole directory

i see! there's no incremental readdir in FUSE?

The functionality sounds interesting, I would like to have it in FFmpegfs.

Yeah that would be really cool.

I also don't know how long it will take to realise it, but probably a week or two.

I wish I could help you, but my knowledge of C++ is really bad. I'll be able to fix minor bugs and make some tweaks.

nschlia commented 5 years ago

format like BMP which will be a major waste of disk space :)

you mean for cache? i am not copying the files anywhere, just reading them from ffmpegfs

For the cache. FFmpegfs needs to store the images somewhere, in memory or on disk, at least for as long as they have to be available. Re-decoding them every time someone wants to read them is far too time-consuming.

i vote for BMP or any other uncompressed format (e.g. PGM, PNM, PPM, uncompressed TIF) so we know the size ahead of time. or at least there should be an option (compressed/uncompressed)

I'll create PGM, PNM, BMP, or whatever you prefer; this is a simple codec setting, the ffmpeg API does that for me. Just choose your favourite weapon :)

very small portion at the bottom of the image is missing and probably nothing important.

neural nets normally give unpredictable results given damaged/incomplete input, so it's important to preserve the entire image

OK, then we need to go for a predictable format. We can try PNG later, or whatever, and see what happens.

there's no incremental readdir in FUSE?

No, that's the design of the file systems that access the FUSE drive. You are required to provide the complete directory. Even if you do "ls notEXISTING" the command still needs to examine all files.

zhuker commented 5 years ago

sounds good

nschlia commented 5 years ago

Created a branch to handle this.

zhuker commented 5 years ago

just found out that -compression_level 0 -f image2 '%06d.png' gives predictable-size pngs, but of course uncompressed
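
As a stand-alone command that would be something like (input.mov is a placeholder):

ffmpeg -i input.mov -compression_level 0 -f image2 '%06d.png'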

nschlia commented 5 years ago

Work in progress...

nschlia commented 5 years ago

The desired functionality has been implemented now, at least partly. There are a few things missing, but the current state should be sufficient as a proof of concept.

Caution: the code is by no means production grade yet; it still needs an extensive overhaul, but it works.

The functionality that is missing or incomplete:

  1. It is currently not possible to access a file directly without doing an "ls" on the parent directory first.
  2. The cache format is a ludicrous waste of disk space and needs to be improved.
  3. The cache format change should be applied automatically; for now it is required to delete the cache directory manually before using this version.
  4. The source must be all i-frame; incomplete images (from p-frames) will be stored the way they are.

For 1.: A simple

ls /mnt/ffmpegfs/video1.mp4/00001.png

will fail. You need to list the parent directory first, like:

ffmpegfs /storage/videos /mnt/ffmpegfs
find /mnt/ffmpegfs

/mnt/ffmpegfs/video1.mp4/00001.png
/mnt/ffmpegfs/video1.mp4/00002.png
...
/mnt/ffmpegfs/video1.mov/00001.png
/mnt/ffmpegfs/video1.mov/00002.png

This builds the frame image virtual directories. Once this has been done, the "ls" command will succeed. I guess that won't be a problem, as the software that consumes the frame images needs to list the directories in the first place anyway to find out what to scan...

For 2.: For each image a 2 MB segment is reserved in the cache. That makes it easy to access e.g. image number 1234: it is simply at offset 1234 * 2 MB (see the sketch below). But that's a big waste of disk space, and it limits the image size to 2 MB.

For 3.: Do a "rm -Rf /path/to/cache" before running ffmpegfs.

For 4.: p-frames can be completed with information from previous frames. There are filters to do that, they just need to be activated inside the code.
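
To illustrate the fixed-segment layout from 2., locating an image in the cache is plain arithmetic (a sketch; IMAGE_MAX_SIZE is the 2 MB constant from fileio.h):

// Sketch: one fixed-size slot per frame, so no index is needed.
// The trade-off: every slot occupies IMAGE_MAX_SIZE bytes on disk,
// and no image may ever exceed that size.
off_t frame_offset(int frame_no)
{
    return (off_t)frame_no * IMAGE_MAX_SIZE;  // e.g. image 1234 at 1234 * 2 MB
}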

If the functionality proves useful, these restrictions will be removed later.

Special considerations:

Creating a virtual set of images won't save you any disk space: the frame images have to be stored somewhere. So you need to set the cache path to a drive with enough space (--cachepath=DIR). Also set a cache limit that is sufficient, or remove it completely (--max_cache_size=0). You may set the cache expiry time to 1 day (--expiry_time=1d) or whatever is appropriate for you, so old files get cleaned up once they are no longer required.
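
Putting these options together, a mount could look like this (paths are placeholders):

ffmpegfs /storage/videos /mnt/ffmpegfs --desttype=jpg --cachepath=/big/disk/cache --max_cache_size=0 --expiry_time=1d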

The biggest advantage of using a virtual file set is the ease of use: no need to convert the video to frame images first. Just access them when required; they can be used as soon as they are ready.

To get the right code, change the branch to FBissue#26 and download the ZIP, or if you "git clone" it, don't forget to do "git checkout FBissue#26" like I did once... :)

zhuker commented 5 years ago

Thanks a lot! I will give it a try now. Quick question: why the 2MB limit? our frames are normally 1920x1080, which is already over 2MB uncompressed

zhuker commented 5 years ago

Doesn't mount my folder with videos, see my shell history attached

[Screenshot: shell history]

zhuker commented 5 years ago

I am on the FB_issue_#26 branch

zhukov@kote:~/ffmpegfs$ git branch
* FB_issue_#26
  master

nschlia commented 5 years ago

Thanks a lot! I will give it a try now. Quick question: why the 2MB limit? our frames are normally 1920x1080, which is already over 2MB uncompressed

If you use JPG or PNG it should work. If this is not enough, you can change fileio.h line 60:

#define IMAGE_MAX_SIZE (2*1024*1024)

You could set it to 5 MB or more. But I guess for PNG/JPG that should be sufficient.

zhuker commented 5 years ago

You could set it to 5 MB or more. But I guess for PNG/JPG that should be sufficient.

Our frames converted to PNG are normally 10-20MB each

nschlia commented 5 years ago

Doesn't mount my folder with videos, see my shell history attached

Did you set --desttype=png or --desttype=jpg? If you set the target to anything other than png/jpg/bmp, you will get audio and video files.

zhuker commented 5 years ago

oh my god!!! it works!!!

zhuker commented 5 years ago

Too early to celebrate :( it does list frames correctly, but here is what I get when I run ffmpeg -i on a PNG

[Screenshots: ffmpeg -i error output]

nschlia commented 5 years ago

Our frames converted to PNG are normally 10-20MB each

Whoopsy! Well, in that case you could set

#define IMAGE_MAX_SIZE (25*1024*1024)

but that would eat up a lot of disk space. I guess with the compression rate I've set, 2 MB should be enough. I'll rework the cache format when everything else works.

zhuker commented 5 years ago

#define IMAGE_MAX_SIZE (25*1024*1024)

just set it to 20MB

nschlia commented 5 years ago

Too early to celebrate :( it does list frames correctly, but here is what I get when I run ffmpeg -i on a PNG

That's what I was afraid of... The pixel format of your source video is not supported by PNG. Maybe JPG works... If not, one of the next things I wanted to do is convert the pixel format where required. This is already done for video-to-video conversion.

Maybe you can use JPG or you can find a video that has a different pixel format.

If not, I am sorry, but you'll have to be patient and wait a few days until I have completed the pixel format conversion.

zhuker commented 5 years ago

well, png is always rgb24 or rgb48, and most videos are yuv420 or yuv422p10, so pixel format conversion should be there

nschlia commented 5 years ago

OK, sorry. Then please be patient, I'll add the format conversion.

zhuker commented 5 years ago

i am not complaining, it's already a miracle :)

zhuker commented 5 years ago

desttype=jpg works!

zhuker commented 5 years ago

but even for i-frame only videos it takes FOREVER to open the last frame

nschlia commented 5 years ago

desttype=jpg works!

Great. But the pixel format conversion is still required; I have many videos that won't work without it. As a nice little extra, I can also apply deinterlacing and complete p-frames. So this is a must-have.

but even for i-frame only videos it takes FOREVER to open the last frame

The code is far from optimal, and especially the caching code is crap, I know. I wanted to prove that this is feasible in the first place; it seems it is, so it's worth digging deeper now. There is much room for optimisation.

Maybe you can check whether your video processing software likes the results, especially the files being locked until available and the size "morphing". I can speed up the processing, but the files will never be there in an instant. And I will never be able to provide the exact file size from the start.

If your video software basically copes with that, it's worth walking the extra mile from here.

nschlia commented 5 years ago

i am not complaining, it's already a miracle :)

I know :)

zhuker commented 5 years ago

here is a sample 10bit yuv422p10 prores: https://kote.videogorillas.com/vmir/vdms/orig.mov

when i mount it with desttype=jpg it produces this:

[attached frame image]

i assume it's because of the missing yuv422p10 to yuvj420 conversion

zhuker commented 5 years ago

when i mount a yuv420p h264 https://kote.videogorillas.com/vmir/vdms/orig.mp4 it produces a correct jpg:

[attached frame image]

zhuker commented 5 years ago

but all that aside - IT'S AMAZING!!! 🥇 👍 🏆

nschlia commented 5 years ago

Is that a scene from "Animal House"?

Well, as I said, I'll implement the pixel conversion. This is going to work with all formats, with a small degradation in quality, but I guess JPG or PNG compression will do more harm.

More to come... :)

zhuker commented 5 years ago

it's a scene from "Dawson's Creek", an old TV show

zhuker commented 5 years ago

what are you using (editor/debugger/ide) to write code?

nschlia commented 5 years ago

I use QtCreator; with a few tricks I can write code and debug, of course without Qt. I even use QtCreator to write (non-Qt) code for ARM using a cross compiler.

zhuker commented 5 years ago

make_file(buf, filler, VIRTUALTYPE_FRAME, origpath, filename, 40 * 1024, virtualfile->m_st.st_ctime);

this is where the 40KB size comes from! why not 42KB then :)

nschlia commented 5 years ago

this is where the 40KB size comes from! why not 42KB then :)

Well, 42 is the answer to all questions.

You may give it another try. I have implemented pixel format conversion, so your ProRes videos should now work as source. Actually, you get the full program: pixel format conversion, deinterlacing and rescaling.

Currently only jpg works; png and bmp (RGB pixel formats) yield strange results. It seems I lack some understanding of how RGB pixel formats work. I'm trying to figure it out.

I also changed the max. image size to 20 MB.

There's still a lot to do, but maybe now you can evaluate the result so far.

nschlia commented 5 years ago

Found this, that's RGB to YUV though... https://stackoverflow.com/questions/21938674/ffmpeg-rgb-to-yuv-conversion-loses-color-and-scale

But the images he gets look the same as my results.

nschlia commented 5 years ago

More:

https://stackoverflow.com/questions/21938674/ffmpeg-rgb-to-yuv-conversion-loses-color-and-scale?answertab=votes#tab-top

nschlia commented 5 years ago

It seems that the deinterlace filter does not work. Simply do not use it, and it will work. I'll try to find out what the problem is. Seems that I am using it wrong. It should work on RGB frames.

zhuker commented 5 years ago

Thanks. I’ll give it a try. Not sure I’ll be able to test before Monday. But if I do have a chance, I’ll report back. Are you using swscale to do pixel format conversions? Deinterlace is not needed in our processing, we do our own: https://github.com/A-Bush/Deep-Video-Deinterlacing/blob/master/README.md

nschlia commented 5 years ago

Found the problem. Deinterlacing now also works! You can check whether it works for you; next I'll go into optimisations.

nschlia commented 5 years ago

Thanks. I’ll give it a try. Not sure I’ll be able to test before Monday. But if I do have a chance, I’ll report back. Are you using swscale to do pixel format conversions?

Pixel format conversion is done with sws_scale. If the image size is not limited on the command line, the function merely does the pix_fmt conversion.
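
In essence the conversion step boils down to the following libswscale calls (a bare sketch, not the actual ffmpegfs code; width, height, src_pix_fmt and the two frames are assumed to exist):

// Sketch: convert a decoded frame (e.g. yuv422p10le) to the RGB format
// the image encoder needs. With equal source and target sizes,
// sws_scale effectively performs only the pixel format conversion.
struct SwsContext *sws = sws_getContext(width, height, src_pix_fmt,
                                        width, height, AV_PIX_FMT_RGB24,
                                        SWS_BICUBIC, NULL, NULL, NULL);
sws_scale(sws, (const uint8_t * const *)src_frame->data, src_frame->linesize,
          0, height, dst_frame->data, dst_frame->linesize);
sws_freeContext(sws);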

If you don't need deinterlace, simply do not use the --deinterlace parameter and the filter will not be called at all.

So I'll wait for the results now. If I find some time maybe I'll get into the list of optimisations.

nschlia commented 5 years ago

Deinterlace is not needed in our processing, we do our own: https://github.com/A-Bush/Deep-Video-Deinterlacing/blob/master/README.md

I see. For simplicity I currently only use yadif, but the FFmpeg API supports many algorithms. nnedi, for example, also uses a neural network. Or there's one that can use CUDA for a speed-up.

Maybe one of those could save you a processing step. If you could use one, just open an issue for it.

  • bwdif: Deinterlace the input video ("bwdif" stands for "Bob Weaver Deinterlacing Filter"). https://ffmpeg.org/ffmpeg-filters.html#toc-bwdif
  • kerndeint: Deinterlace input video by applying Donald Graft’s adaptive kernel deinterlacing. Works on interlaced parts of a video to produce progressive frames. https://ffmpeg.org/ffmpeg-filters.html#toc-kerndeint
  • mcdeint: Apply motion-compensation deinterlacing. https://ffmpeg.org/ffmpeg-filters.html#toc-mcdeint
  • nnedi: Deinterlace video using neural network edge directed interpolation. https://ffmpeg.org/ffmpeg-filters.html#toc-nnedi
  • w3fdif: Deinterlace the input video ("w3fdif" stands for "Weston 3 Field Deinterlacing Filter"). https://ffmpeg.org/ffmpeg-filters.html#toc-w3fdif
  • yadif: Deinterlace the input video ("yadif" means "yet another deinterlacing filter"). https://ffmpeg.org/ffmpeg-filters.html#toc-yadif-1
  • yadif_cuda: Deinterlace the input video using the yadif algorithm, but implemented in CUDA so that it can work as part of a GPU accelerated pipeline with nvdec and/or nvenc. https://ffmpeg.org/ffmpeg-filters.html#toc-yadif_005fcuda

zhuker commented 5 years ago

ok, finally got to test it, sorry it took me so long. it works! 👍

  • prores -> png: ok, but should be rgb48 (not rgb24) because prores has 10bit color
  • prores -> jpg: ok, but weird result when viewed on mac (see below)
  • h264 -> png: ok
  • h264 -> jpg: same weird result on mac (see below)

prores->png: [attached frame image]

prores->jpg: [attached frame image]

when viewed on Mac: [Screenshot]

h264(yuv420p)->png: [attached frame image]
h264(yuv420p)->jpg: [attached frame image]

when viewed on Mac: [Screenshot]

nschlia commented 5 years ago

Thanks for testing the new version!

ok, finally got to test it, sorry it took me so long. it works! 👍

Hooray :)

prores -> png: ok, but should be rgb48 (not rgb24) because prores has 10bit color

FFmpeg only seems to have rgb48be (rgb48 in big-endian format). I could give it a try. I'll implement a selection of the best matching format (using av_find_best_pix_fmt_of_2 or so). But that's not at the top of the list; there are a few other things more important to complete first.
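
The selection could be as simple as this (a sketch using av_find_best_pix_fmt_of_2 from libavutil; conversion loss is reported through the last parameter):

// Sketch: pick whichever RGB target loses less information when
// converting from the source pixel format.
int loss = 0;
enum AVPixelFormat best = av_find_best_pix_fmt_of_2(
    AV_PIX_FMT_RGB24, AV_PIX_FMT_RGB48BE,
    src_pix_fmt, 0 /* no alpha */, &loss);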

  • prores -> jpg: ok, but weird result when viewed on mac (see below)
  • h264 -> png: ok
  • h264 -> jpg: same weird result on mac (see below)

It should not make a difference which source file format you use (more precisely, which codec: H264, ProRes or whatever), as the pixel format is converted to what PNG, JPG or BMP requires. So it's no wonder the weird results come up with every source.

Have you opened the images directly from the ffmpegfs virtual directory? What happens if you copy the images locally on your mac and then open that copy? What happens if you open the virtual image a second time? Still distorted? If so, the mac viewer does not like the images I create...

What about BMP?

On Ubuntu the stock image viewer (Gnome Desktop) showed only a small stripe at the top of the image until I added the functionality that updates the file size once it is known. You can see that when you open an image in KDE Dolphin or Windows Explorer: the file size gets refreshed (from 40 KB to 6 MB or so). In Dolphin you may have to refresh (F5); Explorer does that automatically.

Anyway, the only tweak compared to a real image in a real directory is the file size: ffmpegfs pretends all images are 40 KB until one has been opened (e.g. with an image viewer) and decoded. Then the file size magically changes to the real size.

Anyway, there is still a lot to do. I am glad that ffmpegfs basically works for you. The list so far:

I will look into that this weekend.

zhuker commented 5 years ago

when viewing on mac, i copied from the virtual directory first, just using cp 000000001.png ~

zhuker commented 5 years ago

one more thing i am missing is the ability to read random frames without waiting for the decode of previous ones. this would be very handy for debugging.

also, find /mnt/ffmpegfs is not feasible on our 100TB video storage. it would be nice to just be able to:

cp /mnt/ffmpegfs/long/path/to/video/here/00000001.png ~