thomas-xin / Encodec-Stream

A lightweight wrapper around https://github.com/facebookresearch/encodec that enables dynamic streamed reading, seeking, metadata and GPU support.
MIT License

Reverse streams #1

Open WindowsNT opened 3 months ago

WindowsNT commented 3 months ago

Thanks for this. When encoding, can I also stream the ecdc to stdout instead of a target ecdc file? When decoding, can I also stream from stdin instead of a source ecdc?

I'm working on creating a Windows Media Foundation Transform for my Video Sequencer.

WindowsNT commented 3 months ago

I just saw that the code reads the entire input before passing it to Encodec, so what I asked above is moot: the whole stream has to be read in before anything is sent to the codec.

Now, is there a way to actually work in streams? That is, pass part of the data to the codec, then loop. Otherwise ecdc can't be used in realtime.
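
For illustration, the "pass part of the data to the codec, then loop" pattern asked about here could look roughly like the sketch below. It is not code from this repo: `stream_through` and `process_chunk` are hypothetical names, and `process_chunk` stands in for whatever per-chunk codec call would eventually exist.

```python
import io
from typing import BinaryIO, Callable


def stream_through(src: BinaryIO, dst: BinaryIO,
                   process_chunk: Callable[[bytes], bytes],
                   chunk_size: int = 4096) -> int:
    """Read src in fixed-size chunks, transform each one, and write it to dst.

    process_chunk is a hypothetical stand-in for a per-chunk codec step.
    Returns the total number of bytes written.
    """
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        out = process_chunk(chunk)
        dst.write(out)
        written += len(out)
    dst.flush()
    return written


# In-memory demonstration with a trivial "codec" that upper-cases bytes.
source = io.BytesIO(b"abcdefghij")
sink = io.BytesIO()
total = stream_through(source, sink, lambda b: b.upper(), chunk_size=4)
```

For real pipes, the same function could be called with `sys.stdin.buffer` and `sys.stdout.buffer` in place of the in-memory streams.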

thomas-xin commented 3 months ago

Ah, I probably should've documented this better; streaming is currently implemented for decoding only. I planned to implement streamed encoding support, but I've been sidetracked and occupied by a lot of other things and never quite got around to making a serious attempt. I'll get back to this in maybe a few days.

WindowsNT commented 3 months ago

Thanks. The question is, can Encodec encode and decode in chunks, or does it need the entire data?

thomas-xin commented 3 months ago

Decoding is already implemented in chunks, as stated in the specs/description of this repo: it gradually increases a window size to avoid recomputation. It does, however, require a minimum of 1 second of audio to be buffered before outputting anything, due to how the format encodes packets, which is still significantly longer than every other format. At some point, perhaps a new feature could be to deliberately encode extra beginning packets consisting of smaller chunks that begin and end with silence and iteratively increase in size? That could be a solution if lower latency is needed (such as for streaming live audio).
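
To make the "iteratively increasing in size" idea concrete, here is a minimal sketch of one possible chunk schedule. It is only an assumption about how such a feature might work, not this repo's actual windowing: `growing_chunks` is a hypothetical name, the doubling rule and the 24000 Hz sample rate are illustrative choices, and the silence padding mentioned above is not shown.

```python
from typing import Iterator, Tuple


def growing_chunks(total_samples: int, first: int,
                   limit: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample ranges whose length doubles up to `limit`.

    Small early chunks would allow output to start sooner; later chunks
    settle at the full size. The doubling rule is an illustrative choice.
    """
    if first <= 0 or limit < first:
        raise ValueError("need 0 < first <= limit")
    start = 0
    size = first
    while start < total_samples:
        end = min(start + size, total_samples)  # last chunk may be short
        yield start, end
        start = end
        size = min(size * 2, limit)


# Assumed example: 2.5 s of audio at 24000 Hz, starting with 125 ms chunks
# and capping at 1 s chunks.
schedule = list(growing_chunks(total_samples=60000, first=3000, limit=24000))
```

With these assumed numbers the first chunk covers 125 ms instead of a full second, and the schedule reaches the 1 second chunk size by the fourth chunk.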

WindowsNT commented 3 months ago

Basically, what I aim to do is use it to encode/decode the audio for video in mp4. Currently, my video app creates ecdc files as audio, but I'm trying to see if this can be used as a generic codec for audio within video as well. This means that the encoder may take a number of frames initially as latency, all right, but then it must consistently return encoded bytes each time it is fed new data, and vice versa for the decoder.

That's why I am wondering whether it can be done, because I've seen encodec take whole files as input.
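
The contract described here (some initial buffering, then encoded bytes returned on each feed) can be sketched as a small buffering wrapper. This is a hypothetical illustration, not an existing API of this repo or of Encodec: `ChunkedEncoder`, `feed`, `flush`, and the `encode_frame` callback are all assumed names, and `encode_frame` stands in for a real per-frame encoder.

```python
from typing import Callable, List


class ChunkedEncoder:
    """Buffer incoming samples and emit one encoded packet per full frame.

    encode_frame is a hypothetical stand-in for a real per-frame encoder.
    """

    def __init__(self, frame_size: int,
                 encode_frame: Callable[[List[float]], bytes]):
        if frame_size <= 0:
            raise ValueError("frame_size must be positive")
        self.frame_size = frame_size
        self.encode_frame = encode_frame
        self._buffer: List[float] = []

    def feed(self, samples: List[float]) -> bytes:
        """Accept new samples; return bytes for every frame now complete."""
        self._buffer.extend(samples)
        out = bytearray()
        while len(self._buffer) >= self.frame_size:
            frame = self._buffer[:self.frame_size]
            del self._buffer[:self.frame_size]
            out += self.encode_frame(frame)
        return bytes(out)  # empty while the first frame is still filling

    def flush(self) -> bytes:
        """Zero-pad and encode whatever is left at end of stream."""
        if not self._buffer:
            return b""
        padding = [0.0] * (self.frame_size - len(self._buffer))
        frame = self._buffer + padding
        self._buffer = []
        return self.encode_frame(frame)
```

A decoder wrapper could mirror the same shape, buffering encoded bytes until a whole packet is available and returning decoded samples from each call.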

thomas-xin commented 3 months ago

Hmm, well it definitely can be done, but not in the state this repo is right now, as there's no encoding support, and decoding still has the 1 second buffer.

What are you creating this video file from? A live recording/web stream, or existing file? And how are you storing the resulting data inside the video? I don't think the mp4 container natively allows such a thing, unless it fits in metadata or something?

WindowsNT commented 3 months ago

Windows has a library called Media Foundation that creates video files based on audio and video streams. These streams contain media types, which are descriptions of the compression used in the streams, say, mp3. Windows allows custom media types, so I can create a media type handler that will register the ecdc media type globally in the system as an encoder or decoder, and then all Windows apps will be able to play or write that format.

I can also keep the codec private to my app, to create mp4 files that can only be played by my app.

thomas-xin commented 3 months ago

Hmm, well it would still have Python as a dependency; it may be packaged along in an exe, but the data files would take a lot of space, plus the encodec models themselves. If that's okay with you, it'll probably work out.

If you want the main reason I stopped working on this repo, it's that I kind of wish they continued developing this model, perhaps trained it at higher bandwidths to allow it to replace existing formats, since right now ecdc 24k is about the same relative quality as opus 48k, and although this repo allows you to go higher the quality gains drop off quickly. There's also the prospect of using this technology for image and video files, which would be great for how much space they typically still take up.

But yeah, if there's demand for this repo as it is today, I guess I'll try to get around to finishing the implementation when I can.

WindowsNT commented 3 months ago

Yes, redistribution is no problem. The only requirement is to be able to work in chunks.

Keep up the good work. Here is my own work.

thomas-xin commented 3 months ago

Nice! That project looks really impressive, and it's always nice to see someone putting together cutting edge algorithms to make them accessible in a user-friendly app.

Regarding chunks, I'll definitely check first of all whether it's possible to decode partial files, either by modifying the library or by padding them. If not, I'll probably need to make yet another new version of the format that directly supports it. Will let you know how it goes.

thomas-xin commented 2 months ago

Just wanted to give an update on this: ffmpeg appears to have changed/removed some features, which has slowed down testing significantly; arguments such as -ac 2 are no longer supported by the latest version of ffplay, among other differences. I'll still work on this properly eventually, but I can't give an estimate on the timeframe because these breaking changes have affected some of my other projects too.

WindowsNT commented 2 months ago

Actually I would try not to use ffmpeg. Users that want the Encodec embedded would probably have their own mechanisms to supply raw audio to your library or take raw audio from it. Just focus on the ability to stream :) Keep up the good work.
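
As a rough illustration of supplying raw audio without ffmpeg, the sketch below converts signed 16-bit little-endian PCM bytes to floats and back using only the Python standard library. The function names are hypothetical, and the choice of 16-bit PCM and a float range of [-1.0, 1.0] is an assumption about what a caller might provide, not something this repo specifies.

```python
import struct
from typing import List


def pcm16le_to_floats(raw: bytes) -> List[float]:
    """Convert signed 16-bit little-endian PCM bytes to floats in [-1.0, 1.0)."""
    if len(raw) % 2:
        raise ValueError("PCM16 data must contain an even number of bytes")
    count = len(raw) // 2
    ints = struct.unpack(f"<{count}h", raw)
    return [i / 32768.0 for i in ints]


def floats_to_pcm16le(samples: List[float]) -> bytes:
    """Convert floats to signed 16-bit little-endian PCM, clipping overflow."""
    ints = []
    for s in samples:
        value = int(round(s * 32768.0))
        ints.append(max(-32768, min(32767, value)))  # clip to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)
```

A caller with its own capture or playback path could run each incoming buffer through `pcm16le_to_floats` before handing it to an encoder, and use `floats_to_pcm16le` on decoded output.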