madiele / vod2pod-rss

Vod2Pod-RSS converts a YouTube or Twitch channel into a podcast with ease. It creates a podcast RSS that can be listened to directly inside any podcast client. VODs are transcoded to MP3 on the fly and no server storage is needed!
MIT License

Adding additional open source codecs as alternative to MP3 #70

Open johnyb0y opened 1 year ago

johnyb0y commented 1 year ago

Hi, Thanks for your work on this cool project! On Reddit you asked for an issue to be created as a reminder to add additional codecs besides MP3. So this is just a friendly reminder - MP3 is working great.

A good choice would be the fully open source Opus codec. It’s way more efficient than mp3 and has rather good support across all platforms. It’s also very low impact on resources both in encoding and decoding.

Another good open source codec to consider would be OGG Vorbis.

But as you said on Reddit - the de facto standard in podcasting is still MP3. So I’d add these as optional choices for the tinkerers and keep MP3 as the default setting. Thanks.

madiele commented 1 year ago

@johnyb0y the beta image now has experimental Vorbis and Opus support if you want to try it (in the docker-compose, replace image: madiele/vod2pod-rss with image: madiele/vod2pod-rss:beta). From my tests I can't get seeking to work with them though. From my search Opus should allow that, so I'm probably missing some ffmpeg setting.

to choose a codec, set the env var in the compose file like so, in the environment section of vod2pod:

- AUDIO_CODEC=OPUS

or

- AUDIO_CODEC=VORBIS

madiele commented 1 year ago

what I found is that the browser fetches the first bytes of the stream, then the final bytes, before streaming it all with seeking disabled. This behavior makes me think there is something wrong with how ffmpeg sets the stream metadata; I will need to check with a hex editor if something is wrong...

johnyb0y commented 1 year ago

@madiele Wow, sounds great. I will check it out, but I’m on the road right now so probably need a few days :) Regarding seeking: ffprobe might be handy to figure this out. I will have a look as well.

johnyb0y commented 1 year ago

@madiele I had a quick look regarding opus/vorbis seeking on streams - and it's actually a big rabbit hole. I'm not sure it's a metadata problem. The only quick fix I found which could be promising is the mkclean tool by Matroska for webm containers.

It reorders the elements with the Cues at the front, so your Matroska files are ready to be streamed efficiently over the web.

EDIT: Since you do on-the-fly encoding this might not be helpful at all :/

madiele commented 1 year ago

so yeah, I "tried" to read the RFC on the Opus encoding format, and it seems that by design it's the job of the container format (ogg, webm, etc.) to provide the info needed for seeking. Since I was outputting to the opus format (-f opus) it was never going to work. Setting the output to webm (-f webm) improves things: Firefox no longer tries to fetch the start and end of the stream, but there is still no seeking....

the real question is whether it's possible at all without having the full input ready to stream on disk. My guess is that by setting a constant bitrate instead of the default variable one, you could probably edit the webm at the start of the stream to have the right stuff. It's a moon shot though :laughing: but sounds like a cool challenge too
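The constant-bitrate idea comes down to simple arithmetic: with a fixed bitrate, a time position maps to an approximate byte offset and back. A minimal Rust sketch of that mapping follows. The function names and the `header_len` parameter are made up for illustration, and the container overhead that WebM adds per cluster and per block is ignored, so the numbers are only estimates.

```rust
/// Estimate the byte offset of a time position in a constant-bitrate stream.
///
/// `header_len` is the (hypothetical) number of bytes written before the
/// first audio data. Container overhead is ignored, so this is approximate.
fn estimate_byte_offset(header_len: u64, bitrate_bps: u64, position_ms: u64) -> u64 {
    // bits per second * milliseconds / 8000 = bytes
    header_len + bitrate_bps * position_ms / 8_000
}

/// Inverse of `estimate_byte_offset`: estimate the time position (in ms)
/// that a byte offset corresponds to.
fn estimate_position_ms(header_len: u64, bitrate_bps: u64, byte_offset: u64) -> u64 {
    // Offsets that fall inside the header map to the start of the audio.
    byte_offset.saturating_sub(header_len) * 8_000 / bitrate_bps
}
```

At 192 kbit/s, one second of audio is about 24,000 bytes, so `estimate_byte_offset(0, 192_000, 1_000)` gives 24000. Whether a player would accept index entries computed this way is untested.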

madiele commented 1 year ago

this is what I learned: basically the webm made by ffmpeg is missing the EBML Cues element that is required to enable seeking. The only way to enable it is to manually add this element back when the stream is starting. This is not impossible, but I need to understand how the Cues element needs to be built.

there seems to be a cargo pkg for reading ebml (ebml-iterable) so I will at least try
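For reference, here is a rough Rust sketch of how a Cues element could be serialized by hand. It uses only the standard library rather than ebml-iterable, and it is not the project's code. The element IDs come from the Matroska specification; the helper names are invented for illustration, and the time and position values passed in would have to be known or estimated somehow.

```rust
/// Encode an EBML element size as a variable-length integer (vint).
fn encode_vint(value: u64) -> Vec<u8> {
    for width in 1..=8u32 {
        // 7 payload bits per byte; the all-ones value is reserved for "unknown size".
        let max = (1u64 << (7 * width)) - 2;
        if value <= max {
            // Set the length-marker bit just above the payload bits.
            let marked = value | (1u64 << (7 * width));
            return marked.to_be_bytes()[(8 - width as usize)..].to_vec();
        }
    }
    panic!("value too large for an 8-byte EBML vint");
}

/// Encode an unsigned integer payload as minimal big-endian bytes (at least one byte).
fn encode_uint(value: u64) -> Vec<u8> {
    let bytes = value.to_be_bytes();
    let first = bytes.iter().position(|b| *b != 0).unwrap_or(7);
    bytes[first..].to_vec()
}

/// Serialize one element: ID bytes, then the payload size, then the payload.
fn element(id: &[u8], payload: &[u8]) -> Vec<u8> {
    let mut out = id.to_vec();
    out.extend(encode_vint(payload.len() as u64));
    out.extend_from_slice(payload);
    out
}

/// Build one CuePoint. `cluster_position` is the cluster's byte offset
/// relative to the start of the Segment's data, as the spec requires.
fn cue_point(time: u64, track: u64, cluster_position: u64) -> Vec<u8> {
    let mut positions = element(&[0xF7], &encode_uint(track)); // CueTrack
    positions.extend(element(&[0xF1], &encode_uint(cluster_position))); // CueClusterPosition
    let mut body = element(&[0xB3], &encode_uint(time)); // CueTime
    body.extend(element(&[0xB7], &positions)); // CueTrackPositions
    element(&[0xBB], &body) // CuePoint
}

/// Wrap already-encoded CuePoints in a Cues element.
fn cues(points: &[Vec<u8>]) -> Vec<u8> {
    element(&[0x1C, 0x53, 0xBB, 0x6B], &points.concat())
}
```

The hard part for a live transcode is not this serialization but knowing the cluster positions before the clusters exist.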

johnyb0y commented 1 year ago

@madiele interesting! This might also be helpful: https://www.npmjs.com/package/fix-webm-metainfo

johnyb0y commented 1 year ago

@madiele but tbh looking at this npm lib this looks like a lot of work to me :-) Since mp3 is working great there’s no need to put too much of your time in this, just saying.

madiele commented 1 year ago

I made this project also as an excuse to learn about codecs and stuff so no worries, this is fun for me. If I get bored or this turns out to be basically impossible I will close the issue as not planned, but for now I'll see what I can learn

madiele commented 1 year ago

anyway I did successfully read the EBML from the ffmpeg stream!

2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [Ebml(Start)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [EbmlVersion(1)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [EbmlReadVersion(1)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [EbmlMaxIdLength(4)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [EbmlMaxSizeLength(8)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [DocType("webm")]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [DocTypeVersion(2)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [DocTypeReadVersion(2)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [Ebml(End)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [Segment(Start)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [SeekHead(Start)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [Seek(Start)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [SeekID([21, 73, 169, 102])]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [SeekPosition(161)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [Seek(End)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [Seek(Start)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [SeekID([22, 84, 174, 107])]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [SeekPosition(203)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [Seek(End)]
2023-05-27T20:17:51.813Z DEBUG [vod2pod_rss::transcoder] =====> [Seek(Start)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [SeekID([18, 84, 195, 103])]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [SeekPosition(4086)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [Seek(End)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [SeekHead(End)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [Void([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [Info(Start)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [TimestampScale(1000000)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [MuxingApp("Lavf60.3.100")]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [WritingApp("Lavf60.3.100")]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [Info(End)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [Tracks(Start)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [TrackEntry(Start)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [TrackNumber(1)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [TrackUID(11452865887944305837)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [FlagLacing(0)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [Language("eng")]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [CodecID("A_VORBIS")]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [TrackType(2)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [Audio(Start)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [Channels(2)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [SamplingFrequency(48000.0)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [BitDepth(32)]
2023-05-27T20:17:51.814Z DEBUG [vod2pod_rss::transcoder] =====> [Audio(End)]
2023-05-27T20:17:51.919Z DEBUG [vod2pod_rss::transcoder] =====> [Cluster(Start)]
2023-05-27T20:17:51.922Z DEBUG [vod2pod_rss::transcoder] =====> [Timestamp(4)]
2023-05-27T20:17:52.036Z DEBUG [vod2pod_rss::transcoder] =====> [Cluster(Start)]
2023-05-27T20:17:52.036Z DEBUG [vod2pod_rss::transcoder] =====> [Timestamp(1001)]
2023-05-27T20:17:52.170Z DEBUG [vod2pod_rss::transcoder] =====> [Cluster(Start)]
2023-05-27T20:17:52.170Z DEBUG [vod2pod_rss::transcoder] =====> [Timestamp(1982)]
2023-05-27T20:17:52.289Z DEBUG [vod2pod_rss::transcoder] =====> [Cluster(Start)]
2023-05-27T20:17:52.290Z DEBUG [vod2pod_rss::transcoder] =====> [Timestamp(2964)]
2023-05-27T20:17:52.386Z DEBUG [vod2pod_rss::transcoder] =====> [Cluster(Start)]
2023-05-27T20:17:52.386Z DEBUG [vod2pod_rss::transcoder] =====> [Timestamp(3961)]
2023-05-27T20:17:52.489Z DEBUG [vod2pod_rss::transcoder] =====> [Cluster(Start)]
2023-05-27T20:17:52.489Z DEBUG [vod2pod_rss::transcoder] =====> [Timestamp(4960)]
2023-05-27T20:17:52.623Z DEBUG [vod2pod_rss::transcoder] =====> [Cluster(Start)]
2023-05-27T20:17:52.623Z DEBUG [vod2pod_rss::transcoder] =====> [Timestamp(5958)]
2023-05-27T20:17:52.773Z DEBUG [vod2pod_rss::transcoder] =====> [Cluster(Start)]

with this ffmpeg command

ffmpeg -ss 0 -i https://...... -acodec libvorbis -ab 192k -f webm -bufsize 5760 -maxrate 5760k -timeout 300 -hide_banner -loglevel error pipe:stdout

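I've noticed that the cluster timestamps are spaced about 1000 apart (roughly one cluster per second, since the TimestampScale of 1000000 ns means one tick is 1 ms), which is promising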
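The SeekID values in the log are the raw bytes of Matroska element IDs, printed in decimal. A small Rust sketch for decoding them is shown here. The IDs are taken from the Matroska specification, the function name is made up for illustration, and only a handful of top-level elements are covered.

```rust
/// Map the raw bytes of a SeekID to the name of the Matroska element it points at.
/// Only a few top-level elements are listed; anything else is reported as "unknown".
fn element_name(seek_id: &[u8]) -> &'static str {
    match seek_id {
        [0x15, 0x49, 0xA9, 0x66] => "Info",
        [0x16, 0x54, 0xAE, 0x6B] => "Tracks",
        [0x12, 0x54, 0xC3, 0x67] => "Tags",
        [0x1C, 0x53, 0xBB, 0x6B] => "Cues",
        [0x1F, 0x43, 0xB6, 0x75] => "Cluster",
        _ => "unknown",
    }
}
```

Applied to the three SeekHead entries in the log, this gives Info, Tracks and Tags. There is no entry pointing at Cues, which is consistent with the missing Cues element described earlier in the thread.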

johnyb0y commented 1 year ago

Cool! There also seem to be cluster_time_limit and cluster_size_limit parameters to manually adjust each cluster: https://ffmpeg.org/doxygen/2.8/structMatroskaMuxContext.html

But I haven’t yet fully understood how this works tbh.

johnyb0y commented 1 year ago

Good explanation:

-reserve_index_space  Reserve a given amount of space (in bytes) at the beginning of the file for the index (cues). (from 0 to INT_MAX) (default 0)

-cluster_size_limit  Store at most the provided amount of bytes in a cluster. (from -1 to INT_MAX) (default -1)

-cluster_time_limit  Store at most the provided number of milliseconds in a cluster. (from -1 to I64_MAX) (default -1)

From here: https://gist.github.com/tayvano/6e2d456a9897f55025e25035478a3a50
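As a sketch of how such an option could be wired into the transcoder, the Rust function below builds an ffmpeg argument list with an optional -cluster_time_limit. The function name and parameters are hypothetical, the flags are a simplified subset of the command quoted earlier in the thread, and whether a fixed cluster duration actually helps with seeking is untested.

```rust
/// Build ffmpeg arguments for a Vorbis-in-WebM stream written to stdout.
/// `cluster_time_limit_ms`, when set, caps the duration stored in each cluster.
fn webm_args(input_url: &str, cluster_time_limit_ms: Option<u32>) -> Vec<String> {
    let mut args: Vec<String> = [
        "-ss", "0", "-i", input_url,
        "-acodec", "libvorbis", "-ab", "192k",
        "-f", "webm",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();

    // Muxer options are output options, so they go before the output target.
    if let Some(ms) = cluster_time_limit_ms {
        args.push("-cluster_time_limit".to_string());
        args.push(ms.to_string());
    }

    args.extend(
        ["-hide_banner", "-loglevel", "error", "pipe:stdout"]
            .iter()
            .map(|s| s.to_string()),
    );
    args
}
```

The result can be passed to `std::process::Command::new("ffmpeg").args(...)`, with the real input URL supplied by the caller.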

madiele commented 1 year ago

enough nerding out for today. I've put all the work in progress in the feature/os_codec_support branch. At the moment it is mostly stuff to experiment with rewriting the webm to see if it works; it's very crude and ugly ATM, but the boilerplate to do a test is mostly done. If we are successful there will be a need for a refactor so that the MP3 encoding is not affected. I already plan to add rewriting of the MP3 stream in the future (needed for SponsorBlock support), so I will need to come up with some kind of re-mux abstraction layer to handle this kind of stuff.

feel free to fork it and play around with it if you want; you can find a guide to set up the project in the CONTRIBUTING.md. I probably won't be able to continue experimenting for 1-2 weeks.

This was fun to research I hope I'll get it to a working state :muscle:

edit: found this useful document with the format specification

https://www.matroska.org/technical/diagram.html

a java Cue factory that could be used as a guide to build the cues https://github.com/jcodec/jcodec/blob/master/src/main/java/org/jcodec/containers/mkv/CuesFactory.java

madiele commented 1 year ago

an update: I have now added a work-in-progress remuxer module. No Cues added yet, but I was able to do a copy-and-paste remux correctly. Unfortunately I discovered that the Rust crate for webm encoding has a little quirk of not flushing the buffer until everything is encoded, which is kinda bad for my streaming use case. I will need to fork it and tweak it a little while I wait for an answer from the author

johnyb0y commented 1 year ago

Sounds cool! Another alternative would be to use the official OGG container format to embed Opus. File extension would be .opus

https://en.wikipedia.org/wiki/Ogg

johnyb0y commented 1 year ago

Actually this might also be interesting since you could use the same container for both Vorbis and Opus. .ogg would be the general file extension.

madiele commented 1 year ago

I did a quick test with ogg and I still had problems with seeking. I will try again sometime, but for now I'm focusing on making a good remuxer abstraction that I can reuse for the eventual MP3 remuxer and SponsorBlock support.

The webm remuxer with my edited library now works well in Chrome and VLC, and audio goes through, so next is to write the Cues and fix the SeekHead (the element that tells the client where to find the Cues and other stuff) with the correct byte position.

madiele commented 1 year ago

An update: thanks to the proof of concept I know what I need to do to add the muxer, but before that I need to refactor the transcoder so it can support muxing and more streaming services. At the moment the code has some hacks to make it work for YouTube specifically, and I want to abstract those away so that I don't make a spaghetti mess when I put in the muxer 😆 It would also be cool to have those abstractions so that people who want to add support for new muxers or services can do it easily, without needing to understand the transcoder and other stuff

madiele commented 1 year ago

The mentioned code refactoring is done in #82. Adding a new media provider should now be relatively easy for anyone, and any mention of YouTube or Twitch has been removed from the transcoder, so next time I take this on, things should be easier to tackle. When that will happen I don't know; the remuxer is a pretty big challenge (a cool one though).

At the moment my wish list for it is (this is mostly for future me)

BubiBalboa commented 9 months ago

Not sure if I should open a new issue for this since it's so closely related:

Would it be possible to directly download the audio from Youtube without loading the video file and transcoding the audio to mp3?

Youtube is still throttling downloads for me so a 2h 1gb podcast takes hours to download. Directly getting the audio file (m4a always works I think) would help a lot with that. And it would generally be less resource intensive for all parties.

So in effect yt-dlp would run with the option --format m4a

madiele commented 9 months ago

I'm not too keen on it. Last time I checked, YouTube throttles everything going through yt-dlp, so I doubt it will help. I also prefer to keep transcoding required: it allows for more compatibility and less headache for me since I can control everything. I also fear people would just share the server publicly if there was no transcoding, and I don't want to risk it. Sorry.

BubiBalboa commented 9 months ago

I mean, it would help a lot since the files are so much smaller. I think Youtube would be happy if we could cut the traffic by 80%.

Compatibility shouldn't realistically be an issue since phones can play just about every codec these days, right? And people who install a Docker container to host their own RSS feeds should hopefully know what their podcast player of choice can play.

I don't really see how someone sharing their server would become your problem? Worst case is they get their Google account rate limited.

It's your decision, of course, but I feel there are more good reasons for downloading the audio files directly than against it. Maybe sleep on it before you say no.

madiele commented 9 months ago

I thought I was already transcoding the audio-only version, sorry; checking again, that does not seem to be the case. So yeah, I can make that change when I have an hour or two for it. But the transcoding step to MP3 will always stay: it makes the feed much more compatible with every client out there and it's very fast anyway. Also, the URL generated by yt-dlp has a limited lifetime, so the episode breaks after a while if you just return the raw URL, and many clients will not refresh old episodes when fetching the RSS. So I really don't like the idea; I want the generated feed to be as evergreen as possible

BubiBalboa commented 9 months ago

Making the download faster and more efficient is my main objective. I don't mind the transcoding from audio to audio. I think this is a good compromise.

madiele commented 9 months ago

@BubiBalboa made the change, should be available on the beta image soon, I'll use it for a while, if it's stable I'll release a new version in the next weeks