jdepoix / youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it does not require an API key or a headless browser, as other Selenium-based solutions do!
MIT License
2.87k stars, 326 forks

Chapter support #182

Closed (not-poma closed 1 year ago)

not-poma commented 1 year ago

Is your feature request related to a problem? Please describe.

I often need to split the transcript by YouTube timecodes/chapters. Can such functionality be implemented, or is it out of scope for this project?

Describe the solution you'd like

I'm not sure how best to integrate it into the current project structure. For example, it could be a separate function that fetches the video info object, plus a formatter that accepts the subtitles and chapter info and outputs the transcript split by chapters.
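A minimal sketch of the formatter idea described above. This is not part of youtube-transcript-api. It assumes the transcript is a list of dicts with `text`, `start`, and `duration` keys, and that the chapters follow the youtube-dl info.json layout of `title`, `start_time`, and `end_time`. The function name `split_by_chapters` is hypothetical.

```python
from bisect import bisect_right


def split_by_chapters(snippets, chapters):
    """Group transcript snippets under the chapter they start in.

    snippets: list of {"text": str, "start": float, "duration": float}
              (assumed transcript shape)
    chapters: list of {"title": str, "start_time": float, "end_time": float}
              (assumed youtube-dl info.json shape)
    Returns a list of (chapter_title, joined_text) tuples in chapter order.
    """
    if not chapters:
        # No chapter info: return the whole transcript as one untitled section.
        return [("", " ".join(s["text"] for s in snippets))]

    ordered = sorted(chapters, key=lambda c: c["start_time"])
    starts = [c["start_time"] for c in ordered]
    buckets = [[] for _ in ordered]

    for snippet in snippets:
        # Index of the last chapter whose start_time <= snippet start.
        idx = bisect_right(starts, snippet["start"]) - 1
        # Snippets before the first chapter are filed under the first chapter.
        buckets[max(idx, 0)].append(snippet["text"])

    return [(c["title"], " ".join(texts)) for c, texts in zip(ordered, buckets)]


# Hand-written sample data, not taken from a real video.
snippets = [
    {"text": "hello", "start": 0.0, "duration": 1.5},
    {"text": "world", "start": 1.5, "duration": 2.0},
    {"text": "next part", "start": 10.2, "duration": 3.0},
]
chapters = [
    {"title": "Intro", "start_time": 0.0, "end_time": 10.0},
    {"title": "Main", "start_time": 10.0, "end_time": 20.0},
]
result = split_by_chapters(snippets, chapters)
# result == [("Intro", "hello world"), ("Main", "next part")]
```

A snippet is assigned by its start time only, so a line that straddles a chapter boundary stays with the chapter it began in.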

Describe alternatives you've considered

Currently I download the subtitles and info.json using youtube-dl. It works, but this library seems to be a more lightweight way to do that, since I don't need to work with the videos themselves.

Additional context

youtube-dl seems to extract it like this

jdepoix commented 1 year ago

Hi @not-poma,

Thank you for the suggestion! While I can see that being useful, I do not really see a good way to fit it into the current API without introducing breaking changes. Therefore (as you also noted), that functionality would have to be added as a separate module. As you pointed out, this module is quite lightweight and has a concise scope, and I feel that adding such a module wouldn't really fit within that scope.

We might need a new project called youtube-video-info-api, which provides access to all the information contained in the JSON from which the transcripts are currently retrieved. However, AFAIK most of that information can be accessed through the official YouTube API. The main reason I created this module a few years ago was that the official YouTube API only allowed me to get manually created transcripts for videos, but not the ASR ones.

I will close this now, as I don't see a path forward here within this module, but feel free to discuss alternative solutions!

xingfanxia commented 6 months ago

Interested in this feature as well.