jdepoix / youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike Selenium-based solutions!
MIT License

Getting transcripts in other language #3

Closed by SundareshPrasanna 6 years ago

SundareshPrasanna commented 6 years ago

Hi, thanks for this package. I'm observing that when I pass a list of video IDs, I get transcripts in other languages for some of them. Is there any way to restrict the results to English?

For example, for this video ID: GJLlxj_dtq8

I got the below transcript (sample of it): '嘿,这里是Dave2D 这是微软的Surface Go,当他们发布这款产品的时候我就对它特别感兴趣。 在一段时间的体验后,感觉非常有吸引力的一款设备 我真的认为这是微软这么久以来发布的最好的产品,它的起售价为400美元(约等于2724RMB) 尽管我不认为你应该去买基础配置款,但他们有中等配置 550美元,稍微贵了一些,但是你可以得到两倍的运行内存,两倍的储存空间,而且值得注意的是更快的储存 如果你能付得起的话,那一款配置是值得大多数人购买的 这里这款中等配置的机型 550美元,我 真的喜欢它。好,让我们来看看它的外观。这款设备的制造质量非常好。它是一款SURFACE系列的产品 它有一个合身的镁制外壳,完成度非常高 这个 正面的屏幕的四周有圆角包边,这样确实能够让这款设备握着更加舒适 不像最早的

jdepoix commented 6 years ago

Hi,

this seems like some weird kind of YouTube bug. If you open the video on YouTube and open the transcript, you'll see that it's in Chinese there as well, although the subtitles displayed under the video are in English. Since all this package does is scrape the transcript box from YouTube, there unfortunately is no way to work around this as long as YouTube displays it in the wrong language 😕
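Even without a real workaround, affected videos can at least be flagged after the fact. The following is only a rough sketch, not part of the package: it assumes a transcript is a list of dicts with a `text` key, and the helper names `latin_ratio` and `looks_latin_script` are made up for illustration. It checks the script of the letters, not the language, so German or French text would pass as well.

```python
import unicodedata


def latin_ratio(text):
    """Return the share of alphabetic characters that are Latin letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode character names of Latin letters contain the word "LATIN".
    latin = sum(1 for ch in letters if "LATIN" in unicodedata.name(ch, ""))
    return latin / len(letters)


def looks_latin_script(snippets, threshold=0.9):
    """Heuristic: True if the transcript is written mostly in Latin letters.

    `snippets` is assumed to be a list of dicts with a "text" key.
    """
    text = " ".join(snippet["text"] for snippet in snippets)
    return latin_ratio(text) >= threshold
```

A transcript like the Chinese sample above mixes a few Latin product names into mostly CJK text, so it falls well below the default threshold and would be flagged.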

jdepoix commented 5 years ago

@SundareshPrasanna in version 0.1.2 I implemented a feature which lets you retrieve the transcript for specific languages. Using this you might be able to work around your problem!

from youtube_transcript_api import YouTubeTranscriptApi
YouTubeTranscriptApi.get_transcripts(['GJLlxj_dtq8'], languages=['en'])
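For a whole list of video IDs, it can help to collect the IDs that have no transcript in the requested language instead of stopping at the first failure. This is a minimal sketch under stated assumptions, not the package's own behaviour: `fetch_english` is a hypothetical helper, and `fetch` stands for any callable that takes a video ID and a list of language codes and either returns a transcript or raises.

```python
def fetch_english(video_ids, fetch, languages=("en",)):
    """Collect transcripts per video ID, requesting only the given languages.

    `fetch` is a hypothetical callable taking (video_id, languages) and
    returning a transcript, or raising if none can be retrieved.
    """
    transcripts = {}
    unavailable = []
    for video_id in video_ids:
        try:
            transcripts[video_id] = fetch(video_id, list(languages))
        except Exception:
            # No transcript in the requested languages, or another fetch error.
            unavailable.append(video_id)
    return transcripts, unavailable
```

The `fetch` argument would be a thin wrapper around whichever call your installed version of the package provides for a single video. Passing it in keeps the loop independent of the exact API and makes it easy to try out with a stub.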