Open arturoalcibia opened 2 years ago
Hi @arturoalcibia, sorry for the late reply, somehow I must've missed this issue...
I am not really sure what functionality you are asking for exactly. You are currently able to retrieve transcripts in different languages using
YouTubeTranscriptApi.get_transcript(video_id, languages=['de', 'en'])
or
YouTubeTranscriptApi.list_transcripts(video_id).find_transcript(['de', 'en']).fetch()
What use case do you have which is not covered by these methods?
Hi @jdepoix,
no worries.
This would give us access to the caption track the user intended to be played by default, which is usually the language of the video.
As an example, this video contains multiple manually created tracks: https://www.youtube.com/watch?v=UOgvbS4GkF0. English is the one the user set as the default.
You can find which transcript track is set as the default by looking for the key "defaultCaptionTrackIndex" in the returned HTML.
In this case, the HTML has "defaultCaptionTrackIndex" set to 3, which corresponds to the English track.
Here's a quick and dirty snippet to get the index (which refers to the English track).
import requests
from youtube_transcript_api._transcripts import TranscriptListFetcher
videoId = 'UOgvbS4GkF0'
with requests.Session() as http_client:
    tListFetcher = TranscriptListFetcher(http_client)
    # Fetch the watch page once and reuse it (both helpers are private API).
    htmlContent = tListFetcher._fetch_video_html(videoId)
    captions_json = tListFetcher._extract_captions_json(htmlContent, videoId)
    # Fall back to index 0 when no default track is set.
    defaultCaptionIndex = captions_json['audioTracks'][0].get('defaultCaptionTrackIndex', 0)
    print(defaultCaptionIndex)
I'd be happy to contribute with a proper M.R. on this.
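To make the lookup above concrete without hitting the network, here is a minimal sketch that resolves "defaultCaptionTrackIndex" to a language code. The sample dictionary is hand-written, not real YouTube output, and the "captionTracks" / "languageCode" layout is an assumption about the shape of the captions JSON; `default_caption_language` is a hypothetical helper, not part of youtube_transcript_api.

```python
from typing import Optional

# Hand-written sample data (assumed shape, NOT fetched from YouTube).
SAMPLE_CAPTIONS_JSON = {
    "captionTracks": [
        {"languageCode": "de"},
        {"languageCode": "es"},
        {"languageCode": "fr"},
        {"languageCode": "en"},
    ],
    "audioTracks": [{"defaultCaptionTrackIndex": 3}],
}


def default_caption_language(captions_json: dict) -> Optional[str]:
    """Hypothetical helper: map the default track index to a language code."""
    tracks = captions_json.get("captionTracks", [])
    audio_tracks = captions_json.get("audioTracks", [])
    if not tracks or not audio_tracks:
        return None
    # Same fallback as the snippet above: index 0 when no default is set.
    index = audio_tracks[0].get("defaultCaptionTrackIndex", 0)
    if not 0 <= index < len(tracks):
        return None
    return tracks[index].get("languageCode")


print(default_caption_language(SAMPLE_CAPTIONS_JSON))
```

With the sample data the index 3 points at the English entry, mirroring the video discussed above. Returning None instead of raising keeps the caller free to fall back to an explicit language list.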
Hi @arturoalcibia,
okay, that makes sense. In that case the default language would have to be added as a param to the TranscriptList constructor and the TranscriptList.build method would have to determine the default language and set it. The language_codes params on find_manually_created_transcript, find_generated_transcript and find_transcript would have to become optional and if they are not set the default language is used.
Of course any contributions on this are very much welcome! 😊
My only concern is that this would change the default behaviour of this module and could break people's code if they expect English subtitles (since that's what they've been getting by simply calling get_transcript). However, using the default language provided by the uploader seems like a more fitting default for this module, so maybe we should accept this breaking change. Any thoughts on this?
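As a rough illustration of the design discussed above (a default language stored on the list, and language_codes becoming optional), here is a small self-contained sketch. `SketchTranscriptList` and `TranscriptLookupError` are hypothetical stand-ins written for this example; they are not the library's actual TranscriptList or its exceptions, and transcripts are reduced to plain strings keyed by language code.

```python
from typing import Dict, Iterable, Optional


class TranscriptLookupError(Exception):
    """Hypothetical error: no requested language is available."""


class SketchTranscriptList:
    """Stand-in for the proposed behaviour, not the real TranscriptList."""

    def __init__(self, transcripts: Dict[str, str],
                 default_language: Optional[str] = None):
        self._transcripts = transcripts
        # Would be determined while building the list, per the proposal.
        self.default_language = default_language

    def find_transcript(self,
                        language_codes: Optional[Iterable[str]] = None) -> str:
        if language_codes is None:
            if self.default_language is None:
                raise TranscriptLookupError("no default language known")
            # Proposed change: fall back to the uploader's default.
            language_codes = [self.default_language]
        # Codes are tried in order of priority; first match wins.
        for code in language_codes:
            if code in self._transcripts:
                return self._transcripts[code]
        raise TranscriptLookupError("none of the requested languages found")


tracks = SketchTranscriptList({"de": "Hallo", "es": "Hola"},
                              default_language="es")
print(tracks.find_transcript())              # uses the default language
print(tracks.find_transcript(["fr", "de"]))  # explicit priority list
```

The sketch also shows the breaking change in miniature: a caller who passes no language codes gets the uploader's default rather than English.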
Hi @jdepoix,
Sounds good. I agree that the breaking change seems worth it; adding an extra function or argument to return the default language seems like overkill and would get confusing. I also think having English as the default language feels arbitrary. Returning the default language provided by the user looks cleaner.
Hi @jdepoix,
I think I have a working version of this feature. Would it be possible to be added as a contributor so I can submit an M.R.?
Hi @arturoalcibia, you don't need to be a contributor to submit a PR. You can simply submit a PR from your fork. Read this to find out more!
Hi @arturoalcibia, as this topic just came up in #177, is this something you are still working on? Is there anything I can help you with?
Hi @jdepoix, My bad! I worked on it but forgot to ever submit the PR. If that's okay, I will submit it this weekend for review.
@arturoalcibia no worries, I always appreciate contributions of any kind 😊
Any progress on this? I'm using the cli and it'd be great to have a flag that just returned the default language of the video
@dcsilver I haven't done any active development on this. Apparently, @arturoalcibia has been working on a PR, but hasn't turned it in so far. Any news on this @arturoalcibia?
Any update about defaultAudioLanguage??
@KhaledLela sorry, I haven't done any development on this and @arturoalcibia has unfortunately never turned in that PR.
Hello! It'd be great to have the default language of a video available as an attribute on the TranscriptList class.
I've been able to get this by accessing the list of subtitles from this URL:
Ex:
If more than one subtitle is available, there will be a "default_lang" key in the XML, which is what the user chose as the language of the video when uploading a file.
I have a M.R. ready but wanted to submit it as an issue in case someone was already working on something similar or had a better approach.