Closed benoit74 closed 2 months ago
Subtitles are retrieved from
E.g., for https://www.ted.com/talks/matt_mills_image_recognition_that_triggers_augmented_reality, the URL the scraper considers is https://www.ted.com/talks/subtitles/id/1515/lang/fr
whereas the URL used on the TED platform is now https://hls.ted.com/project_masters/1140/subtitles/fr/full.vtt?intro_master_id=2346
The important part seems to be the query parameter: if we remove it, we get the same timings.
The full set of subtitles seems to be available from https://hls.ted.com/project_masters/1140/metadata.json?intro_master_id=2346 and this link seems to be available in the playerData of the video page.
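A minimal sketch of how the two URLs above could be assembled, assuming the project master ID (1140 here) and the intro master ID (2346 here) have already been extracted from the video page. The function names and parameters are hypothetical and are not taken from the scraper's code; only the URL shapes come from the examples above.

```python
from urllib.parse import urlencode

# Base path observed in the example URLs above.
HLS_BASE = "https://hls.ted.com/project_masters"


def _with_intro(url, intro_master_id=None):
    """Append the intro_master_id query parameter when one is known."""
    if intro_master_id is None:
        return url
    return f"{url}?{urlencode({'intro_master_id': intro_master_id})}"


def subtitle_url(project_master_id, lang, intro_master_id=None):
    """Hypothetical helper: URL of the VTT subtitles for one language."""
    url = f"{HLS_BASE}/{project_master_id}/subtitles/{lang}/full.vtt"
    return _with_intro(url, intro_master_id)


def metadata_url(project_master_id, intro_master_id=None):
    """Hypothetical helper: URL of the metadata listing the subtitles."""
    url = f"{HLS_BASE}/{project_master_id}/metadata.json"
    return _with_intro(url, intro_master_id)


if __name__ == "__main__":
    print(subtitle_url(1140, "fr", 2346))
    print(metadata_url(1140, 2346))
```

Keeping the query parameter optional makes it easy to compare the timings obtained with and without `intro_master_id`.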
This is quite a great simplification because:
@benoit74
I think for this particular issue, what changes should be made are:
That is as much as I can think of, and I have tentative code ready for it. I would love to open a PR for this. Thanks!
@Veeransh14 I don't get at all what you want to do. Your words are very generic and do not help at all to know if you've understood what has to be done.
Please be more specific about what you intend to do, or I will probably have to work on this myself; this is urgent for us and needs to be solved ASAP.
I'll take care of this issue myself right now
It looks like reality is much simpler than the complex explanation in my previous comment regarding intros (which still have to be handled, but seem to concern only a very small portion of videos).
Looking at the code, it seems that we have almost always applied an offset of 11820 ms to subtitles.
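To make the effect of such a constant offset concrete, here is a minimal sketch that shifts every cue timestamp in a WebVTT document by a given number of milliseconds. It is not the scraper's actual code: the function names are hypothetical, working on VTT text is an assumption, and the sign of the offset (positive means later) is an assumption as well.

```python
import re

# Matches WebVTT timestamps, with or without the optional hours part.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def _shift(match, offset_ms):
    """Return the matched timestamp moved by offset_ms, clamped at zero."""
    hours = int(match.group(1) or 0)
    minutes, seconds, millis = (int(match.group(i)) for i in (2, 3, 4))
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    total = max(total + offset_ms, 0)
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def shift_vtt(text, offset_ms):
    """Hypothetical helper: shift all cue timings in a VTT string."""
    shifted = []
    for line in text.splitlines():
        # Only cue timing lines contain the "-->" separator.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: _shift(m, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)


if __name__ == "__main__":
    sample = "WEBVTT\n\n00:00:01.000 --> 00:00:03.500\nHello"
    print(shift_vtt(sample, 11820))
```

With an offset of 11820 ms, a cue starting at 00:00:01.000 moves to 00:00:12.820, which shows how a wrong constant translates directly into the misalignment reported in this issue.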
@rgaudin do you have any recollection of this magic value?
Now that you mention it, it rings a bell, but I'm pretty sure it was there before the refactor.
I am so sorry @benoit74, I should have been more specific and will take care of this from now on. Please let me know if there are other issues I can help with; meanwhile, I will keep going through the open issues to see if I can solve any. Thank you so much!
For some (all?) videos, the subtitles are time-shifted by about 4 to 5 seconds, while they are properly aligned on the TED web platform. This makes them very hard to use (or even useless when you just need an aid to better understand a language you have difficulty hearing properly).
I've checked two videos, one from YouTube and one from the TED CDN, and both are impacted.