jdepoix / youtube-transcript-api

This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless browser, like other selenium based solutions do!
MIT License

Transcript "Start" and "Duration" values incorrect #290

Closed theqasim closed 4 weeks ago

theqasim commented 1 month ago


To Reproduce

Steps to reproduce the behavior: Pull the transcript for video ID H_I19q7YKIs (this issue also occurs on every other video I have tested).

What code / cli command are you executing?

from youtube_transcript_api import YouTubeTranscriptApi
YouTubeTranscriptApi.get_transcript("H_I19q7YKIs")


### Which Python version are you using?
Python 3.12.3

### Which version of youtube-transcript-api are you using?
youtube-transcript-api 1.1.2

# Expected behavior
I expected the "start" and "duration" values to line up: for each entry, start plus duration should not exceed the start time of the next entry, as in the example. That is not the case for any of the videos I am testing, which makes it impossible to build timestamps for these videos from this data.
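
The expectation above can be written as a small check. This is only a sketch: it assumes the transcript is a list of dicts with "text", "start" and "duration" keys, `find_overlaps` is a hypothetical helper name, and the sample entries are made up for illustration rather than taken from the video in question.

```python
def find_overlaps(transcript):
    """Return (index, overlap_seconds) for every entry whose
    start + duration runs past the start of the following entry."""
    overlaps = []
    for i in range(len(transcript) - 1):
        end = transcript[i]["start"] + transcript[i]["duration"]
        next_start = transcript[i + 1]["start"]
        if end > next_start:
            overlaps.append((i, round(end - next_start, 3)))
    return overlaps


# Made-up sample data (not real output for any video).
sample = [
    {"text": "hello", "start": 0.0, "duration": 4.0},
    {"text": "world", "start": 2.5, "duration": 3.0},
    {"text": "again", "start": 5.5, "duration": 2.0},
]

print(find_overlaps(sample))
```

An empty result would mean the entries line up as expected. Any returned pair identifies an entry that is still "running" when the next one starts.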

# Actual behaviour
When I retrieve the transcript for any YouTube video, the "start" and "duration" values seem to be incorrect: consecutive entries overlap, so the values do not properly represent the video's timestamps.
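
If non-overlapping timestamps are all that is needed, one possible workaround is to end each entry at the next entry's start time whenever the reported duration runs past it. This is a sketch under that assumption, not something the library does itself. `to_segments` is a hypothetical helper name and the sample entries are made up.

```python
def to_segments(transcript):
    """Turn transcript entries into (start, end, text) tuples in which
    each end is clipped to the start of the following entry."""
    segments = []
    for i, entry in enumerate(transcript):
        start = entry["start"]
        end = start + entry["duration"]
        if i + 1 < len(transcript):
            # Clip only when the reported duration overlaps the next entry.
            end = min(end, transcript[i + 1]["start"])
        segments.append((start, end, entry["text"]))
    return segments


# Made-up sample data (not real output for any video).
entries = [
    {"text": "hello", "start": 0.0, "duration": 4.0},
    {"text": "world", "start": 2.5, "duration": 3.0},
    {"text": "again", "start": 5.5, "duration": 2.0},
]

for start, end, text in to_segments(entries):
    print(f"{start:.1f} -> {end:.1f}: {text}")
```

The last entry keeps its reported duration because there is no following start time to clip against.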


jdepoix commented 4 weeks ago

Duplicate of #21