Open joegesualdo opened 8 years ago
I have the same problem; the word times are not being parsed:
const vttToJson = require("vtt-to-json")
const vttString = `
00:00:40.790 --> 00:00:40.800 align:start position:0%
soms bij andere ploegen helemaal niets
00:00:40.800 --> 00:00:43.160 align:start position:0%
soms bij andere ploegen helemaal niets
iets<00:00:41.010><c> ik</c><00:00:41.399><c> kan</c><00:00:41.550><c> daar</c><00:00:41.760><c> enorm</c><00:00:42.210><c> van</c><00:00:42.390><c> genieten</c>
`;
vttToJson(vttString)
  .then((result) => {
    console.log(result.length);
    for (const component of result) {
      console.log(component);
    }
  });
Output:
{ start: 40790,
  end: 40800,
  part: 'soms bij andere ploegen helemaal niets',
  words:
   [ { word: 'soms', time: undefined },
     { word: 'bij', time: undefined },
     { word: 'andere', time: undefined },
     { word: 'ploegen', time: undefined },
     { word: 'helemaal', time: undefined },
     { word: 'niets', time: undefined } ] }
{ start: 40790,
  end: 40800,
  part: '',
  words: [ { word: '', time: undefined } ] }
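For what it's worth, the per-word times in the sample above sit in the inline <00:00:41.010> tags, so they can be recovered with a small tokenizer. Below is a minimal sketch of that idea, not the library's actual code. It assumes timestamps always use the HH:MM:SS.mmm form seen in the sample, and that text before the first inline tag takes the cue's start time. parseCueWords and timestampToMs are hypothetical helpers, not part of vtt-to-json.

```javascript
// Hypothetical helper (not part of vtt-to-json): "HH:MM:SS.mmm" -> milliseconds.
// Returns undefined for anything that is not a timestamp, e.g. "c" or "/c".
function timestampToMs(ts) {
  const match = /^(\d+):(\d{2}):(\d{2})\.(\d{3})$/.exec(ts);
  if (!match) return undefined;
  const [, h, m, s, ms] = match.map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

// Hypothetical helper: split a cue's text into words, each tagged with the
// most recent inline timestamp (or the cue start if none has been seen yet).
function parseCueWords(cueText, cueStartMs) {
  const words = [];
  let currentTime = cueStartMs;
  // The capture group keeps the <...> tags as separate tokens.
  const tokens = cueText.split(/(<[^>]*>)/);
  for (const token of tokens) {
    if (token.startsWith("<")) {
      const time = timestampToMs(token.slice(1, -1));
      // Non-timestamp tags such as <c> and </c> are simply skipped.
      if (time !== undefined) currentTime = time;
      continue;
    }
    for (const word of token.split(/\s+/).filter(Boolean)) {
      words.push({ word, time: currentTime });
    }
  }
  return words;
}

// Usage with part of the cue from the sample above (cue start 00:00:40.800).
const sample = "iets<00:00:41.010><c> ik</c><00:00:41.399><c> kan</c>";
console.log(parseCueWords(sample, 40800));
```

With this approach the plain-text line "soms bij andere ploegen helemaal niets" would get the cue start time for every word, since it carries no inline tags.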
Problem
Some YouTube VTT subtitles aren't parsed correctly.
Why
Instead of separating words by the caption time, some YouTube VTT subtitles separate by syllables.
Example
Video: https://www.youtube.com/watch?v=xQPBPB8UDyQ
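One possible way to cope with syllable-level timing is to merge timed fragments back into whole words before reporting them. The sketch below is only an illustration under a guessed convention: a fragment that starts with whitespace (or follows one that ends with whitespace) begins a new word, and any other fragment continues the previous word. mergeFragments is a hypothetical helper and the fragment data is made up, not taken from the video above.

```javascript
// Hypothetical helper: merge timed text fragments into whole words.
// Assumption: whitespace at a fragment boundary marks a word boundary;
// fragments without it are treated as syllables of the same word.
function mergeFragments(fragments) {
  const words = [];
  let pendingBreak = true; // true when the next fragment must start a new word
  for (const { text, time } of fragments) {
    if (text === "") continue;
    const trimmed = text.trim();
    if (trimmed === "") {
      // Whitespace-only fragment: just a word boundary.
      pendingBreak = true;
      continue;
    }
    if (pendingBreak || /^\s/.test(text)) {
      // A new word keeps the time of its first syllable.
      words.push({ word: trimmed, time });
    } else {
      words[words.length - 1].word += trimmed;
    }
    pendingBreak = /\s$/.test(text);
  }
  return words;
}

// Made-up fragments for illustration only.
const fragments = [
  { text: "ge", time: 100 },
  { text: "nie", time: 200 },
  { text: "ten", time: 300 },
  { text: " van", time: 400 },
];
console.log(mergeFragments(fragments));
```

Each merged word keeps the time of its first fragment, which matches the usual meaning of a word's start time.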