chahatagarwal opened 4 years ago
This model is trained only on the GRID dataset. If your video says "hello", it won't predict "hello". Instead, it will predict a six-word sentence of the form command (4 options) + color (4) + preposition (4) + letter (25) + digit (10) + adverb (4). Even with the unseen-speaker model, you can only transcribe videos of new speakers who are saying sentences of that same command + color + preposition + letter + digit + adverb form.
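To make the constraint concrete, here is a minimal sketch that checks whether a sentence fits the six-slot GRID grammar and counts how many distinct sentences it allows. It assumes the standard GRID corpus vocabulary (the word lists below are the usual GRID ones, not taken from this repository's code), and the helper names `fits_grid_grammar` and `num_sentences` are hypothetical.

```python
import math
import string

# Assumed standard GRID corpus vocabulary; verify against your dataset's align files.
COMMANDS = ["bin", "lay", "place", "set"]
COLORS = ["blue", "green", "red", "white"]
PREPOSITIONS = ["at", "by", "in", "with"]
LETTERS = [c for c in string.ascii_lowercase if c != "w"]  # 25 letters, "w" excluded
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
ADVERBS = ["again", "now", "please", "soon"]

# One word list per slot, in sentence order.
SLOTS = [COMMANDS, COLORS, PREPOSITIONS, LETTERS, DIGITS, ADVERBS]


def num_sentences():
    """Number of distinct sentences the grammar allows (product of slot sizes)."""
    return math.prod(len(slot) for slot in SLOTS)


def fits_grid_grammar(sentence):
    """True only if the sentence has exactly six words, each valid for its slot."""
    words = sentence.lower().split()
    if len(words) != len(SLOTS):
        return False
    return all(word in slot for word, slot in zip(words, SLOTS))


if __name__ == "__main__":
    print(num_sentences())                             # 64000
    print(fits_grid_grammar("bin blue at f two now"))  # True
    print(fits_grid_grammar("hello"))                  # False
```

Anything outside these 4 × 4 × 4 × 25 × 10 × 4 = 64,000 sentences (such as "hello") is simply not in the output space, so the model will map it to the closest-looking GRID sentence instead.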