Thank you for your great work! When I ran the model on a new video to generate a transcription from the lip movements, I only got unrelated words such as [bin/green/c/nine/soon...]. Are there any tricks for running the model correctly on an arbitrary video? Thanks in advance.