Also, I notice that Ref 1 does not open-source the visual front end it uses. That front end may come from another paper, "Combining Residual Networks with LSTMs for Lipreading". Did you use the model from that paper directly as the visual front end?
Yes to both questions in the first comment.
Ref 1 provides links to its models (the lip model and the language model) in a shell script. I took the weights from those links.
I noticed that the open-source model in Ref 1 is built on TensorFlow. Have you converted it to PyTorch? Also, since the model in Ref 1 was trained with the visual front end and the back end together, did you cut off the back end and keep only the structure and weights of the visual front end?
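For anyone attempting the same conversion, a minimal sketch of dropping the back end and keeping only the front-end weights might look like the following. It assumes the checkpoint variables have already been read into a dict of NumPy arrays (for example with `tf.train.load_checkpoint(path)` and its `get_tensor` method). The `visual_frontend/` prefix and the name mapping are hypothetical; the real variable names in Ref 1's checkpoint would need to be checked. The transposes follow the standard layout difference between TensorFlow kernels (spatial dims, in, out) and PyTorch weights (out, in, spatial dims).

```python
import numpy as np

# Hypothetical scope name: assumes every visual front-end variable in the
# TensorFlow checkpoint starts with this prefix. Check the real names first.
FRONTEND_PREFIX = "visual_frontend/"

# Common TF -> PyTorch names for the last component of a variable name.
NAME_MAP = {
    "kernel": "weight",
    "gamma": "weight",
    "beta": "bias",
    "bias": "bias",
    "moving_mean": "running_mean",
    "moving_variance": "running_var",
}


def to_torch_layout(value: np.ndarray, is_kernel: bool) -> np.ndarray:
    """Transpose a TF kernel into the axis order PyTorch modules expect."""
    if not is_kernel:
        return value  # biases and batch-norm statistics keep their 1-D layout
    if value.ndim == 5:  # Conv3d: (D, H, W, in, out) -> (out, in, D, H, W)
        return np.transpose(value, (4, 3, 0, 1, 2))
    if value.ndim == 4:  # Conv2d: (H, W, in, out) -> (out, in, H, W)
        return np.transpose(value, (3, 2, 0, 1))
    if value.ndim == 2:  # Linear: (in, out) -> (out, in)
        return value.T
    return value


def extract_frontend(tf_weights: dict) -> dict:
    """Drop back-end variables and rename/transpose the front-end ones."""
    state = {}
    for name, value in tf_weights.items():
        if not name.startswith(FRONTEND_PREFIX):
            continue  # back-end variables are discarded
        *scope, leaf = name[len(FRONTEND_PREFIX):].split("/")
        torch_name = ".".join(scope + [NAME_MAP.get(leaf, leaf)])
        state[torch_name] = to_torch_layout(np.asarray(value), leaf == "kernel")
    return state
```

The resulting arrays would still need to be wrapped with `torch.from_numpy` and passed to `load_state_dict` of a PyTorch front-end module whose parameter names match, which usually requires a further, model-specific renaming step.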