Can lipreading support streaming or online inference?

Hi @zyjcsf, it takes around 20 minutes to run the whole evaluation on the LRS3 test set (0.9 hours) on a GPU, so the real-time factor (RTF) of our model is well below 1. However, our lipreading models are trained in a non-streaming scenario, so real-time or near-real-time prediction requires changes either to the model (training it in a streaming fashion) or to the evaluation process. The following steps describe how to use an offline lipreading model for streaming prediction.
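For reference, the RTF implied by those two numbers can be worked out as below. This is only a back-of-the-envelope sketch: the helper name `real_time_factor` is made up for illustration, and the result is a batch-throughput figure over the whole test set (it may include data loading), not a per-chunk latency measurement.

```python
def real_time_factor(processing_seconds: float, media_seconds: float) -> float:
    """RTF = processing time / media duration; below 1.0 means faster than real time."""
    if media_seconds <= 0:
        raise ValueError("media_seconds must be positive")
    return processing_seconds / media_seconds


# Figures quoted above: ~20 minutes of processing for 0.9 hours of test data.
rtf = real_time_factor(processing_seconds=20 * 60, media_seconds=0.9 * 3600)
print(f"RTF ~ {rtf:.2f}")
```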

1. Chunk the image sequence. Limit the input to small chunks (segments). The chunk size depends on the model and on the latency requirements of your application.
2. Infer each chunk. Transcribe the chunks sequentially with the offline lipreading model.
3. Post-process. Merge the per-chunk transcriptions and handle any overlaps between neighbouring chunks.
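The three steps could be sketched roughly as follows. This is not code from the repository: `chunk_frames`, `merge_transcripts`, `streaming_transcribe` and the `transcribe_chunk` callable are hypothetical names, and `transcribe_chunk` stands in for whatever function runs the offline model on one chunk of frames and returns text. The overlap handling is a naive word-level match, so it can wrongly drop a word that is genuinely repeated at a chunk boundary.

```python
from typing import Callable, List, Sequence


def chunk_frames(frames: Sequence, chunk_size: int, overlap: int = 0) -> List[Sequence]:
    """Step 1: split a frame sequence into fixed-size chunks that share `overlap` frames."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(frames):
        chunks.append(frames[start:start + chunk_size])
        if start + chunk_size >= len(frames):
            break  # this chunk already reaches the end of the sequence
        start += step
    return chunks


def merge_transcripts(pieces: List[str]) -> str:
    """Step 3: join per-chunk texts, dropping words duplicated at chunk boundaries."""
    merged: List[str] = []
    for piece in pieces:
        words = piece.split()
        duplicated = 0
        # Find the longest suffix of the merged text that equals a prefix of the new piece.
        for n in range(min(len(merged), len(words)), 0, -1):
            if merged[-n:] == words[:n]:
                duplicated = n
                break
        merged.extend(words[duplicated:])
    return " ".join(merged)


def streaming_transcribe(
    frames: Sequence,
    transcribe_chunk: Callable[[Sequence], str],  # hypothetical wrapper around the offline model
    chunk_size: int = 50,
    overlap: int = 10,
) -> str:
    """Steps 1-3 combined: chunk, transcribe each chunk in order, then merge."""
    pieces = [transcribe_chunk(chunk) for chunk in chunk_frames(frames, chunk_size, overlap)]
    return merge_transcripts(pieces)
```

The defaults are placeholders, not tuned values. Smaller chunks lower the latency but give the model less visual context per prediction, so accuracy is likely to drop compared with full-utterance decoding.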

