rizkiarm / LipNet

Keras implementation of 'LipNet: End-to-End Sentence-level Lipreading'
MIT License

discussion! #7

Open FredlinT opened 7 years ago

FredlinT commented 7 years ago

Thanks for your work! I am a graduate student interested in lipreading! Based on your code, I have achieved a rather good result, as follows:

[Epoch 66] Out of 256 samples: [CER: 0.711 - 0.029] [WER: 0.469 - 0.078] [BLEU: 0.937 - 0.937]. This result is on unseen_speakers. I want to know how many epochs you trained for on unseen_speakers. The model you released is 368weights.h5; I assume that is the model behind the results you report, trained for 368 epochs on the overlapped split. Is that right?

Thanks a lot !
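
For context, the CER and WER above are edit-distance error rates: the character- and word-level Levenshtein distance between the predicted and reference sentences, normalized by the reference length. A minimal, illustrative sketch of the computation (not the repository's own evaluation code) might look like this:

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            # deletion, insertion, substitution
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def cer(reference, hypothesis):
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)

# One of the mispredictions discussed later in this thread:
print(wer("place white in j three please", "place red i c bin please"))  # 4 substitutions / 6 words ~= 0.67
```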

rizkiarm commented 7 years ago

Hi, that is rather high for unseen speakers. Maybe you need to check for possible mistakes in your train/val split. The results for both the unseen and overlapped speakers splits, as well as their epochs, are available in the Results section of README.md.

michiyosony commented 7 years ago

@rizkiarm When you trained weights368.h5, did you leave out speakers 1, 2, 20, and 22 (as the LipNet authors did)?

rizkiarm commented 7 years ago

The weights368.h5 was trained on overlapped speakers split, not unseen speakers split. I can upload the latter if you need it.

michiyosony commented 7 years ago

@rizkiarm I see, thank you. I wasn't paying enough attention to the different types of training. Is "unseen speakers" the case where a particular speaker's videos are either all in the training set or all in the validation set? And "overlapped speakers" where each speaker has some videos in the training set and some videos in the validation set?
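
Assuming those definitions (which are confirmed below), the distinction can be made concrete with a small sketch. The speaker-to-video mapping and the held-out speaker IDs here are illustrative assumptions, not the repository's actual split code:

```python
import random

def unseen_speakers_split(videos_by_speaker, held_out_speakers):
    """All videos of held-out speakers go to validation; everything else to training."""
    train, val = [], []
    for speaker, clips in videos_by_speaker.items():
        (val if speaker in held_out_speakers else train).extend(clips)
    return train, val

def overlapped_speakers_split(videos_by_speaker, val_fraction=0.1, seed=0):
    """Every speaker contributes a fraction of their videos to validation."""
    rng = random.Random(seed)
    train, val = [], []
    for speaker, clips in videos_by_speaker.items():
        clips = sorted(clips)
        rng.shuffle(clips)
        k = int(len(clips) * val_fraction)
        val.extend(clips[:k])
        train.extend(clips[k:])
    return train, val
```

With GRID, the unseen-speakers setup in the LipNet paper holds out speakers 1, 2, 20, and 22, as mentioned above.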

My question was motivated by trying to figure out whether the videos in evaluation/samples were in the training set or not. Could you comment on this?

Uploading the model trained on unseen speakers would be very interesting; thanks!

rizkiarm commented 7 years ago

@michiyosony yeah that's right.

The data in the evaluation/samples was picked arbitrarily just as examples. It doesn't belong to any validation sets. You may use the real validation sets if you want to test the model.

I'll upload the unseen speakers model later today if possible.

michiyosony commented 7 years ago

@rizkiarm Great, thanks.

With regards to testing using the "real validation sets", how do I know which videos were in your validation set and not in your training set?

I'm attempting to run against the GRID videos in the validation set because I've been getting very odd results when I use the model with my own videos, despite getting very good results when running against the provided videos in evaluation/samples.

For example, here are two videos of a speaker saying "place white in J 3 please". The predictions were generated using the command

python predict.py models/weights368.h5  path/to/video/pwij3p.mpg
  1. pwij3p.mpg.zip Predicted text: "place red i c bin please"
  2. pwij3p.mpg.zip Predicted text: "place green a d six soon"

To verify the mouth was being correctly identified, I tried processing the videos using extract_mouth_batch.py. I verified that the output was good (e.g. mouth_010 and mouth_038) and that running predict.py on the frames instead of the videos gave the same (not good) result.

Any thoughts?

michiyosony commented 7 years ago

My problem seems to have been the input video--quite possibly the lighting. Another set of videos is giving rather good results; here's a frame: mouth_025

michiyosony commented 7 years ago

Though I did get a set of fairly accurate text predictions with four well-lit videos, that seems to not have been the key. Subsequent videos in good lighting have not produced consistently good predictions.

rizkiarm commented 7 years ago

Hi @michiyosony, the details of which data belong to which set can be found in the dataset cache (a NumPy dump file) if you train the model yourself. As for the listed weights, they are there only for demonstration. I highly recommend that you train the model yourself if you want to ascertain its performance.
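
How such a cache could be inspected is sketched below, assuming it was written with numpy.save and pickles the dataset's video entries; the file name and internal structure here are assumptions, so check what your own training run actually writes:

```python
import numpy as np

# Hypothetical path; substitute the cache file produced by your training run.
cache = np.load("datasets/train.cache.npy", allow_pickle=True)

# A pickled Python object comes back as a 0-d object array; unwrap it to list
# the entries and see which videos ended up in the training vs. validation set.
data = cache.item() if cache.shape == () else cache.tolist()
print(type(data))
print(data if not hasattr(data, "keys") else list(data.keys()))
```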

You might want to fine-tune the model on the new dataset to allow it to adapt to the new variations.
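
A rough sketch of what that fine-tuning could look like with generic Keras calls follows. The model, the generators, the output-layer name, and the hyperparameters are assumptions; the authoritative way to build and compile the network is the train.py scripts in this repository:

```python
from keras.optimizers import Adam

# 'model' is assumed to be a LipNet Keras model built exactly as in training/*/train.py,
# and 'train_gen' / 'val_gen' are assumed generators over your own preprocessed videos.
model.load_weights("models/weights368.h5")   # start from the released weights

# Re-compile with a small learning rate so the pre-trained features are not destroyed.
# The dummy loss on a CTC output layer follows the usual Keras CTC pattern; the layer
# name 'ctc' is an assumption -- check the compile call in train.py.
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=Adam(lr=1e-5))

model.fit_generator(generator=train_gen,
                    steps_per_epoch=100,
                    epochs=20,
                    validation_data=val_gen,
                    validation_steps=10)
```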

By the way, I've uploaded the unseen-speakers weights. Sorry for the delay.

michiyosony commented 7 years ago

Thanks @rizkiarm--I realize your models are just for demonstration. We wanted to try out your model on our own videos because how well LipNet can generalize is a big unknown! It's still unclear whether our (relative) lack of success with your model on our own videos reflects a limitation in LipNet's ability to generalize to different speakers, or whether there is something (e.g. lighting, frame rate) in how we're recording/processing the videos that LipNet can't generalize across.

jiagnhaiyang commented 4 years ago

> Thanks for your work! I am a graduate student interested in lipreading! Based on your code, I have achieved a rather good result, as follows:
>
> [Epoch 66] Out of 256 samples: [CER: 0.711 - 0.029] [WER: 0.469 - 0.078] [BLEU: 0.937 - 0.937]. This result is on unseen_speakers. I want to know how many epochs you set for unseen_speakers. The model you released is 368weights.h5; I assume that is the model claimed in your results, trained for 368 epochs on the overlapped split, is that right?
>
> Thanks a lot!

@FredlinT Hello, I want to know whether you have run into a problem like this:

Traceback (most recent call last):
  File "D:/lunwen/LipNet-master/training/unseen_speakers/train.py", line 82, in <module>
    train(run_name, 0, 50, 3, 100, 50, 75, 32, 1)
  File "D:/lunwen/LipNet-master/training/unseen_speakers/train.py", line 77, in train
    pickle_safe=False)
  File "C:\Users\22672\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\22672\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 2192, in fit_generator
    generator_output = next(output_generator)
  File "C:\Users\22672\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\data_utils.py", line 793, in get
    six.reraise(value.__class__, value, value.__traceback__)
  File "C:\Users\22672\AppData\Local\Programs\Python\Python36\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Users\22672\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\data_utils.py", line 658, in _data_generator_task
    generator_output = next(self._generator)
TypeError: 'threadsafe_iter' object is not an iterator
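
That TypeError is the classic Python 2 vs. 3 iterator mismatch: a wrapper class that defines next() but not __next__() is not recognized as an iterator under Python 3 (the traceback shows Python 3.6). A minimal sketch of the usual fix for a threadsafe_iter wrapper like the one named in the error (the exact file and class body in this repository may differ):

```python
import threading

class threadsafe_iter:
    """Wrap an iterator so concurrent calls to next() are serialized with a lock."""
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):      # Python 3 iterator protocol
        with self.lock:
            return next(self.it)

    next = __next__          # keep Python 2 compatibility
```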