Open abskjha opened 7 years ago
Hi @slashstar,
I only got my hands on that dataset recently and haven't had much time to try it. If you've managed to train the model using that dataset, it would be great if you could share the results :)
Hi, @rizkiarm Thanks for your work!
I am curious how long training takes. Also, do you have pre-trained models to play with? I am really interested in its performance.
Hey @rizkiarm - nice work!
Seconding what @jiangsutx said - pre-trained models and some more info about training on your own dataset would be really nice and interesting.
Hi @jiangsutx @robbiebarrat,
It took me 5-7 days to train the model on a shared machine using 2 GTX Titan X.
The pre-trained models are already available in evaluation/models (trained on the overlapped speakers split).
I've trained a variant of this model on my own "wild" dataset, similar to Chung et al., in another language with quite reasonable results. I haven't tried it on public datasets other than GRID.
I'm attempting to train on my own dataset, but so far the predictions aren't very good. It's predicting a string of all spaces (each of the arrays it returns has its maximum value at index 27...)
Is this normal? I've only been training it overnight on one Titan... will the results get better with more training? My loss is hanging around ~120 right now. Did you ever see these bad predictions when training on GRID?
Well, that's weird. Overnight you should get through around 50 epochs, with a loss of approximately 10 or less. Bad predictions are to be expected if your loss is still above 30.
Thanks for the response!
I believe I might be feeding the y_data into the network wrong... is it true that 'the_labels' should be an array containing indexes of alphabet letters (e.g. if you were trying to write 'abc def', you'd turn that into an array like [0, 1, 2, 26, 3, 4, 5], and pass it like that?)
That should be correct as far as I remember. You might consider using the Align class to load and process the alignment, as well as the generator, so that you don't have to worry about feeding the data wrongly.
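For reference, the encoding described in the question can be sketched as below. This is a minimal illustration, not the repository's code: the mapping (a-z to 0-25, space to 26) is taken from the 'abc def' example above, and the helper names text_to_labels and labels_to_text are chosen here for illustration only.

```python
import string

# Assumed mapping, taken from the example in this thread:
# 'a'..'z' -> 0..25, space -> 26.
ALPHABET = string.ascii_lowercase + " "
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def text_to_labels(text):
    """Convert a sentence into a list of integer class indexes.

    Characters outside the assumed alphabet are silently skipped.
    """
    return [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHAR_TO_INDEX]


def labels_to_text(labels):
    """Inverse mapping; indexes outside the alphabet (e.g. padding) are dropped."""
    return "".join(ALPHABET[i] for i in labels if 0 <= i < len(ALPHABET))


print(text_to_labels("abc def"))  # [0, 1, 2, 26, 3, 4, 5]
```

Round-tripping a sentence through both helpers is a quick way to check that the labels you feed the network are what you think they are.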
Alright - thank you. I'll do that with the alignments. One last thing, though: could you please provide some insight into the label_length and input_length values? It's my understanding that they're just numpy arrays containing the length of the input (number of frames) or the labels (number of letters in the sentence) in integer form, and when you put these in a batch they stack horizontally, so if you had a group of 45 frames and one of 60, input_length would be [45, 60], right?
@slashstar @rizkiarm How were you able to get the "Lipreading in the wild" dataset? My impression from this was that it wasn't available.
@michiyosony For the Lipreading in the Wild (LRW) dataset: go to their project webpage, where the instructions are given. Basically, you need to send an email to Rob Cooper (BBC) and request access from him.
Ah, thank you. I was confusing "Lipreading in the Wild" with "Lipreading sentences in the Wild".
@robbiebarrat input_length: number of frames, label_length: number of characters. Keras doesn't support variable-length input and output. If you want to have different lengths of frames/labels in one batch, you need to add padding.
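The padding described above could look roughly like the sketch below. It rests on assumptions that do not come from this thread's code: frames are zero-padded at the end, labels are padded with -1, and the helper name pad_batch and the tiny frame size are made up for the example. The point is only that input_length and label_length keep the true lengths while the arrays themselves are padded to a common size.

```python
import numpy as np


def pad_batch(videos, labels, label_pad_value=-1):
    """Pad variable-length videos and label sequences to a common length.

    videos: list of arrays shaped (frames, ...) with identical frame shapes.
    labels: list of integer lists (one class index per character).
    label_pad_value is an assumption; use whatever your loss expects.
    """
    max_frames = max(v.shape[0] for v in videos)
    max_chars = max(len(l) for l in labels)
    frame_shape = videos[0].shape[1:]

    x = np.zeros((len(videos), max_frames) + frame_shape, dtype=np.float32)
    y = np.full((len(labels), max_chars), label_pad_value, dtype=np.int64)
    for i, (v, l) in enumerate(zip(videos, labels)):
        x[i, :v.shape[0]] = v   # real frames first, zero padding after
        y[i, :len(l)] = l       # real labels first, pad value after

    # The true (unpadded) lengths, one entry per sample in the batch.
    input_length = np.array([v.shape[0] for v in videos])
    label_length = np.array([len(l) for l in labels])
    return x, y, input_length, label_length


# Two clips of 45 and 60 frames (tiny 4x8 RGB frames to keep the example light).
videos = [np.ones((45, 4, 8, 3)), np.ones((60, 4, 8, 3))]
labels = [[0, 1, 2], [0, 1, 2, 26, 3, 4, 5]]
x, y, input_length, label_length = pad_batch(videos, labels)
print(x.shape, input_length, label_length)
```

Depending on how the loss is wired up, the two length arrays may need an extra axis (shape (batch, 1) rather than (batch,)); check what your training code expects.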
@slashstar yeah, it took quite a long time. I got access after three months of waiting (from January).
Hey guys, I'm new to machine learning/deep learning and TensorFlow and have started learning from scratch. I have lip reading as my final year project, so can you help me understand and run this project on Windows, and explain how to train on data, etc.? I have tensorflow-cpu installed.
Hello, who has the wild dataset? I don't know the email address of Rob Cooper (BBC). Your help would be useful for my research, thank you!
@crazygirl1992 , rob.cooper (at) bbc.co.uk
Hi, can someone help me with the code for Lip Reading in the Wild? I have the dataset from Rob Cooper.
@jigyasubagai Sorry, I can't get the code for the wild dataset right now; I am still waiting for the database password.
@crazygirl Thanks for replying. If you are able to locate a working implementation of this, do let me know. Meanwhile, for your dataset you can contact Rob at the email address given above.
Hello, has anyone tried training the model on another language? It would be wonderful if it could be used for another language!
Hello, have you tried training the model on another language? It would be wonderful if it could be used for another language! @robbiebarrat
@crazygirl1992 I didn't try another language, but I tried another dataset, unsuccessfully :(
Sorry, but I doubt I'll be of much help.
Were the predictions bad, or did the training itself fail? I am now intending to train this on the BBC datasets. Have you tried that, and how were the results? @robbiebarrat
@crazygirl1992 I think that I formatted the data wrong; it was giving the same prediction for every single array I fed it.
Hello, have you trained on the wild datasets with this method, and how were the results? I am intending to do this. @slashstar
@crazygirl1992, I have not used LipNet to train on the LRW dataset. I used the WAS architecture (Chung et al., CVPR 2017), which is a character-level lipreader. The results were not comparable to the original paper, as they pretrain WAS on a larger corpus before fine-tuning it on the LRW dataset.
Hello, my training is set to 5000 epochs, but the process stops when it reaches epoch 200. I have 2 GPUs and I don't know why this happens. @robbiebarrat @rizkiarm
@crazygirl1992 I don't know why it stopped--you'd probably need to include logs for anyone to be able to help you. However, whenever my training crashed, I would restart it to continue training from that point (as opposed to starting over). In train.py, change the line train(run_name, 0, 5000, 3, 100, 50, 75, 32, 50) so that the value of run_name is the name of the training session you want to continue, and change the start_epoch parameter to the epoch you want to pick up from (instead of 0).
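As a rough illustration of that change, the sketch below uses a stand-in train function so that it runs on its own. Only the first two parameter names (run_name, start_epoch) come from this thread; the remaining positional values are copied unchanged from the original call, and the run name and epoch number are hypothetical.

```python
# Stand-in for the real train() in train.py, used only so this sketch runs
# by itself. The real function does the actual training; this one just
# reports where it would resume.
def train(run_name, start_epoch, *remaining_args):
    print("resuming run %r from epoch %d" % (run_name, start_epoch))
    return run_name, start_epoch


# Original call (fresh run, starting at epoch 0):
#   train(run_name, 0, 5000, 3, 100, 50, 75, 32, 50)

# Modified call (continue an existing run):
run_name = "my_previous_run"  # hypothetical: name of the session to continue
start_epoch = 200             # hypothetical: epoch to pick up from
result = train(run_name, start_epoch, 5000, 3, 100, 50, 75, 32, 50)
```

How the saved weights for that epoch are located and loaded is up to train.py itself and is not shown here.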
OK, thank you, I will try. @michiyosony
Hello, have you since tried this code on other datasets, and were you successful? I now want to train it on another language. Can you give me some tips? @robbiebarrat
@crazygirl1992 I haven't worked on this project since last November - sorry, but I don't know.
Hi all - would anyone be interested in working on this with me? I have the Lip Reading Sentences dataset (5000+ hours), 7 Titan X and a few other people on this. Ideally someone who can be in London. Please PM me. Thanks.
@smorrel1 Hi, could you tell me how you got the Lip Reading Sentences dataset?
@rizkiarm I am investigating LipNet on a dataset that I prepared myself. How can I train it on variable-size input? Currently I have padded/repeated the last mouth frame to make all samples equal in length. I also have access to the Lip Reading Sentences in the Wild dataset, but its format is completely different from the GRID corpus. Do you have a script to convert its format? Also, I don't have any GPUs to train on. How long, on average, would it take to train on 400 videos of 115 mouth frames on a system with 32 GB of RAM and an i7 processor?
Hi guys, I am new to this field and wanted to work on this dataset. Can you tell me how to align each frame to a label?
Hi, I'd like to train LipNet on a word-based dataset, but I have run into some issues. First, what should I pad my real labels with? Is it -1, so that, for example, a label becomes ['h', 'e', '-1', '-1']? Also, I can't understand curriculum learning; does anybody know a good resource? Thanks
Hi @rizkiarm
Not sure if this is the right place to ask this question. Have you tried this model on the 'Lipreading in the Wild' dataset (Chung and Zisserman, ACCV'16)?
Thanks.