rizkiarm / LipNet

Keras implementation of 'LipNet: End-to-End Sentence-level Lipreading'
MIT License

Lipreading in Wild dataset #4

Open abskjha opened 7 years ago

abskjha commented 7 years ago

Hi @rizkiarm

Not sure if this is the right place to ask this query. Have you tried this model on the 'Lip Reading in the Wild' dataset (Chung and Zisserman, ACCV'16)?

Thanks.

rizkiarm commented 7 years ago

Hi @slashstar,

I've just gotten my hands on that dataset recently, but I don't really have much time to try it. If you've managed to train the model using that dataset, it would be great if you could share the results :)

jiangsutx commented 7 years ago

Hi, @rizkiarm Thanks for your work!

I am curious how much time training takes. Also, do you have pre-trained models to play with? I am really interested in its performance.

robbiebarrat commented 7 years ago

Hey @rizkiarm - nice work!

Seconding what @jiangsutx said - pre-trained models and some more info about training on your own dataset would be really nice and interesting.

rizkiarm commented 7 years ago

Hi @jiangsutx @robbiebarrat,

It took me 5-7 days to train the model on a shared machine using two GTX Titan X GPUs. The pre-trained models are already available in evaluation/models (trained on the overlapped speakers split).

I've trained a variant of this model on my own "wild" dataset, similar to Chung et al., in another language with quite reasonable results. I haven't tried it on public datasets other than GRID.

robbiebarrat commented 7 years ago

I'm attempting to train on my own dataset, but so far the predictions aren't very good. It's predicting a string of all spaces (each of the arrays it returns has its maximum value at index 27...).

Is this normal? I've only been training it overnight on one Titan... will the results get better with more training? My loss is hanging around ~120 right now. Did you ever see these bad predictions when training on GRID?

rizkiarm commented 7 years ago

Well, that's weird. Overnight you would get around 50 epochs with a loss of approximately 10 or less. Bad predictions are to be expected if your loss is still above 30.

robbiebarrat commented 7 years ago

Thanks for the response!

I believe I might be feeding the y_data into the network wrong... is it true that 'the_labels' should be an array containing indices of alphabet letters (e.g. if you were trying to write 'abc def', you'd turn that into an array like [0, 1, 2, 26, 3, 4, 5] and pass it like that)?

rizkiarm commented 7 years ago

That should be correct as far as I remember. You might consider using the Align class to load and process the alignments, as well as the generator, so that you don't have to worry about feeding the data incorrectly.
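
Just to illustrate the idea, a minimal standalone sketch of that mapping (not the repo's actual helper; the Align class does the equivalent internally), assuming a-z map to 0-25, space to 26, and 27 is reserved as the CTC blank:

```python
# Hypothetical helper: map a sentence to integer labels (a-z -> 0-25, space -> 26).
# Index 27 is reserved for the CTC blank and never appears in the labels themselves.
def text_to_labels(text):
    labels = []
    for c in text.lower():
        if 'a' <= c <= 'z':
            labels.append(ord(c) - ord('a'))
        elif c == ' ':
            labels.append(26)
    return labels

print(text_to_labels('abc def'))  # [0, 1, 2, 26, 3, 4, 5]
```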

robbiebarrat commented 7 years ago

Alright - thank you. I'll do that with the alignments. One last thing, though: could you please provide some insight into the label_length and input_length values? It's my understanding that they're just numpy arrays containing the length of the input (number of frames) or the labels (number of letters in the sentence) in integer form, and when you put these in a batch they stack horizontally, so if you had a clip of 45 frames and one of 60, input_length would be [45, 60], right?

michiyosony commented 7 years ago

@slashstar @rizkiarm How were you able to get the "Lip Reading in the Wild" dataset? My impression from this was that it wasn't available.

abskjha commented 7 years ago

@michiyosony For the Lip Reading in the Wild (LRW) dataset: go to their project webpage; the instructions are given there. Basically you need to send an email to Rob Cooper (BBC) and request access.

michiyosony commented 7 years ago

Ah, thank you. I was confusing "Lip Reading in the Wild" with "Lip Reading Sentences in the Wild".

rizkiarm commented 7 years ago

@robbiebarrat input_length: number of frames, label_length: number of characters. Keras doesn't support variable-length inputs and outputs, so if you want different lengths of frames/labels in one batch, you need to pad them.
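
Roughly like this (an illustrative sketch only; the frame shapes and the -1 label padding value are assumptions, and the generator in this repo handles this for you):

```python
import numpy as np

# Hypothetical batch: two samples with 45 and 60 frames, padded to 60 frames,
# and two label sequences padded to the length of the longer one.
# input_length / label_length tell the CTC loss how much of each row is real;
# the padding values themselves (0 for frames, -1 for labels here) are ignored.

frames_a = np.random.rand(45, 50, 100, 3)   # (frames, H, W, C) -- shapes are illustrative
frames_b = np.random.rand(60, 50, 100, 3)

labels_a = [0, 1, 2, 26, 3, 4, 5]           # "abc def"
labels_b = [7, 8, 26, 19, 7, 4, 17, 4]      # "hi there"

max_frames = max(len(frames_a), len(frames_b))
max_labels = max(len(labels_a), len(labels_b))

def pad_frames(f, n):
    # Append zero frames so every sample has n frames.
    pad = np.zeros((n - len(f),) + f.shape[1:])
    return np.concatenate([f, pad], axis=0)

def pad_labels(l, n):
    # Append -1 so every label row has n entries.
    return l + [-1] * (n - len(l))

batch = {
    'the_input':    np.stack([pad_frames(frames_a, max_frames),
                              pad_frames(frames_b, max_frames)]),
    'the_labels':   np.array([pad_labels(labels_a, max_labels),
                              pad_labels(labels_b, max_labels)]),
    'input_length': np.array([45, 60]),
    'label_length': np.array([len(labels_a), len(labels_b)]),
}
```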

@slashstar yeah, it took quite a long time. I got access after three months of waiting (from January).

Anandra-Singh commented 6 years ago

Hey guys, I'm new to machine learning/deep learning and TensorFlow and started learning from scratch. I have lip reading as my final year project, so could you help me understand and run this project on Windows, and explain how to train on data, etc.? I have tensorflow-cpu installed.

crazygirl1992 commented 6 years ago

Hello, who has the Wild dataset? I don't know the email of Rob Cooper (BBC). Your help would be useful for my research, thank you!

abskjha commented 6 years ago

@crazygirl1992 , rob.cooper (at) bbc.co.uk

jigyasubagai commented 6 years ago

Hi, can someone help me with the code for Lip Reading in the Wild? I have the dataset from Rob Cooper.

crazygirl1992 commented 6 years ago

@jigyasubagai Sorry, I don't have code for the Wild dataset yet; I'm still waiting for the database password.

jigyasubagai commented 6 years ago

@crazygirl Thanks for replying. If you are able to locate an implementable version of this code, do let me know. Meanwhile, for your dataset you can contact Rob; he is reachable at the email given above.

crazygirl1992 commented 6 years ago

Hello, has anyone tried to train the model on another language? If it can be used in another language, that would be wonderful!

crazygirl1992 commented 6 years ago

Hello, did you try to train the model on another language? If it can be used in another language, that would be wonderful! @robbiebarrat

robbiebarrat commented 6 years ago

@crazygirl1992 I didn't try another language, but I tried another dataset, unsuccessfully :(

Sorry, but I doubt I'll be of much help.

crazygirl1992 commented 6 years ago

Was the prediction result bad, or was the training itself unsuccessful? I am now intending to train this with the BBC datasets; did you try that, and how were the results? @robbiebarrat

robbiebarrat commented 6 years ago

@crazygirl1992 I think that I formatted the data wrong; it was giving the same prediction for every single array I fed it.

crazygirl1992 commented 6 years ago

Hello, did you train on the Wild dataset with this method, and how were the results? I am intending to do this. @slashstar

abskjha commented 6 years ago

@crazygirl1992, I have not used LipNet to train on the LRW dataset. I used the WAS architecture (Chung et al., CVPR 2017), which is a character-level lipreader. The results were not comparable to the original paper, as they pretrain WAS on a larger corpus before fine-tuning it on the LRW dataset.

crazygirl1992 commented 6 years ago

Hello, when I train there are 5000 epochs, but when I reach around epoch 200 the process stops. I have 2 GPUs; I don't know why. @robbiebarrat @rizkiarm

michiyosony commented 6 years ago

@crazygirl1992 I don't know why it stopped -- you'd probably need to include logs for anyone to be able to help you. However, whenever my training crashed, I would restart it to continue training from that point (as opposed to starting over). In train.py, change the line train(run_name, 0, 5000, 3, 100, 50, 75, 32, 50) so the value of run_name is the name of the training session you want to continue, and change the start_epoch parameter to be the epoch you want to pick up from (instead of 0).
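
For example, if the run you want to continue crashed around epoch 200, the edited call might look like this (the run name is a placeholder for your own session; the remaining arguments stay as they were):

```python
# train.py -- resume an existing session instead of starting a new one.
# 'my-previous-run' is a placeholder for the name of the crashed session;
# 200 replaces the original start_epoch of 0, everything else is unchanged.
train('my-previous-run', 200, 5000, 3, 100, 50, 75, 32, 50)
```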

crazygirl1992 commented 6 years ago

OK, thank you, I will try that. @michiyosony

crazygirl1992 commented 6 years ago

Hello, have you tried this code on other datasets, and were you successful? I now want to train it on another language; can you give me some tips? @robbiebarrat

robbiebarrat commented 6 years ago

@crazygirl1992 I haven't worked on this project since last November - sorry, but I don't know.

smorrel1 commented 6 years ago

Hi all - would anyone be interested in working on this with me? I have the Lip Reading Sentences dataset (5000+ hours), 7 Titan X GPUs, and a few other people on this. Ideally someone who can be in London. Please PM me. Thanks.

songemeng commented 6 years ago

@smorrel1 Hi, could you tell me how you got the Lip Reading Sentences dataset?

idigizen commented 5 years ago

@rizkiarm I am investigating LipNet on a dataset that I prepared myself. How can I train it on variable-size input? Currently I have padded/repeated the last mouth frame to make all samples equal length. I also have access to the Lip Reading Sentences in the Wild dataset, but its format is completely different from the GRID corpus; do you have any script to convert it? Also, I don't have any GPUs to train on; roughly how long would it take on a 32 GB, i7 system to train on 400 videos of 115 mouth frames each?
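
The padding I mean is roughly this (a sketch only; frames is assumed to be a NumPy array of mouth crops with shape (T, H, W, C)):

```python
import numpy as np

def pad_by_repeating_last(frames, target_len):
    """Repeat the final mouth frame until the clip reaches target_len frames."""
    if len(frames) >= target_len:
        return frames[:target_len]
    repeats = np.repeat(frames[-1:], target_len - len(frames), axis=0)
    return np.concatenate([frames, repeats], axis=0)
```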

robinttt333 commented 4 years ago

Hi guys, I am new to this field and wanted to work on this dataset. Can you tell me how to align each frame to a label?

marziehoghbaie commented 3 years ago

Hi, I'd like to train LipNet on a word-based dataset, but I have run into some issues. First, I'd like to know what I should pad my real labels with. Is it -1, i.e. should it look like ['h', 'e', -1, -1]? Also, I can't understand curriculum learning; does anybody know a good resource? Thanks.