gulvarol / bsl1k

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues, ECCV 2020
https://www.robots.ox.ac.uk/~vgg/research/bsl1k/

The training procedure fails in phoenix2014.py #6

Open DorinK opened 3 years ago

DorinK commented 3 years ago

Dear authors,

I've been trying to use the code for the Phoenix dataset by running:

```
python main.py --bsl1k_mouthing_prob_thres 0.8 --checkpoint checkpoint/phoenix2014t_i3d_pkinetics --datasetname phoenix2014 --phoenix_path /home/nlp/dorink/project/bsl1k/data_phoenix --gpu_collation 0 --num-classes 1233 --num_figs 0 --pretrained misc/pretrained_models/model.pth.tar --schedule 5 10 12 --snapshot 1 --test-batch 1 --train-batch 1 --workers 0 --num_gpu 4
```

following your instructions in run.

First of all, I think there may be an error in line 72 of datasets/phoenix2014.py: `self.frame_level_glosses = data["videos"]["alignments"]["gloss_id"]`. In the dictionary created by misc/phoenix2014T/gather_frames.py, there is no "alignments" key, so I'm currently using the following patch: `self.frame_level_glosses = data["videos"]["gloss_ids"]`. Please let me know if it should be something else.
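As a stopgap, the patch can be made tolerant of both layouts (a minimal sketch; the key names are taken from this thread, and `load_frame_level_glosses` is a hypothetical helper, not a function in the repo):

```python
import pickle

def load_frame_level_glosses(info_pkl_path):
    """Load frame-level gloss annotations, handling both suspected layouts
    of the gather_frames.py output (key names assumed from this issue)."""
    with open(info_pkl_path, "rb") as f:
        data = pickle.load(f)
    videos = data["videos"]
    if "alignments" in videos:      # layout expected by datasets/phoenix2014.py
        return videos["alignments"]["gloss_id"]
    if "gloss_ids" in videos:       # layout actually produced, per this report
        return videos["gloss_ids"]
    raise KeyError("no frame-level gloss annotations found in info.pkl")
```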

My main problem is that when I train the model with the command above, the code fails in the `_get_class` function of datasets/phoenix2014.py, because the variable `clip_glosses` is an empty list (`[]`) roughly 98% of the time. The error:

```
File "/home/nlp/dorink/project/bsl1k/datasets/phoenix2014.py", line 124, in _get_class
    max_indices = np.where(cnts == cnts.max())[0]
ValueError: zero-size array to reduction operation maximum which has no identity
```

What can be done to solve this problem? Thanks in advance.
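For reference, the failure mode reduces to calling `max` on a zero-size array; a minimal reproduction with a possible guard (my own sketch; the fallback to a background class is an assumption, not the authors' fix):

```python
import numpy as np

def majority_gloss(clip_glosses, background_id=0):
    """Majority-vote a class label over the glosses in a clip,
    mirroring the logic around _get_class in datasets/phoenix2014.py."""
    vals, cnts = np.unique(np.asarray(clip_glosses), return_counts=True)
    if cnts.size == 0:
        # An empty clip_glosses list makes cnts.max() raise
        # "zero-size array to reduction operation maximum";
        # falling back to a background class avoids the crash.
        return background_id
    max_indices = np.where(cnts == cnts.max())[0]
    return int(vals[max_indices[0]])
```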

gulvarol commented 3 years ago

Thanks for reporting this. I realize I never made a long-overdue update of the Phoenix part of the code. The current version does not really correspond to the released model training, because phoenix2014T was trained with the CTC loss, support for which I removed to simplify the code. On the other hand, `self.assign_labels == "auto"` in the dataloader is only applicable to phoenix2014 (without T), for which automatic frame-level gloss alignments were provided, but I see that I didn't include the info.pkl file that contains these alignments. I will need some time to check these edits and update the code in the next week or two. In the meantime, you could try setting `self.assign_labels = "uniform"`, but this gave worse performance.
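For context, "uniform" assignment can be sketched as spreading the video-level gloss sequence evenly across frames, with no alignment information (my reading of the description above; `uniform_frame_labels` is a hypothetical illustration, not the repo's dataloader code):

```python
def uniform_frame_labels(gloss_sequence, num_frames):
    """Assign each frame the gloss at its proportional position in the
    video-level gloss sequence (uniform split, no frame alignments)."""
    if not gloss_sequence:
        return []
    frames_per_gloss = num_frames / len(gloss_sequence)
    return [
        gloss_sequence[min(int(i / frames_per_gloss), len(gloss_sequence) - 1)]
        for i in range(num_frames)
    ]
```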

DorinK commented 3 years ago

Gul, thanks for your reply. I would appreciate an update as soon as the code changes matching phoenix2014T are done, along with a note of which files were updated.

Also, I had tried the alternative option you suggested even before I opened this issue, but during evaluation I encountered incompatibilities in the evaluate.py file, which start with the `gt` variable in the `aggregate_clips` function and propagate through the entire evaluation process, preventing its completion.

For reference, I used the following command for the evaluation:

```
python main.py --checkpoint /home/nlp/dorink/project/bsl1k/checkpoint/phoenix2014/bug_fix/test_050 --datasetname phoenix2014 --num_gpus 4 -j 32 -e --evaluate_video 1 --pretrained /home/nlp/dorink/project/bsl1k/checkpoint/phoenix2014t_i3d_pkinetics_bug_fix/checkpoint_050.pth.tar --num-classes 1233 --num_in_frames 16 --save_features 1 --include_embds 1 --test_set test --phoenix_path /home/nlp/dorink/project/bsl1k/data_phoenix
```

I would be happy if you could also update the evaluation code accordingly.

In addition, I would appreciate an answer to the technical question I asked above (in my first comment): is the trained model you provided necessary as a starting point for training, and in particular, is it suitable for phoenix2014T?

DorinK commented 3 years ago

Could you give an update on whether the revised code is ready and will be pushed to the repo soon? If not, roughly how much longer do you expect these updates to take? Thanks in advance!

gulvarol commented 3 years ago

Sorry for the slow response. I clearly failed to update the code on time, so I would rather not make another estimate now. I will try to find some time for it. Answers to your other questions below:

1. Please use `evaluate_seq.py` to run the evaluation on phoenix.

2. The released model for Phoenix2014T was trained with multiple stages:
   a - Training on Phoenix2014 with automatic labels (BSL-1K pretraining) [1296 classes] => 50 epochs: 53.7 WER;
   b - Finetuning on Phoenix2014T with uniform labels [1232 classes] => 50 epochs: 48.2 WER;
   c - Finetuning on Phoenix2014T with CTC loss, freezing up to the Mixed_5c I3D layers [1233 classes, adding a background class] => 6 epochs: 41.5 WER;
   d - Finetuning on Phoenix2014T with CTC loss, unfreezing all layers [1233 classes] => 4 epochs: 39.5 WER.

Step a was pretrained on this model, or equivalently by setting `--pretrained misc/pretrained_models/bsl1k.pth.tar`, so I would suggest using that when training from scratch. The link you asked about corresponds to the controlled experiment following steps a through d, but with Kinetics pretraining in step a instead.

Steps c and d are heavy. I'd like to check whether I can train one model in a single step so that it's simpler. If it helps: training in a single stage on Phoenix2014T with uniform labels (without CTC, without Phoenix2014 pretraining, with BSL-1K pretraining) gave 53.7 WER.
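The WER figures above are standard word error rates over gloss sequences; for anyone reproducing these numbers, a minimal reference implementation (plain Levenshtein distance, not the repo's evaluation code):

```python
def wer(reference, hypothesis):
    """Word error rate: edit distance between token sequences,
    normalized by the reference length."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            substitution = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(r)][len(h)] / max(len(r), 1)
```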

rabeya-akter commented 1 year ago

Can you share the pretrained model from step d? I want to use it to extract I3D features.

gulvarol commented 1 year ago

Hi, sorry, I don't have much capacity to provide support at the moment. But from what I read above, the released model is already the result of step d.