hulianyuyy / CorrNet

Continuous Sign Language Recognition with Correlation Network (CVPR 2023)
84 stars 14 forks source link

unable to reconize any word but the loss is decreasing??? #37

Closed ethio-artifical closed 2 months ago

ethio-artifical commented 5 months ago

hello, i get an error on the training phase The loss is decreasing but when i evaluate the model it doesn't recognize any word i get 100 always. i install pytorch 1.13.0 python 3.10.13 ctcdecode-1.0.3

this is my log file

Sat Jan 27 01:36:04 2024 ] Parameters:

{'work_dir': 'PATH_TO_SAVE_RESULTS', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '0', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'python', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix14', 'dataset_info': {'dataset_root': './dataset/phoenix2014/phoenix-2014-multisigner', 'dict_path': './preprocess/phoenix2014/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-groundtruth'}, 'num_worker': 10, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'prefix': './dataset/phoenix2014/phoenix-2014-multisigner', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 65, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 8, 'test_batch_size': 8, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 20}

[ Sat Jan 27 01:36:31 2024 ] Epoch: 0, Batch(0/122) done. Loss: 110.28868103 lr:0.000100 [ Sat Jan 27 01:38:26 2024 ] Epoch: 0, Batch(50/122) done. Loss: 13.18387794 lr:0.000100 [ Sat Jan 27 01:40:25 2024 ] Epoch: 0, Batch(100/122) done. Loss: 12.18678570 lr:0.000100 [ Sat Jan 27 01:41:07 2024 ] Mean training loss: 18.2596124587. [ Sat Jan 27 01:41:58 2024 ] Dev WER: 100.00% [ Sat Jan 27 01:42:24 2024 ] Epoch: 1, Batch(0/122) done. Loss: 12.15300369 lr:0.000100 [ Sat Jan 27 01:44:21 2024 ] Epoch: 1, Batch(50/122) done. Loss: 11.67739010 lr:0.000100 [ Sat Jan 27 01:46:22 2024 ] Epoch: 1, Batch(100/122) done. Loss: 13.26895523 lr:0.000100 [ Sat Jan 27 01:47:08 2024 ] Mean training loss: 12.1612764968. [ Sat Jan 27 01:47:58 2024 ] Dev WER: 100.00% [ Sat Jan 27 01:48:27 2024 ] Epoch: 2, Batch(0/122) done. Loss: 12.09643936 lr:0.000100 [ Sat Jan 27 01:50:20 2024 ] Epoch: 2, Batch(50/122) done. Loss: 11.06025696 lr:0.000100 [ Sat Jan 27 01:52:13 2024 ] Epoch: 2, Batch(100/122) done. Loss: 9.84243107 lr:0.000100 [ Sat Jan 27 01:53:01 2024 ] Mean training loss: 10.5143460211. [ Sat Jan 27 01:53:52 2024 ] Dev WER: 100.00% [ Sat Jan 27 01:54:22 2024 ] Epoch: 3, Batch(0/122) done. Loss: 9.38849068 lr:0.000100 [ Sat Jan 27 01:56:19 2024 ] Epoch: 3, Batch(50/122) done. Loss: 9.07399940 lr:0.000100 [ Sat Jan 27 01:58:09 2024 ] Epoch: 3, Batch(100/122) done. Loss: 8.66645050 lr:0.000100 [ Sat Jan 27 01:58:55 2024 ] Mean training loss: 9.0431265127. [ Sat Jan 27 01:59:45 2024 ] Dev WER: 100.00% [ Sat Jan 27 02:00:12 2024 ] Epoch: 4, Batch(0/122) done. Loss: 8.63507748 lr:0.000100 [ Sat Jan 27 02:02:05 2024 ] Epoch: 4, Batch(50/122) done. Loss: 7.65232229 lr:0.000100 [ Sat Jan 27 02:04:04 2024 ] Epoch: 4, Batch(100/122) done. Loss: 7.27032137 lr:0.000100 [ Sat Jan 27 02:04:47 2024 ] Mean training loss: 7.6128989556. [ Sat Jan 27 02:05:38 2024 ] Dev WER: 100.00% [ Sat Jan 27 02:06:09 2024 ] Epoch: 5, Batch(0/122) done. Loss: 6.52053165 lr:0.000100 [ Sat Jan 27 02:07:59 2024 ] Epoch: 5, Batch(50/122) done. Loss: 4.85380507 lr:0.000100 [ Sat Jan 27 02:10:03 2024 ] Epoch: 5, Batch(100/122) done. Loss: 7.19156647 lr:0.000100 [ Sat Jan 27 02:10:44 2024 ] Mean training loss: 5.7774419706. [ Sat Jan 27 02:11:35 2024 ] Dev WER: 100.00% [ Sat Jan 27 02:12:00 2024 ] Epoch: 6, Batch(0/122) done. Loss: 3.87025928 lr:0.000100 [ Sat Jan 27 02:14:02 2024 ] Epoch: 6, Batch(50/122) done. Loss: 3.52518511 lr:0.000100 [ Sat Jan 27 02:16:07 2024 ] Epoch: 6, Batch(100/122) done. Loss: 3.84364915 lr:0.000100 [ Sat Jan 27 02:16:45 2024 ] Mean training loss: 3.9095683430. [ Sat Jan 27 02:17:36 2024 ] Dev WER: 100.00% [ Sat Jan 27 02:18:05 2024 ] Epoch: 7, Batch(0/122) done. Loss: 3.43237042 lr:0.000100 [ Sat Jan 27 02:20:00 2024 ] Epoch: 7, Batch(50/122) done. Loss: 2.54930735 lr:0.000100 [ Sat Jan 27 02:21:59 2024 ] Epoch: 7, Batch(100/122) done. Loss: 2.43364787 lr:0.000100 [ Sat Jan 27 02:22:40 2024 ] Mean training loss: 2.6058940282. [ Sat Jan 27 02:23:30 2024 ] Dev WER: 100.00%

WHAT IS THE PROBLEM GUYS and HOW TO SOLVE IT I AM TRYING IN ETHIOPIA SIGN LANGUAGE DATASET THAT IS AMHARIC CHARACTER

hulianyuyy commented 5 months ago

As the training process works well, this issue is probably due to the situation that the decoder, i.e., sclite, doesn't work during inference. You may check if you have created the link to sclite or set the right path.

ethio-artifical commented 5 months ago

Thank you for your response. I perform the evaluation using Python instead of sclite, and the dataset is in Amharic font. I run the code on Ubuntu 22.04 does these cause problem

hulianyuyy commented 5 months ago

I can't accurately locate the problem, but generally the issue is related with the decoder. This should not relate to the system version.

ethio-artifical commented 5 months ago

i run similar code in google Colab, and it works but when i run on my local environment, i get that the model loss is decreasing but when i evaluate it i get 100%