unable to recognize any word but the loss is decreasing???

ethio-artifical commented 5 months ago

Hello, I have a problem in the training phase. The loss is decreasing, but when I evaluate the model it doesn't recognize any word: the WER is always 100%. I installed PyTorch 1.13.0, Python 3.10.13, and ctcdecode-1.0.3.

This is my log file:

[ Sat Jan 27 01:36:04 2024 ] Parameters:

{'work_dir': 'PATH_TO_SAVE_RESULTS', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '0', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'python', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix14', 'dataset_info': {'dataset_root': './dataset/phoenix2014/phoenix-2014-multisigner', 'dict_path': './preprocess/phoenix2014/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-groundtruth'}, 'num_worker': 10, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'prefix': './dataset/phoenix2014/phoenix-2014-multisigner', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 65, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 8, 'test_batch_size': 8, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 20}

[ Sat Jan 27 01:36:31 2024 ] Epoch: 0, Batch(0/122) done. Loss: 110.28868103 lr:0.000100
[ Sat Jan 27 01:38:26 2024 ] Epoch: 0, Batch(50/122) done. Loss: 13.18387794 lr:0.000100
[ Sat Jan 27 01:40:25 2024 ] Epoch: 0, Batch(100/122) done. Loss: 12.18678570 lr:0.000100
[ Sat Jan 27 01:41:07 2024 ] Mean training loss: 18.2596124587.
[ Sat Jan 27 01:41:58 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:42:24 2024 ] Epoch: 1, Batch(0/122) done. Loss: 12.15300369 lr:0.000100
[ Sat Jan 27 01:44:21 2024 ] Epoch: 1, Batch(50/122) done. Loss: 11.67739010 lr:0.000100
[ Sat Jan 27 01:46:22 2024 ] Epoch: 1, Batch(100/122) done. Loss: 13.26895523 lr:0.000100
[ Sat Jan 27 01:47:08 2024 ] Mean training loss: 12.1612764968.
[ Sat Jan 27 01:47:58 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:48:27 2024 ] Epoch: 2, Batch(0/122) done. Loss: 12.09643936 lr:0.000100
[ Sat Jan 27 01:50:20 2024 ] Epoch: 2, Batch(50/122) done. Loss: 11.06025696 lr:0.000100
[ Sat Jan 27 01:52:13 2024 ] Epoch: 2, Batch(100/122) done. Loss: 9.84243107 lr:0.000100
[ Sat Jan 27 01:53:01 2024 ] Mean training loss: 10.5143460211.
[ Sat Jan 27 01:53:52 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:54:22 2024 ] Epoch: 3, Batch(0/122) done. Loss: 9.38849068 lr:0.000100
[ Sat Jan 27 01:56:19 2024 ] Epoch: 3, Batch(50/122) done. Loss: 9.07399940 lr:0.000100
[ Sat Jan 27 01:58:09 2024 ] Epoch: 3, Batch(100/122) done. Loss: 8.66645050 lr:0.000100
[ Sat Jan 27 01:58:55 2024 ] Mean training loss: 9.0431265127.
[ Sat Jan 27 01:59:45 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:00:12 2024 ] Epoch: 4, Batch(0/122) done. Loss: 8.63507748 lr:0.000100
[ Sat Jan 27 02:02:05 2024 ] Epoch: 4, Batch(50/122) done. Loss: 7.65232229 lr:0.000100
[ Sat Jan 27 02:04:04 2024 ] Epoch: 4, Batch(100/122) done. Loss: 7.27032137 lr:0.000100
[ Sat Jan 27 02:04:47 2024 ] Mean training loss: 7.6128989556.
[ Sat Jan 27 02:05:38 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:06:09 2024 ] Epoch: 5, Batch(0/122) done. Loss: 6.52053165 lr:0.000100
[ Sat Jan 27 02:07:59 2024 ] Epoch: 5, Batch(50/122) done. Loss: 4.85380507 lr:0.000100
[ Sat Jan 27 02:10:03 2024 ] Epoch: 5, Batch(100/122) done. Loss: 7.19156647 lr:0.000100
[ Sat Jan 27 02:10:44 2024 ] Mean training loss: 5.7774419706.
[ Sat Jan 27 02:11:35 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:12:00 2024 ] Epoch: 6, Batch(0/122) done. Loss: 3.87025928 lr:0.000100
[ Sat Jan 27 02:14:02 2024 ] Epoch: 6, Batch(50/122) done. Loss: 3.52518511 lr:0.000100
[ Sat Jan 27 02:16:07 2024 ] Epoch: 6, Batch(100/122) done. Loss: 3.84364915 lr:0.000100
[ Sat Jan 27 02:16:45 2024 ] Mean training loss: 3.9095683430.
[ Sat Jan 27 02:17:36 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:18:05 2024 ] Epoch: 7, Batch(0/122) done. Loss: 3.43237042 lr:0.000100
[ Sat Jan 27 02:20:00 2024 ] Epoch: 7, Batch(50/122) done. Loss: 2.54930735 lr:0.000100
[ Sat Jan 27 02:21:59 2024 ] Epoch: 7, Batch(100/122) done. Loss: 2.43364787 lr:0.000100
[ Sat Jan 27 02:22:40 2024 ] Mean training loss: 2.6058940282.
[ Sat Jan 27 02:23:30 2024 ] Dev WER: 100.00%

What is the problem, and how can I solve it? I am training on an Ethiopian Sign Language dataset whose labels are Amharic characters.

hulianyuyy commented 5 months ago

Since the training process itself works well, this issue is probably because the decoder, i.e., sclite, doesn't work during inference. Check whether you have created the link to sclite or set the right path.
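A quick way to act on this suggestion is to confirm that the tool can actually be reached from the directory where evaluation runs. The snippet below is only a sketch: the helper name `locate_tool` and the candidate path `./software/sclite` are assumptions made for illustration and are not taken from this thread, so substitute the location your own setup uses.

```python
import os
import shutil


def locate_tool(name="sclite", candidates=()):
    """Return a usable path to `name`, or None if nothing is found.

    `candidates` are explicit paths to try first (hypothetical examples);
    if none of them is an executable file, fall back to searching PATH.
    """
    for path in candidates:
        # os.path.isfile follows symlinks, so a dangling link fails here.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return os.path.abspath(path)
    # shutil.which returns None when the name is not on PATH.
    return shutil.which(name)


# Assumed location of the link; adjust to your own checkout.
found = locate_tool("sclite", candidates=["./software/sclite"])
print("sclite found at:", found)
```

If this prints `None`, the link or path is missing or points at a file that is not executable.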

ethio-artifical commented 5 months ago

Thank you for your response. I perform the evaluation using Python instead of sclite, and the dataset labels are written in Amharic script. I run the code on Ubuntu 22.04. Could either of these cause the problem?
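For reference, word error rate is the word-level edit distance between reference and hypothesis divided by the reference length, so an empty hypothesis gives exactly 100%. The sketch below is a generic implementation written for illustration, not the repository's Python evaluation code. The function names and the three Amharic tokens are assumptions. It also shows that Python compares Amharic strings like any other Unicode text, provided both sides use the same encoding and normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent for one reference/hypothesis pair."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


reference = ["ሀ", "ለ", "መ"]          # illustrative tokens only
print(wer(reference, []))            # empty prediction
print(wer(reference, ["ሀ", "መ"]))    # one token missing
print(wer(reference, reference))     # perfect prediction
```

A score that stays at exactly 100.00% every epoch is what you would see if every hypothesis were empty, so printing a few raw predictions before scoring is a useful first check.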

hulianyuyy commented 5 months ago

I can't locate the problem precisely, but in general this kind of issue is related to the decoder. It should not be related to the system version.
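One way to test whether the decoding step is at fault is to bypass the beam-search decoder and run a plain greedy CTC decode on the model's per-frame scores. The sketch below is written under assumptions: scores arrive as a list of per-frame lists, and the blank label has index 0. Neither is confirmed in this thread, so check both against your model. The name `greedy_ctc_decode` is likewise hypothetical.

```python
def greedy_ctc_decode(frame_scores, blank_id=0):
    """Greedy CTC decoding: take the best class per frame, merge
    consecutive repeats, then drop blanks.

    frame_scores: list of frames, each a list of per-class scores.
    blank_id: index of the CTC blank label (assumed to be 0 here).
    """
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank_id:
            decoded.append(idx)
        prev = idx
    return decoded


# Toy scores for 5 frames and 3 classes (class 0 is the blank).
scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
print(greedy_ctc_decode(scores))       # [1, 1, 2]
print(greedy_ctc_decode([[0.9, 0.1]] * 4))  # [] (all frames are blank)
```

If greedy decoding yields sensible labels while the beam decoder returns nothing, the decoder installation is the likely culprit. If greedy decoding is also empty, the model itself is predicting blanks on the evaluation data.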

ethio-artifical commented 5 months ago

I ran similar code in Google Colab and it works there. But when I run it in my local environment, the loss decreases and yet the evaluation always gives a WER of 100%.
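Since the same code behaves differently in the two environments, comparing installed versions side by side may help isolate the difference. The following is a minimal sketch using only the standard library. The package list and the helper name `collect_versions` are assumptions, and a package's distribution name may differ from its import name.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "ctcdecode", "numpy")):
    """Map each package name to its installed version string,
    or to 'not installed' when no such distribution is found."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```

Running this in both Colab and the local environment and comparing the two outputs shows which versions differ.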

hulianyuyy / CorrNet

unable to recognize any word but the loss is decreasing??? #37