shon-otmazgin / lingmess-coref

MIT License

All scores are null after training on own corpus #4

Open AtillaKaanAlkan opened 11 months ago

AtillaKaanAlkan commented 11 months ago

Hi @shon-otmazgin ,

I fine-tuned LingMess on my own corpus for 50 epochs. On the last epoch I print the results, shown below. As you can see, all scores are zero (even if I give the same training, dev, and test sets to the system, they remain zero). I don't understand what is going wrong. Thanks for helping! Atilla

Epoch: 100%|██████████| 50/50 [09:00<00:00, 10.81s/it]
08/02/2023 01:47:28 - INFO - __main__ -   global_step = 500, average loss = nan
08/02/2023 01:47:28 - INFO - eval -   ***** Running Inference on dev split  *****
08/02/2023 01:47:28 - INFO - eval -     Examples number: 59
Inference: 100%|██████████| 59/59 [00:07<00:00,  7.98it/s]
08/02/2023 01:47:36 - INFO - util -   Predicted clusters at: lingmess-tdac/train_infered.output.jsonlines
08/02/2023 01:47:36 - INFO - util -   ***** Eval results  *****
08/02/2023 01:47:36 - INFO - util -     eval_loss                      = nan
08/02/2023 01:47:36 - INFO - util -     post pruning mention precision = 0.000
08/02/2023 01:47:36 - INFO - util -     post pruning mention recall    = 0.000
08/02/2023 01:47:36 - INFO - util -     post pruning mention f1        = 0.000
08/02/2023 01:47:36 - INFO - util -     mention precision              = 0.000
08/02/2023 01:47:36 - INFO - util -     mention recall                 = 0.000
08/02/2023 01:47:36 - INFO - util -     mention f1                     = 0.000
08/02/2023 01:47:36 - INFO - util -     precision                      = 0.000
08/02/2023 01:47:36 - INFO - util -     recall                         = 0.000
08/02/2023 01:47:36 - INFO - util -     f1                             = 0.000
08/02/2023 01:47:36 - INFO - util -     pron-pron-comp                 = {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}
08/02/2023 01:47:36 - INFO - util -     pron-pron-no-comp              = {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}
08/02/2023 01:47:36 - INFO - util -     pron-ent                       = {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}
08/02/2023 01:47:36 - INFO - util -     match                          = {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}
08/02/2023 01:47:36 - INFO - util -     contain                        = {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}
08/02/2023 01:47:36 - INFO - util -     other                          = {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}
08/02/2023 01:47:36 - INFO - root -   {'eval_loss': nan, 'post pruning mention precision': 0.0, 'post pruning mention recall': 0.0, 'post pruning mention f1': 0.0, 'mention precision': 0.0, 'mention recall': 0.0, 'mention f1': 0.0, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'pron-pron-comp': {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}, 'pron-pron-no-comp': {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}, 'pron-ent': {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}, 'match': {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}, 'contain': {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}, 'other': {'true_pairs': 0, 'false_pairs': 0, 'precision': 0, 'recall': 0, 'f1': 0}}
wandb: Waiting for W&B process to finish... (success).
wandb: 
wandb: Run summary:
wandb: loss nan
wandb: 
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /mnt/beegfs/projects/NLP-astrophysics/lingmess-coref/wandb/offline-run-20230802_013805-3bn679is
wandb: Find logs at: ./wandb/offline-run-20230802_013805-3bn679is/logs
shon-otmazgin commented 11 months ago

I can see that your loss is nan, so I assume this is the reason. Can you debug to see why? Which encoder did you use?
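If it helps, one generic way to find where the loss first becomes non-finite is to assert on it inside the training loop. This is just a minimal PyTorch sketch, not the repo's actual training code:

```python
# Hypothetical debugging pattern (generic PyTorch, not lingmess-coref's training loop):
# stop at the first step where the loss stops being finite so the offending batch
# and step can be inspected.
import torch

torch.autograd.set_detect_anomaly(True)  # also reports the backward op that produced NaN/Inf


def assert_finite_loss(loss: torch.Tensor, step: int) -> None:
    # Raise immediately instead of letting NaN propagate silently through training.
    if not torch.isfinite(loss):
        raise RuntimeError(f"Non-finite loss {loss.item()} at global step {step}")
```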

AtillaKaanAlkan commented 11 months ago

Hi @shon-otmazgin ,

First of all, thanks for your answer!

I debugged and found the cause of the problem: the eval_steps parameter. Its default value was too high, probably because my corpus is small. After reducing it to 100, it works.
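For anyone hitting the same thing, here is a rough sketch of how I now pick eval_steps relative to the corpus size. This is just my own heuristic, not something from the repo:

```python
# Hypothetical heuristic (not from lingmess-coref): pick eval_steps so that
# evaluation still runs a few times during training on a small corpus.
def suggest_eval_steps(steps_per_epoch: int, epochs: int, evals_per_run: int = 5) -> int:
    total_steps = steps_per_epoch * epochs
    return max(1, total_steps // evals_per_run)


# e.g. ~10 optimizer steps per epoch for 50 epochs (500 steps total, as in the log above)
print(suggest_eval_steps(steps_per_epoch=10, epochs=50))  # -> 100
```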

By the way, I tried to fine-tune a model with some other hyperparameters: e.g. when I reduce max_tokens_in_batch (to 2500 or another value) I receive an error message. For the moment, I have to keep it at its default value (5000).
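In case it is relevant, one possible cause I want to rule out is a single document longer than the token budget, which a token-budgeted batcher may not be able to place. A quick hypothetical check (assuming OntoNotes-style jsonlines with a "sentences" field of token lists; the file path is just a placeholder):

```python
# Hypothetical sanity check (not from lingmess-coref): report the longest document
# in tokens and compare it against max_tokens_in_batch.
import json

max_tokens_in_batch = 2500
with open("train.jsonlines") as f:  # placeholder path to the training split
    longest = max(
        sum(len(sentence) for sentence in json.loads(line)["sentences"])
        for line in f
    )
print(f"Longest document: {longest} tokens (budget: {max_tokens_in_batch})")
```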

I will try to figure this out in the next days; I have to work on other projects at the moment. I will write to you if I am not able to solve it :-)

Best, Atilla