lavis-nlp / jerex

PyTorch code for JEREX: Joint Entity-Level Relation Extractor
MIT License

Errors when performing tests #13

Closed: zozni closed this issue 2 years ago

zozni commented 2 years ago

Testing: 100%|██████████| 700/700 [05:24<00:00, 2.27it/s]
Evaluation

--- Entity Mentions ---

Traceback (most recent call last):
  File "./jerex_test.py", line 20, in test
    model.test(cfg)
  File "/home/jhj/jerex/jerex/model.py", line 389, in test
    trainer.test(model, datamodule=data_module)
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 910, in test
    results = self.test_given_model(model, test_dataloaders)
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 970, in test_given_model
    results = self.fit(model)
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 540, in dispatch
    self.accelerator.start_testing(self)
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 76, in start_testing
    self.training_type_plugin.start_testing(trainer)
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 118, in start_testing
    self._results = trainer.run_test()
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 786, in run_test
    eval_loop_results, _ = self.run_evaluation()
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in run_evaluation
    deprecated_eval_results = self.evaluation_loop.evaluation_epoch_end()
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 189, in evaluation_epoch_end
    deprecated_results = self.run_eval_epoch_end(self.num_dataloaders)
  File "/home/jhj/anaconda3/envs/New_Env/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 221, in run_eval_epoch_end
    eval_results = model.test_epoch_end(eval_results)
  File "/home/jhj/jerex/jerex/model.py", line 155, in test_epoch_end
    metrics = self._evaluator.compute_metrics(self._eval_test_gt, predictions)
  File "/home/jhj/jerex/jerex/evaluation/joint_evaluator.py", line 76, in compute_metrics
    mention_eval = scoring.score(gt_mentions, pred_mentions, print_results=True)
  File "/home/jhj/jerex/jerex/evaluation/scoring.py", line 55, in score
    metrics = _compute_metrics(gt_flat, pred_flat, labels, labels_str, print_results)
  File "/home/jhj/jerex/jerex/evaluation/scoring.py", line 64, in _compute_metrics
    per_type = prfs(gt_all, pred_all, labels=labels, average=None, zero_division=0)
TypeError: precision_recall_fscore_support() got an unexpected keyword argument 'zero_division'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Testing: 100%|██████████| 700/700 [05:24<00:00, 2.16it/s]

Hi. When I run the test, the error above occurs and the results are not saved. Any ideas?

Thanks.

markus-eberts commented 2 years ago

Hi, which scikit-learn version are you using?
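
A minimal sketch of one way to answer this, and to see whether the installed release accepts the `zero_division` keyword named in the traceback. The helper names `accepts_keyword` and `describe_sklearn` are hypothetical and belong to neither JEREX nor scikit-learn; the check relies only on `inspect.signature` and `sklearn.__version__`.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def describe_sklearn():
    """Report the installed scikit-learn version and zero_division support."""
    try:
        import sklearn
        from sklearn.metrics import precision_recall_fscore_support as prfs
    except ImportError:
        return "scikit-learn is not installed in this environment"
    supported = accepts_keyword(prfs, "zero_division")
    return f"scikit-learn {sklearn.__version__}: zero_division supported = {supported}"


if __name__ == "__main__":
    print(describe_sklearn())
```

Run this inside the same conda environment that runs `jerex_test.py`, since each environment can hold a different scikit-learn release.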

zozni commented 2 years ago

I was using scikit-learn 0.21.3. After reinstalling the environment, the issue above was resolved. Thanks for the support.

However, another problem has come up during training. What could be the reason?

home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/sklearn/utils/validation.py:179: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(joblib_version) < '0.12':
Epoch 0: 94%|█████████▎| 3098/3308 [04:58<00:20, 10.38it/s, loss=0.445, v_num=0_0]
██▏ | 89/300 [00:45<02:46, 1.27it/s]
Traceback (most recent call last):
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 637, in run_train
    self.train_loop.run_training_epoch()
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 577, in run_training_epoch
    self.trainer.run_evaluation(on_epoch=True)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 725, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 166, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 177, in validation_step
    return self.training_type_plugin.validation_step(args)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 131, in validation_step
    return self.lightning_module.validation_step(args, kwargs)
  File "/home/jhj/JEREX/jerex/model.py", line 126, in validation_step
    return self._inference(batch, batch_idx)
  File "/home/jhj/JEREX/jerex/model.py", line 176, in _inference
    output = self(batch, inference=True)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jhj/JEREX/jerex/model.py", line 106, in forward
    max_rel_pairs=max_rel_pairs, inference=inference)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jhj/JEREX/jerex/models/joint_models.py", line 144, in forward
    return self._forward_inference(args, kwargs)
  File "/home/jhj/JEREX/jerex/models/joint_models.py", line 209, in _forward_inference
    mention_sample_masks, max_spans=max_spans, max_coref_pairs=max_coref_pairs)
  File "/home/jhj/JEREX/jerex/models/joint_models.py", line 81, in _forward_inference_common
    mention_reprs = self.mention_representation(h, mention_masks, max_spans=max_spans)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jhj/JEREX/jerex/models/modules/mention_representation.py", line 20, in forward
    chunk_mention_reprs = self._forward(chunk_mention_masks, chunk_h)
  File "/home/jhj/JEREX/jerex/models/modules/mention_representation.py", line 28, in _forward
    mention_reprs = m + h
RuntimeError: CUDA out of memory. Tried to allocate 6.16 GiB (GPU 0; 7.77 GiB total capacity; 1.73 GiB already allocated; 4.80 GiB free; 1.92 GiB reserved in total by PyTorch)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./jerex_train.py", line 24, in <module>
    train()
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/main.py", line 37, in decorated_main
    strict=strict,
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 347, in _run_hydra
    lambda: hydra.run(
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/utils.py", line 350, in <lambda>
    overrides=args.overrides,
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 112, in run
    configure_logging=with_log_configuration,
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/hydra/core/utils.py", line 127, in run_job
    ret.return_value = task_function(task_cfg)
  File "./jerex_train.py", line 20, in train
    model.train(cfg)
  File "/home/jhj/JEREX/jerex/model.py", line 341, in train
    trainer.fit(model, datamodule=data_module)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 669, in run_train
    self.train_loop.on_train_end()
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 134, in on_train_end
    self.check_checkpoint_callback(should_update=True, is_last=True)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 164, in check_checkpoint_callback
    cb.on_validation_end(self.trainer, model)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 212, in on_validation_end
    self.save_checkpoint(trainer, pl_module)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 247, in save_checkpoint
    self._validate_monitor_key(trainer)
  File "/home/jhj/anaconda3/envs/jerex/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 495, in _validate_monitor_key
    raise MisconfigurationException(m)
pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='valid_f1') not found in the returned metrics: ['train_mention_loss', 'train_coref_loss', 'train_entity_loss', 'train_rel_loss', 'train_loss']. HINT: Did you call self.log('valid_f1', value) in the LightningModule?
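
The log contains two linked tracebacks, joined by "During handling of the above exception, another exception occurred". In Python this wording means the second exception was raised while the first was still being handled, so the first one (here the CUDA out-of-memory `RuntimeError`) is the earlier event. The sketch below shows that chaining mechanism in isolation. `root_cause` and `failing_run` are hypothetical names, and the exceptions are stand-ins for the ones in the log; it does not reproduce PyTorch Lightning's actual control flow.

```python
def root_cause(exc):
    """Follow __cause__/__context__ links back to the first exception raised."""
    seen = {id(exc)}
    while True:
        earlier = exc.__cause__ or exc.__context__
        # Stop at the end of the chain, or if the chain loops back on itself.
        if earlier is None or id(earlier) in seen:
            return exc
        seen.add(id(earlier))
        exc = earlier


def failing_run():
    """Raise a second error while the first is being handled."""
    try:
        raise RuntimeError("CUDA out of memory")  # stand-in for the first error
    except RuntimeError:
        # Stand-in for cleanup code that fails because a metric is missing.
        raise KeyError("valid_f1")


if __name__ == "__main__":
    try:
        failing_run()
    except KeyError as err:
        first = root_cause(err)
        print(f"reported: {type(err).__name__}, first raised: {type(first).__name__}")
```

When reading a chained traceback like the one above, it is usually worth starting from the first block, since the last exception printed may only be a side effect of the earlier one.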

zozni commented 2 years ago

This was also a scikit-learn version issue. I upgraded to 0.23.2 and the problem was solved.
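
Since both problems in this thread were resolved by changing the scikit-learn version, a small startup check can surface a mismatch before a long run. The sketch below is an assumption-laden illustration: `version_tuple`, `meets_minimum`, and `check_package` are hypothetical helpers, and the `0.22` minimum reflects the release in which, to my knowledge, `zero_division` was added to `precision_recall_fscore_support` (the thread itself only confirms that 0.21.3 failed and 0.23.2 worked). It uses `importlib.metadata`, which needs Python 3.8 or newer, so it would not run in the Python 3.7 environment shown in the second traceback.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onward


def version_tuple(version):
    """Turn '0.23.2' into (0, 23, 2), keeping the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at parts such as 'dev0' that have no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name, minimum):
    """Describe whether the installed distribution `name` is at least `minimum`."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name} is not installed"
    if meets_minimum(installed, minimum):
        return f"{name} {installed}: ok"
    return f"{name} {installed}: too old, need >= {minimum}"


if __name__ == "__main__":
    print(check_package("scikit-learn", "0.22"))
```

The comparison deliberately ignores pre-release suffixes such as `rc1`, which keeps it dependency-free at the cost of precision; the `packaging.version` module mentioned in the deprecation warning above handles those cases properly.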