lavis-nlp / jerex

PyTorch code for JEREX: Joint Entity-Level Relation Extractor
MIT License

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference #2

Closed raphael10-collab closed 3 years ago

raphael10-collab commented 3 years ago

When trying to train the model with python ./jerex_train.py --config-path configs/docred_joint I get this message: "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference." What does it mean? How should I train the model instead?

OS: Ubuntu 18.04.4

/jerex$ python ./jerex_train.py --config-path configs/docred_joint
datasets:
  train_path: ./data/datasets/docred_joint/train_joint.json
  valid_path: ./data/datasets/docred_joint/dev_joint.json
  test_path: null
  types_path: ./data/datasets/docred_joint/types.json
model:
  model_type: joint_multi_instance
  encoder_path: bert-base-cased
  tokenizer_path: bert-base-cased
  mention_threshold: 0.85
  coref_threshold: 0.85
  rel_threshold: 0.6
  prop_drop: 0.1
  meta_embedding_size: 25
  size_embeddings_count: 30
  ed_embeddings_count: 300
  token_dist_embeddings_count: 700
  sentence_dist_embeddings_count: 50
  position_embeddings_count: 700
sampling:
  neg_mention_count: 200
  neg_coref_count: 200
  neg_relation_count: 200
  max_span_size: 10
  sampling_processes: 8
  neg_mention_overlap_ratio: 0.5
  lowercase: false
loss:
  mention_weight: 1.0
  coref_weight: 1.0
  entity_weight: 0.25
  relation_weight: 1.0
inference:
  valid_batch_size: 1
  test_batch_size: 1
  max_spans: null
  max_coref_pairs: null
  max_rel_pairs: null
training:
  batch_size: 1
  min_epochs: 20
  max_epochs: 20
  lr: 5.0e-05
  lr_warmup: 0.1
  weight_decay: 0.01
  max_grad_norm: 1.0
  accumulate_grad_batches: 1
  max_spans: null
  max_coref_pairs: null
  max_rel_pairs: null
distribution:
  gpus: []
  accelerator: ''
  prepare_data_per_node: false
misc:
  store_predictions: true
  store_examples: true
  flush_logs_every_n_steps: 1000
  log_every_n_steps: 1000
  deterministic: false
  seed: null
  cache_path: null
  precision: 32
  profiler: null
  final_valid_evaluate: true

Parse dataset '/home/marco/PyTorchMatters/EntitiesRelationsExtraction/jerex/data/datasets/docred_joint/train_joint.json': 100%|██████| 3008/3008 [00:41<00:00, 71.72it/s]
Parse dataset '/home/marco/PyTorchMatters/EntitiesRelationsExtraction/jerex/data/datasets/docred_joint/dev_joint.json': 100%|██████████| 300/300 [00:03<00:00, 75.22it/s]
Some weights of the model checkpoint at bert-base-cased were not used when initializing JointMultiInstanceModel: ['cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'bert.pooler.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'bert.pooler.dense.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing JointMultiInstanceModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing JointMultiInstanceModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of JointMultiInstanceModel were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['entity_classification.entity_classifier.weight', 'relation_classification.pair_linear.weight', 'coreference_resolution.coref_classifier.weight', 'relation_classification.rel_classifier.weight', 'mention_localization.size_embeddings.weight', 'relation_classification.rel_linear.weight', 'mention_localization.linear.weight', 'relation_classification.sentence_distance_embeddings.weight', 'relation_classification.token_distance_embeddings.weight', 'coreference_resolution.coref_linear.bias', 'coreference_resolution.coref_linear.weight', 'mention_localization.mention_classifier.weight', 'entity_classification.linear.bias', 'mention_localization.linear.bias', 'relation_classification.entity_type_embeddings.weight', 'entity_classification.linear.weight', 'relation_classification.rel_linear.bias', 'relation_classification.rel_classifier.bias', 'coreference_resolution.coref_ed_embeddings.weight', 'coreference_resolution.coref_classifier.bias', 'entity_classification.entity_classifier.bias', 'mention_localization.mention_classifier.bias', 'relation_classification.pair_linear.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type                    | Params
--------------------------------------------------
0 | model | JointMultiInstanceModel | 113 M 
--------------------------------------------------
113 M     Trainable params
0         Non-trainable params
113 M     Total params
455.954   Total estimated model params size (MB)
markus-eberts commented 3 years ago

Hi,

this is just a remark by the Huggingface library - no need to worry. We use Huggingface's BERT implementation internally. You are doing everything correctly here: by executing the training code (as you do), you train JEREX (and fine-tune the underlying BERT) on a down-stream task (end-to-end relation extraction), and you can then use the model for prediction.
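The mechanism behind the warning can be illustrated with plain PyTorch (a minimal sketch, not JEREX code): a pretrained checkpoint only covers the encoder, so the freshly added task heads are reported as "newly initialized" and must be trained downstream, which is exactly what jerex_train.py does.

```python
import torch
import torch.nn as nn

# A "pretrained" checkpoint that covers only the encoder (stand-in for bert-base-cased).
encoder = nn.Linear(4, 4)
checkpoint = {f"encoder.{k}": v for k, v in encoder.state_dict().items()}

# A downstream model: the same encoder plus a new, randomly initialized task head
# (stand-in for JEREX's mention/coref/relation classifiers).
class WithHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.head = nn.Linear(4, 2)  # not in the checkpoint

model = WithHead()

# strict=False mirrors what from_pretrained() does: encoder weights are loaded,
# the head keeps its random init and is reported as "missing" - hence the hint
# to TRAIN the model on a down-stream task before using it for inference.
result = model.load_state_dict(checkpoint, strict=False)
print(result.missing_keys)  # the head parameters, e.g. ['head.weight', 'head.bias']
```

The warning is therefore informational: it simply flags that some parameters did not come from the checkpoint and still need training.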

raphael10-collab commented 3 years ago

Thank you @markus-eberts .

Now I've got this memory issue: https://github.com/lavis-nlp/jerex/issues/3

montmejat commented 1 year ago

Does anyone know if there's a way to hide this message? :)

kantholtz commented 1 year ago

Hi, you should be able to suppress messages by decreasing the logging verbosity as described in their documentation: https://huggingface.co/docs/transformers/main_classes/logging
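Concretely, either of the following should silence warning-level messages from Transformers (a sketch based on the logging documentation linked above; exact behavior may vary by library version):

```python
# Option 1: set the environment variable before transformers is imported.
import os
os.environ["TRANSFORMERS_VERBOSITY"] = "error"

# Option 2: use the logging API at runtime.
from transformers.utils import logging
logging.set_verbosity_error()  # only errors are shown from now on
```

With verbosity at error level, informational remarks such as "You should probably TRAIN this model on a down-stream task ..." are no longer printed.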