nyu-mll / jiant

jiant is an nlp toolkit
https://jiant.info
MIT License

Unable to execute run_simple() with different models of the same type #1348

Open TimDettmers opened 2 years ago

TimDettmers commented 2 years ago

Describe the bug

When run_simple() is used with two different models of the same type (for example, roberta-base and then roberta-large), the run crashes. The weights are saved under hf_config.model_type instead of args.hf_pretrained_model_name_or_path, so the code treats both as the same model, tries to load incompatible weights, and fails.
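To illustrate the collision, here is a minimal sketch. It does not use jiant's actual code: the directory layout and the helper weights_dir_by_type are hypothetical, and the mapping only mirrors the fact that both checkpoints report the model type "roberta".

```python
from pathlib import Path

# Maps hf_pretrained_model_name_or_path -> hf_config.model_type.
# Both checkpoints share the same model type.
MODELS = {
    "roberta-base": "roberta",
    "roberta-large": "roberta",
}


def weights_dir_by_type(exp_dir: str, model_type: str) -> Path:
    """Hypothetical layout: weights are cached under the model type only."""
    return Path(exp_dir) / "models" / model_type


# Resolve the cache directory for each checkpoint.
paths = {name: weights_dir_by_type("exp", mtype) for name, mtype in MODELS.items()}

# Both checkpoints resolve to the same directory, so the second run
# would find (and try to load) the first model's weights.
collision = paths["roberta-base"] == paths["roberta-large"]
print(paths["roberta-base"], paths["roberta-large"], collision)
```

With a key like this, a second run has no way to tell that the cached weights belong to a different checkpoint.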

To Reproduce

  1. Install jiant
  2. Run the simple example in README
  3. Change the model in the sample from roberta-base to roberta-large and run it again

Expected behavior

One should be able to run run_simple() with different models of the same type.


Additional context

Proposed solution: hf_config.model_type should be used only for caching the tokenizer and tasks, while args.hf_pretrained_model_name_or_path should be used for the weights.
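A rough sketch of how that split could look. The helper names (weights_dir, tokenizer_cache_dir), the directory layout, and the "/" to "--" substitution are all assumptions made for illustration, not jiant's API. A real fix would also need to decide how to handle local filesystem paths passed as the model name.

```python
from pathlib import Path


def weights_dir(exp_dir: str, model_name_or_path: str) -> Path:
    """Hypothetical: key the weights by the full model name or path."""
    # Hub names may contain "/" (e.g. "org/model"), so flatten them into
    # a single directory name. The "--" separator is an arbitrary choice.
    safe_name = model_name_or_path.replace("/", "--")
    return Path(exp_dir) / "models" / safe_name


def tokenizer_cache_dir(exp_dir: str, model_type: str) -> Path:
    """Hypothetical: the tokenizer/task cache stays keyed by model type."""
    return Path(exp_dir) / "cache" / model_type


# Different checkpoints of the same type now get separate weight directories...
base_weights = weights_dir("exp", "roberta-base")
large_weights = weights_dir("exp", "roberta-large")
print(base_weights, large_weights)

# ...while still sharing the tokenized task cache.
print(tokenizer_cache_dir("exp", "roberta"))
```

Keeping the tokenizer and task cache keyed by model type preserves the reuse that the current layout was presumably meant to provide, while the weights no longer collide.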