nyu-mll / jiant

jiant is an nlp toolkit
https://jiant.info
MIT License

Unable to run RAG model for tokenizing, fine-tuning, and predicting with `jiant` on branch `IRT_experiments` #1331

Open pk1130 opened 3 years ago

pk1130 commented 3 years ago

Describe the bug

Hey @sleepinyourhat @zphang @jeswan @HaokunLiu! I noticed that you worked on adding new models to `JiantTransformersModel`, so I tagged you here :) I was trying to fine-tune a RAG model on the MrQA-NQ dataset using jiant, but this does not seem to be supported. It throws a `KeyError: 'rag'` when I run the following command:

```
source jiant/irt_scripts/run_train_task.sh
run_train_task facebook/rag-token-base mrqa_natural_questions 1
```

where `run_train_task.sh` is a shell script that I wrote to run exactly the same commands as `run_train_task.sbatch` without using `sbatch`.

To Reproduce

  1. Tell us which version of jiant you're using - I've git-cloned the repo as-is, but since I'm running an IRT experiment, I'm using the `irt_scripts/` directory in tandem with the `jiant/` directory on the branch `IRT_experiments`.
  2. Describe the environment where you're using jiant, e.g., "2 P40 GPUs" - I'm using jiant in Google Colab along with Google Drive.

Expected behavior I expected the RAG model to start fine-tuning and generate the cache files in the `experiments/cache/` directory as outlined in the README.

Screenshots: (screenshot attachment not reproduced here)

Additional context On investigating further, I realized that `IRT_experiments` was still using `transformers==3.1.0`, which does not support RAG architectures. I uninstalled that version of transformers and tried upgrading to `transformers>=3.5.0` to see if that would fix the issue, but that resulted in a new error: `ModuleNotFoundError: No module named 'transformers.tokenization_bert'`. It looks like transformers refactored their module layout in later versions while incorporating new models.

If the `IRT_experiments` branch were brought up to date with the changes on the master branch, would that fix things? I noticed that jiant on master is using `transformers==4.5.0`. Is there any other way I can use a RAG model along with the scripts in `irt_scripts/` for my IRT research? Specifically, I need the fine-tuning, predicting, and post-processing scripts that are available in the `irt_scripts/` directory on the `IRT_experiments` branch. Please respond at your earliest convenience! Thanks!
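For context on the `tokenization_bert` error: in the transformers 4.x reorganization, `transformers.tokenization_bert` moved to `transformers.models.bert.tokenization_bert`. One generic way to paper over such renames while porting code is an import-with-fallback helper. This is only a sketch of that pattern (not a jiant API), demonstrated with stdlib module names since the environment here may not have transformers installed:

```python
import importlib


def import_first_available(candidates):
    """Try a list of module paths in order and return the first that imports.

    Useful when a library moves modules between releases, e.g.
    'transformers.tokenization_bert' (3.x) became
    'transformers.models.bert.tokenization_bert' (4.x).
    """
    for path in candidates:
        try:
            return importlib.import_module(path)
        except ModuleNotFoundError:
            continue
    raise ModuleNotFoundError(f"none of {candidates} could be imported")


# Demo with stdlib modules standing in for the old/new transformers paths:
mod = import_first_available(["definitely_not_a_module", "json"])
print(mod.__name__)  # -> json
```

A shim like this only helps with moved modules; APIs that changed behavior between 3.x and 4.x still need real porting work.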

pk1130 commented 3 years ago

Even further investigation has led me to understand that in order to carry out my research with the existing `irt_scripts/` framework and the latest version of jiant, I would have to port that directory to the master branch on my local machine and add the RAG model to the `ModelArchitectures` and `TOKENIZER_DICT` in `jiant/proj/main/modeling/primary.py`, as outlined here: https://github.com/nyu-mll/jiant/blob/51e9be2a8ed8589e884ea927e348df8342c40fcf/guides/models/adding_models.md
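As a rough illustration of why the original `KeyError: 'rag'` occurs and what the registration step in the guide amounts to, here is a toy sketch of a registry pattern. The names `register_architecture` and `resolve_tokenizer` are invented stand-ins; in jiant the real registry in `jiant/proj/main/modeling/primary.py` keys on `ModelArchitectures` enum members rather than plain strings:

```python
# Toy registry: an unregistered architecture name produces a KeyError,
# which is the failure mode reported above for "rag".
MODEL_ARCHITECTURES = {"bert", "roberta"}
TOKENIZER_DICT = {"bert": "BertTokenizer", "roberta": "RobertaTokenizer"}


def register_architecture(name, tokenizer_cls):
    """Hypothetical helper: add a new architecture and its tokenizer."""
    MODEL_ARCHITECTURES.add(name)
    TOKENIZER_DICT[name] = tokenizer_cls


def resolve_tokenizer(model_name):
    """Guess the architecture from a HF model name prefix, then look it up.

    e.g. "facebook/rag-token-base" -> architecture "rag".
    """
    arch = model_name.split("/")[-1].split("-")[0]
    if arch not in MODEL_ARCHITECTURES:
        raise KeyError(arch)  # analogous to the KeyError: 'rag' in the report
    return TOKENIZER_DICT[arch]


register_architecture("rag", "RagTokenizer")
print(resolve_tokenizer("facebook/rag-token-base"))  # -> RagTokenizer
```

Registering the name is only the first step, of course; the guide's remaining steps (the model subclass and its methods) are where the real work is.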

I'm having trouble understanding how to implement the `normalize_tokenizations()`, `get_mlm_weights_dict()`, and `get_feat_spec()` functions in the subclass created for the RAG model. Any suggestions or advice on how to move forward @sleepinyourhat @zphang @jeswan @HaokunLiu? Thanks a lot!
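For `normalize_tokenizations()`, the general idea is to rewrite both the space tokenization and the model tokenizer's tokenization into a comparable form so the two can be aligned. Since RAG's question encoder is DPR/BERT-based, a WordPiece-style normalization seems like a plausible starting point. The function below is only a toy illustration of that idea (not jiant's actual signature, which should be checked against the other subclasses in `primary.py`):

```python
def normalize_bert_style(tokens):
    """Toy normalization: lowercase and strip WordPiece '##' continuation
    markers so a subword tokenization can be compared character-by-character
    against a lowercased space tokenization. A plausible starting point for
    a BERT-style tokenizer; verify against the actual tokenizer's output.
    """
    out = []
    for tok in tokens:
        tok = tok.lower()
        out.append(tok[2:] if tok.startswith("##") else tok)
    return out


space_toks = ["Who", "wrote", "Hamlet"]
subword_toks = ["who", "wrote", "ham", "##let"]
print(normalize_bert_style(subword_toks))  # -> ['who', 'wrote', 'ham', 'let']
# After normalization, "".join(...) of both tokenizations matches,
# which is what makes the alignment step possible.
```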

zphang commented 3 years ago

Hi, sorry for the delay in my response.