pk1130 opened 3 years ago
Further investigation has led me to understand that, in order to carry out my research with the existing `irt_scripts/` framework and the latest version of jiant, I would have to port that directory to the `master` branch on my local machine and add the RAG model to `ModelArchitectures` and `TOKENIZER_DICT` in `jiant/proj/main/modeling/primary.py`, as outlined here:
https://github.com/nyu-mll/jiant/blob/51e9be2a8ed8589e884ea927e348df8342c40fcf/guides/models/adding_models.md
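For intuition, the registration step the guide describes boils down to an enum of architectures plus a lookup table keyed on it. The sketch below is self-contained and illustrative only — the names (`ModelArchitectures`, `TOKENIZER_DICT`, the `"RagTokenizer"` entry) mirror the guide but are not copied from jiant's actual code; check `jiant/proj/main/modeling/primary.py` for the real definitions.

```python
# Simplified, self-contained sketch of the registration pattern from
# jiant's adding_models.md guide. All names here are illustrative.
from enum import Enum


class ModelArchitectures(Enum):
    BERT = "bert"
    ROBERTA = "roberta"
    RAG = "rag"  # the new architecture would be added here


# Maps each architecture to the tokenizer class it expects.
TOKENIZER_DICT = {
    ModelArchitectures.BERT: "BertTokenizer",
    ModelArchitectures.ROBERTA: "RobertaTokenizer",
    ModelArchitectures.RAG: "RagTokenizer",  # hypothetical new entry
}


def resolve_tokenizer(model_type: str) -> str:
    """Look up the tokenizer for a model type; a missing entry here is
    exactly what produces a `KeyError: rag` at lookup time."""
    arch = ModelArchitectures(model_type)
    return TOKENIZER_DICT[arch]


print(resolve_tokenizer("rag"))  # -> RagTokenizer
```

Without the `RAG` enum member and its `TOKENIZER_DICT` entry, the lookup fails with the same kind of `KeyError` reported below.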
I'm having trouble understanding how to implement the `normalize_tokenizations()`, `get_mlm_weights_dict()`, and `get_feat_spec()` functions in the subclass created for the RAG model. Any suggestions or advice on how to move forward, @sleepinyourhat @zphang @jeswan @HaokunLiu? Thanks a lot!
Hi, sorry for the delay in my response.

`normalize_tokenizations` has to do with aligning token spans between raw text and the model tokenizer's tokens. Depending on which tokenizer you're using, you might be able to piggyback off an existing implementation.

`get_mlm_weights_dict` gets the weights for the MLM head from the pretrained model. In contrast to standard NLU tasks, which use a new classifier head, an MLM task ought to reuse the MLM head from pretraining. Conversely, if you are not using an MLM task, this should not affect you.

`get_feat_spec` is a somewhat older abstraction for describing different tokenizer setups, e.g. padding IDs. As with `normalize_tokenizations`, you might be able to piggyback off an existing implementation if you are using a similar tokenizer.
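The alignment problem `normalize_tokenizations` deals with can be sketched in miniature: map spans over raw whitespace tokens onto a subword tokenizer's tokens. This toy assumes BERT-style `##` continuation pieces and identical casing; the real jiant implementations also handle lowercasing, byte-BPE markers, and other tokenizer quirks.

```python
# Toy illustration of raw-token -> subword-token span alignment, the kind
# of bookkeeping normalize_tokenizations performs. Assumes BERT-style "##"
# continuation pieces and that casing already matches.
def align_wordpieces(raw_tokens, wordpieces):
    """For each raw token, return the (start, end) index range of the
    wordpieces that cover it (end exclusive)."""
    spans, i = [], 0
    for tok in raw_tokens:
        start = i
        rebuilt = wordpieces[i]
        i += 1
        # Consume continuation pieces until the raw token is rebuilt.
        while rebuilt != tok:
            piece = wordpieces[i]
            rebuilt += piece[2:] if piece.startswith("##") else piece
            i += 1
        spans.append((start, i))
    return spans


raw = ["finetuning", "a", "rag", "model"]
pieces = ["fine", "##tun", "##ing", "a", "rag", "model"]
print(align_wordpieces(raw, pieces))  # [(0, 3), (3, 4), (4, 5), (5, 6)]
```

If the RAG tokenizer behaves like one that jiant already supports, the existing alignment code for that tokenizer family is likely reusable as-is.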
Describe the bug
Hey @sleepinyourhat @zphang @jeswan @HaokunLiu! I noticed that you guys worked on adding new models to the `JiantTransformersModel`, so I tagged you here :) I was trying to run a RAG model for fine-tuning on the MrQA-NQ dataset using jiant, but this does not seem to be supported. It throws a `KeyError: rag` when I run the following command:

where `run_train_task.sh` is a shell script that I wrote to run exactly the same commands as `run_train_task.sbatch` without using `sbatch`.

To Reproduce
- Version of `jiant` you're using: I've git cloned the repo as it is, but since I'm running an IRT experiment, I'm using the `irt_scripts/` directory in tandem with the `jiant/` directory on the branch `IRT_experiments`.
- Environment you're running `jiant` in, e.g., "2 P40 GPUs": I'm using `jiant` in Google Colab along with Google Drive.

Expected behavior
I expected the RAG model to start fine-tuning and generate the cache files in the `experiments/cache/` directory as outlined in the README.

Screenshots

Additional context
On investigating further, I realized that `IRT_experiments` was still using `transformers==3.1.0`, which does not support `rag` architectures. I uninstalled that version of transformers and tried upgrading to `transformers>=3.5.0` to see if that would fix the issue, but that resulted in a new error: `ModuleNotFoundError: No module named 'transformers.tokenization_bert'`. It looks like transformers refactored their code in later versions while incorporating different models. If the `IRT_experiments` branch were up to date with the changes on the `master` branch, would that fix things? I noticed that `jiant` on `master` was using `transformers==4.5.0`. Is there any other way that I can use a RAG model along with the scripts in `irt_scripts/` for my IRT research? Specifically, I need to use the fine-tuning, predicting, and post-processing scripts which are available in the `IRT_experiments` branch in the `irt_scripts/` directory. Please respond at your earliest convenience! Thanks!
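The `ModuleNotFoundError` above stems from transformers moving tokenizer modules to new paths in v4 (e.g. `transformers.tokenization_bert` became `transformers.models.bert.tokenization_bert`). One common bridging pattern while migrating old code is a try-first import helper. The sketch below is self-contained and demonstrated with stdlib modules so it runs anywhere; the transformers paths in the comment are shown as an assumed example, not verified against a specific release.

```python
# Try-first import fallback: a common pattern for bridging a library
# refactor that moved modules to new paths between major versions.
import importlib


def import_first(*module_paths):
    """Return the first module in module_paths that imports successfully."""
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ModuleNotFoundError:
            continue
    raise ModuleNotFoundError(f"none of {module_paths} could be imported")


# For the transformers refactor this might look like (assumed paths):
#   tok = import_first("transformers.models.bert.tokenization_bert",
#                      "transformers.tokenization_bert")

# Demonstrated with stdlib modules so the sketch is runnable anywhere:
demo = import_first("no_such_module_xyz", "json")
print(demo.__name__)  # -> json
```

That said, a shim like this only papers over import paths; code written against the transformers 3.x API may still need real changes to run on 4.x.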