pk1130 opened 3 years ago
Further investigation has led me to understand that, in order to carry out my research with the existing `irt_scripts/` framework and the latest version of jiant, I would have to port that directory to the `master` branch on my local machine and add the RAG model to `ModelArchitectures` and `TOKENIZER_DICT` in `jiant/proj/main/modeling/primary.py`, as outlined here:
https://github.com/nyu-mll/jiant/blob/51e9be2a8ed8589e884ea927e348df8342c40fcf/guides/models/adding_models.md
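For intuition, the registration step the guide describes boils down to an enum of architectures plus a lookup table keyed on it. The sketch below is self-contained and illustrative only — the names (`ModelArchitectures`, `TOKENIZER_DICT`, the `"RagTokenizer"` entry) mirror the guide but are not copied from jiant's actual code; check `jiant/proj/main/modeling/primary.py` for the real definitions.

```python
# Simplified, self-contained sketch of the registration pattern from
# jiant's adding_models.md guide. All names here are illustrative.
from enum import Enum


class ModelArchitectures(Enum):
    BERT = "bert"
    ROBERTA = "roberta"
    RAG = "rag"  # the new architecture would be added here


# Maps each architecture to the tokenizer class it expects.
TOKENIZER_DICT = {
    ModelArchitectures.BERT: "BertTokenizer",
    ModelArchitectures.ROBERTA: "RobertaTokenizer",
    ModelArchitectures.RAG: "RagTokenizer",  # hypothetical new entry
}


def resolve_tokenizer(model_type: str) -> str:
    """Look up the tokenizer for a model type; a missing entry here is
    exactly what produces a `KeyError: rag` at lookup time."""
    arch = ModelArchitectures(model_type)
    return TOKENIZER_DICT[arch]


print(resolve_tokenizer("rag"))  # -> RagTokenizer
```

Without the `RAG` enum member and its `TOKENIZER_DICT` entry, the lookup fails with the same kind of `KeyError` reported below.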
I'm having trouble understanding how to implement the `normalize_tokenizations()`, `get_mlm_weights_dict()`, and `get_feat_spec()` functions in the subclass created for the RAG model. Any suggestions or advice on how to move forward, @sleepinyourhat @zphang @jeswan @HaokunLiu? Thanks a lot!
Hi, sorry for the delay in my response.

`normalize_tokenizations` has to do with aligning token spans between raw text and the model tokenizer's tokens. Depending on which tokenizer you're using, you might be able to piggyback off an existing implementation.

`get_mlm_weights_dict` gets the weights for the MLM head from the pretrained model. In contrast to standard NLU tasks, which use a new classifier head, an MLM task ought to reuse the MLM head from pretraining. Conversely, if you are not using an MLM task, this should not affect you.

`get_feat_spec` is a somewhat older abstraction for describing different tokenizer setups, e.g. padding IDs. As with `normalize_tokenizations`, you might be able to piggyback off an existing implementation if you are using a similar tokenizer.
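The alignment problem `normalize_tokenizations` deals with can be sketched in miniature: map spans over raw whitespace tokens onto a subword tokenizer's tokens. This toy assumes BERT-style `##` continuation pieces and identical casing; the real jiant implementations also handle lowercasing, byte-BPE markers, and other tokenizer quirks.

```python
# Toy illustration of raw-token -> subword-token span alignment, the kind
# of bookkeeping normalize_tokenizations performs. Assumes BERT-style "##"
# continuation pieces and that casing already matches.
def align_wordpieces(raw_tokens, wordpieces):
    """For each raw token, return the (start, end) index range of the
    wordpieces that cover it (end exclusive)."""
    spans, i = [], 0
    for tok in raw_tokens:
        start = i
        rebuilt = wordpieces[i]
        i += 1
        # Consume continuation pieces until the raw token is rebuilt.
        while rebuilt != tok:
            piece = wordpieces[i]
            rebuilt += piece[2:] if piece.startswith("##") else piece
            i += 1
        spans.append((start, i))
    return spans


raw = ["finetuning", "a", "rag", "model"]
pieces = ["fine", "##tun", "##ing", "a", "rag", "model"]
print(align_wordpieces(raw, pieces))  # [(0, 3), (3, 4), (4, 5), (5, 6)]
```

If the RAG tokenizer behaves like one that jiant already supports, the existing alignment code for that tokenizer family is likely reusable as-is.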
Describe the bug
Hey @sleepinyourhat @zphang @jeswan @HaokunLiu! I noticed that you guys worked on adding new models to the `JiantTransformersModel`, so I tagged you here :) I was trying to run a RAG model for fine-tuning on the MrQA-NQ dataset using jiant, but this does not seem to be supported. It throws a `KeyError: rag` when I run the following command:

where `run_train_task.sh` is a shell script that I wrote to run exactly the same commands as `run_train_task.sbatch` without using `sbatch`.

To Reproduce
- Version of `jiant` you're using: I've git cloned the repo as it is, but since I'm running an IRT experiment, I'm using the `irt_scripts/` directory in tandem with the `jiant/` directory on the branch `IRT_experiments`.
- Environment you're running `jiant` in, e.g., "2 P40 GPUs": I'm using `jiant` in Google Colab along with Google Drive.

Expected behavior
I expected the RAG model to start fine-tuning and generate the cache files in the `experiments/cache/` directory as outlined in the README.

Screenshots

Additional context
On investigating further, I realized that `IRT_experiments` was still using `transformers==3.1.0`, which does not support `rag` architectures. I uninstalled that version of transformers and tried upgrading to `transformers>=3.5.0` to see if that would fix the issue, but that resulted in a new error: `ModuleNotFoundError: No module named 'transformers.tokenization_bert'`. It looks like transformers refactored their code in later versions while incorporating different models. If the `IRT_experiments` branch were up to date with the changes on the `master` branch, would that fix things? I noticed that `jiant` on `master` was using `transformers==4.5.0`. Is there any other way that I can use a RAG model along with the scripts in `irt_scripts/` for my IRT research? Specifically, I need to use the fine-tuning, predicting, and post-processing scripts which are available in the `IRT_experiments` branch in the `irt_scripts/` directory. Please respond at your earliest convenience! Thanks!
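The `ModuleNotFoundError` above stems from transformers moving tokenizer modules to new paths in v4 (e.g. `transformers.tokenization_bert` became `transformers.models.bert.tokenization_bert`). One common bridging pattern while migrating old code is a try-first import helper. The sketch below is self-contained and demonstrated with stdlib modules so it runs anywhere; the transformers paths in the comment are shown as an assumed example, not verified against a specific release.

```python
# Try-first import fallback: a common pattern for bridging a library
# refactor that moved modules to new paths between major versions.
import importlib


def import_first(*module_paths):
    """Return the first module in module_paths that imports successfully."""
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ModuleNotFoundError:
            continue
    raise ModuleNotFoundError(f"none of {module_paths} could be imported")


# For the transformers refactor this might look like (assumed paths):
#   tok = import_first("transformers.models.bert.tokenization_bert",
#                      "transformers.tokenization_bert")

# Demonstrated with stdlib modules so the sketch is runnable anywhere:
demo = import_first("no_such_module_xyz", "json")
print(demo.__name__)  # -> json
```

That said, a shim like this only papers over import paths; code written against the transformers 3.x API may still need real changes to run on 4.x.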