Code and analysis for minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models
Replicate the `minicons` environment using the following code:
```bash
conda env create -f environment.yml
# deactivate the current active environment, if any, and then:
conda activate minicons
```
The paper includes two motivating behavioral analyses of transformer language models that we conduct using `minicons`. Each analysis is based around a particular dataset:
Paper: BLiMP: The benchmark of linguistic minimal pairs for English
Authors: Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel R. Bowman
Source URL: GitHub
Location in this repo: `data/blimp` (contains 67 `jsonl` files, each targeting a specific linguistic phenomenon)
Goal of the analysis: Evaluate LMs by assessing their preference for linguistically acceptable vs. unacceptable sentences differing by a single word.
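For a concrete sense of what this preference comparison looks like in `minicons`, here is a minimal sketch; the model choice and the example pair are illustrative, not taken from the repo's scripts:

```python
from minicons import scorer

# Illustrative model; any causal LM on the Hugging Face hub works here.
lm = scorer.IncrementalLMScorer("distilgpt2", "cpu")

pair = [
    "The keys to the cabinet are on the table.",  # acceptable
    "The keys to the cabinet is on the table.",   # unacceptable
]

# Total log-probability of each sentence; the acceptable one should score higher.
good, bad = lm.sequence_score(pair, reduction=lambda x: x.sum(0).item())
print(good > bad)  # expected: True
```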
Paper: Abductive Commonsense Reasoning
Authors: Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi
Source URL: Project Page, Leaderboard
Location in this repo: `data/anli` (contains 3 `jsonl` files)
Goal of the analysis: Evaluate LMs by assessing their capacity to choose the most plausible explanation given two observations.
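A minimal sketch of one way to frame this choice with `minicons`; the model, the scoring recipe (concatenate observation 1, a candidate explanation, and observation 2, then compare total log-probabilities), and the example texts are all illustrative assumptions, not necessarily the paper's exact setup:

```python
from minicons import scorer

lm = scorer.IncrementalLMScorer("gpt2", "cpu")  # illustrative model choice

obs1 = "Ray drove his car to the beach."
obs2 = "He had to call a tow truck to get home."
hyp1 = "Ray's car broke down in the parking lot."  # plausible explanation
hyp2 = "Ray had a wonderful time swimming."        # implausible explanation

# Score each observation-explanation-observation narrative as a whole.
candidates = [f"{obs1} {h} {obs2}" for h in (hyp1, hyp2)]
scores = lm.sequence_score(candidates, reduction=lambda x: x.sum(0).item())

# Pick the explanation whose narrative gets the higher log-probability.
best = max(zip(scores, (hyp1, hyp2)))[1]
print(best)
```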
Goal: Shed light on the capacity of the BERT-base model to make linguistic acceptability judgments over the course of its pre-training.
Follow the instructions listed here to download and use the MultiBERTs model checkpoints. These are checkpoints of 5 different BERT-base models pre-trained on the same corpus using different random seeds. For each of the 5 runs, the authors also make available checkpoints at various time-steps during the course of pre-training.
After this step, the `multiberts` directory should have 5 new directories named `seed{0,1,2,3,4}`, each with the following structure:
```
seed{n}
├── step_{m}
│   ├── bert.ckpt.index
│   ├── bert.ckpt.meta
│   ├── checkpoint
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.txt
```
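Since each `step_{m}` directory contains `config.json`, `pytorch_model.bin`, and the tokenizer files, it can be loaded like any local Hugging Face checkpoint, including directly into a `minicons` scorer. A sketch, where the `seed0/step_20000` path is purely illustrative:

```python
from minicons import scorer

# Any downloaded seed/step combination works; this path is only an example.
ckpt = "multiberts/seed0/step_20000"

# Pseudo-log-likelihood scorer for masked LMs such as BERT.
mlm = scorer.MaskedLMScorer(ckpt, "cpu")

print(mlm.sequence_score(["The keys to the cabinet are on the table."]))
```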
Next, download the data from here and store it in `data/blimp`.
Run the following command to track and save the learning dynamics of the MultiBERTs:

```bash
python src/blimp.py --device 0 --batchsize 64 --workers 16
```
where `--device` is the CUDA device (-1 for CPU), `--batchsize` is the batch size, and `--workers` is the number of workers used by `torch.utils.data.DataLoader`.
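For reference, the last two flags presumably map onto the standard `DataLoader` arguments shown below; this is a toy sketch under that assumption, since the actual wiring lives in `src/blimp.py`:

```python
from torch.utils.data import DataLoader

# Toy stand-in for BLiMP minimal pairs; the real script reads the jsonl files.
pairs = [("The cats sleep.", "The cats sleeps.")] * 128

# --batchsize and --workers presumably become these two arguments:
loader = DataLoader(pairs, batch_size=64, num_workers=16)

for good_batch, bad_batch in loader:
    pass  # score each batch of acceptable/unacceptable sentences here
```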
This will create a file called `blimp_multiberts_results.csv` in the `data/results` directory with the following columnar format:

```
instance_id,field,topic,phenomena,seed,step,good,bad
```

where `good` and `bad` are the log-probabilities of the grammatical and ungrammatical sentences for each instance of a given BLiMP phenomenon.
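A natural summary of this file is per-phenomenon accuracy, i.e., the fraction of pairs where `good` exceeds `bad`. A hypothetical post-processing sketch:

```python
import pandas as pd

df = pd.read_csv("data/results/blimp_multiberts_results.csv")

# A pair counts as correct when the grammatical sentence is more probable.
df["correct"] = df["good"] > df["bad"]

# Accuracy per seed, pre-training step, and phenomenon.
acc = df.groupby(["seed", "step", "phenomena"])["correct"].mean()
print(acc.head())
```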
TODO
If you use `minicons` or the code in this repository, please cite the following paper:
```bibtex
@article{misra2022minicons,
  title={minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models},
  author={Kanishka Misra},
  journal={arXiv preprint arXiv:2203.13112},
  year={2022}
}
```