Columns for critical region

sathvikn commented 1 year ago

Lan et al. compute their effect size by taking the difference in surprisal over a critical region between grammatical and ungrammatical sentences, with & without fillers, and then checking whether the difference of Delta-filler and Delta+filler is greater than zero. Using just the last word might be sufficient for the replication here, but specifying the start & end indices of the critical region could be useful for other syntactic structures we end up testing later.
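To make the start & end idea concrete, here is a minimal sketch of summing per-word surprisal over a region. It assumes 0-based, inclusive word indices and a plain list of per-word surprisals; the function name `region_surprisal` and the numbers are made up for illustration and are not taken from Lan et al. or from this repo.

```python
from typing import Sequence


def region_surprisal(surprisals: Sequence[float], start: int, end: int) -> float:
    """Sum per-word surprisal over words start..end (0-based, inclusive)."""
    if not 0 <= start <= end < len(surprisals):
        raise ValueError(f"bad region [{start}, {end}] for {len(surprisals)} words")
    return sum(surprisals[start:end + 1])


# Made-up per-word surprisals for a five-word sentence.
word_surprisals = [2.0, 5.5, 1.25, 4.0, 3.0]

# "Last word only" is just the special case start == end == last index.
last = len(word_surprisals) - 1
last_word_only = region_surprisal(word_surprisals, last, last)

# A two-word critical region covering the final two words.
two_word_region = region_surprisal(word_surprisals, 3, 4)
```

With this shape, the last-word replication and a multi-word region go through the same code path, so only the indices stored per sentence would need to change.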

Acceptance Criteria:
- [ ] Two columns in the generated CSVs indicating the start & end of the region where we compute surprisal.

Possible implementation: add a field to the config files that specifies the nodes of the critical region? It may be simple enough to hard-code the start & end indices in the config file and then copy them over to the CSV.
GPT-2's subword tokenization might make this hard, so we can just focus on replicating their work w/ GRNN.
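A rough sketch of the "hard-code in the config, copy to the CSV" option, using only the standard library. The field names `critical_region_start` / `critical_region_end`, the example sentences, and the condition labels are all hypothetical; the real config and CSV layout in this repo may look different.

```python
import csv
import io

# Hypothetical config entry; the field names are illustrative only.
config = {"critical_region_start": 6, "critical_region_end": 7}

# Made-up generated sentences (one dict per CSV row).
rows = [
    {"sentence": "I know what the lion devoured at sunrise", "condition": "+filler,+gap"},
    {"sentence": "I know that the lion devoured at sunrise", "condition": "-filler,+gap"},
]


def add_region_columns(rows, config):
    """Copy the hard-coded region indices from the config onto every row."""
    return [
        {
            **row,
            "critical_region_start": config["critical_region_start"],
            "critical_region_end": config["critical_region_end"],
        }
        for row in rows
    ]


def to_csv(rows):
    """Serialize rows to CSV text, taking the header from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(add_region_columns(rows, config))
```

This only works as-is when every sentence generated from one config shares the same word positions, which is part of why word-level indices fit GRNN more easily than a subword tokenizer.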

rmhopkins4 commented 1 year ago

I've implemented it so that, in the config file, we can wrap the text that represents the critical region in "/" markers. I opted for this more manual approach over something like defining which nodes are part of the critical region, since it is more extensible and we may want to reuse nodes without defining multiple critical regions.
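For anyone reading along, this is roughly how "/"-delimited text can be turned into start & end indices. It is a simplified sketch, not the code in the repo: it assumes exactly one marked region per string, whitespace-separated words, and 0-based inclusive indices, and `extract_critical_region` is a made-up name.

```python
def extract_critical_region(marked: str):
    """Return (plain_text, start, end) for text with one /.../ marked region.

    start and end are 0-based, inclusive word indices under whitespace splitting.
    """
    parts = marked.split("/")
    if len(parts) != 3:
        raise ValueError("expected exactly two '/' markers")
    before, region, after = (part.split() for part in parts)
    if not region:
        raise ValueError("critical region is empty")
    words = before + region + after
    start = len(before)
    end = start + len(region) - 1
    return " ".join(words), start, end


# Made-up example sentence with the last two words marked as the region.
text, start, end = extract_critical_region("I know what the lion devoured /at sunrise/")
```

The markers are stripped from the returned text, so the sentence written to the CSV stays clean while the indices go into their own columns.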

sathvikn commented 1 year ago

That sounds fine for now. Also, I'm thinking the easiest way might just be to have two fields in the JSON that give the critical region for the filler & gap. It's not the most urgent, but we can keep this open until we figure it out.
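One possible shape for the two-field idea, just as a sketch. The key names `critical_region_filler` / `critical_region_gap`, the `[start, end]` encoding, and the index values are assumptions for illustration, not an agreed format.

```python
import json

# Hypothetical config fragment; key names and index values are illustrative only.
config_text = """
{
  "critical_region_filler": [2, 2],
  "critical_region_gap": [6, 7]
}
"""


def load_regions(text):
    """Read the two region fields and return {field_name: (start, end)}."""
    config = json.loads(text)
    regions = {}
    for key in ("critical_region_filler", "critical_region_gap"):
        start, end = config[key]
        if start > end:
            raise ValueError(f"{key}: start {start} is after end {end}")
        regions[key] = (start, end)
    return regions


regions = load_regions(config_text)
```

Keeping both regions as explicit fields would make them easy to copy into CSV columns later, at the cost of updating the indices by hand whenever the sentence template changes.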

umd-psycholing / lm-syntactic-generalization

Columns for critical region #1