neuroscout / neuroscout-paper

Neuroscout paper analysis repository
https://neuroscout.github.io/neuroscout-paper/

inspect LM surprisal #40

Closed · rbroc closed this issue 2 years ago

rbroc commented 2 years ago

Recap of what's been done so far: I extracted surprisal, entropy, and loss for GPT with different window sizes, using full transcripts vs. force-aligned transcripts for Narratives and Sherlock. (BERT cannot be used for forward language modeling as-is, as it always expects a [SEP] token.) A fun fact about these metrics: entropy is higher for the force-aligned transcripts (where there's no punctuation), while surprisal is lower for force-aligned (probably the lack of punctuation decreases the model's confidence in its top-predicted words). Loss is, as expected, lower when there is punctuation.
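For reference, a minimal sketch of how per-token surprisal and entropy can be computed with a GPT-style model over a sliding context window; the model name, window size, and function names below are illustrative and not the exact extraction settings used here:

```python
# Sketch: per-token surprisal and entropy (in nats) from GPT-2,
# conditioning on a sliding window of preceding tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def surprisal_entropy(text, window=64):
    """Return (tokens, surprisals, entropies), one value per token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    surprisals, entropies = [], []
    for i in range(1, len(ids)):
        # condition on at most `window` preceding tokens
        context = ids[max(0, i - window):i].unsqueeze(0)
        with torch.no_grad():
            logits = model(context).logits[0, -1]  # logits for the next token
        logprobs = torch.log_softmax(logits, dim=-1)
        surprisals.append(-logprobs[ids[i]].item())                  # -log p(w_i | context)
        entropies.append(-(logprobs.exp() * logprobs).sum().item())  # entropy of the predictive distribution
    return tokenizer.convert_ids_to_tokens(ids.tolist())[1:], surprisals, entropies
```

(The loss mentioned above is essentially the mean of these per-token surprisals.)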

I've looked into the extent to which entropy and surprisal correlate between:

  1. Full transcripts and force-aligned transcripts (the latter with no punctuation or capitalization, the former with tokens present)
  2. Force-aligned transcripts with capitalization and no tokens, and lowercased transcripts with tokens

In short, in the first case they correlate at ~.65-.75, in the second at ~.85-.90. Both are acceptable correlation levels for our goals, so I'd say we shouldn't worry too much about working with transcripts. An interesting fact, though, is that punctuation alone makes a ~.20 difference. I've only looked at this for one of the narratives, but I will do this more systematically for all of them asap.
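As a rough sketch of how such a comparison can be run (assuming per-token surprisals have already been extracted as above, glossing over how punctuation tokens are handled when aligning the two variants, and with all names illustrative):

```python
# Sketch: merge GPT-2 BPE-piece surprisals back to word level, then
# correlate the word-level series from two transcript variants.
# Assumes both variants contain the same words in the same order.
import numpy as np
from scipy.stats import pearsonr


def merge_to_words(tokens, surprisals):
    """Sum sub-word surprisals into word-level surprisals.

    GPT-2 marks word-initial pieces with a leading 'Ġ' (encoded space),
    so a new word starts whenever that marker appears.
    """
    words, word_surprisal = [], []
    for tok, s in zip(tokens, surprisals):
        if tok.startswith("Ġ") or not words:
            words.append(tok.lstrip("Ġ"))
            word_surprisal.append(s)
        else:
            words[-1] += tok
            word_surprisal[-1] += s
    return words, np.array(word_surprisal)


# hypothetical usage, with the two transcript variants as plain strings:
# toks_f, surp_f, _ = surprisal_entropy(full_transcript)
# toks_a, surp_a, _ = surprisal_entropy(aligned_transcript)
# _, w_full = merge_to_words(toks_f, surp_f)
# _, w_aligned = merge_to_words(toks_a, surp_a)
# r, p = pearsonr(w_full, w_aligned)
```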

Next steps:

rbroc commented 2 years ago

(note that this is not necessarily relevant for the paper but keeping it here just in case)

satra commented 2 years ago

@rbroc - that looks great. thanks for the summary.

rbroc commented 2 years ago

Closing this. For the LM part we've done some more comprehensive analyses and have a preprint coming soon. We also now have GPT entropy and surprisal extractors and have played a bit with them - I will push them to a separate repo if we decide to keep working on this.