hassonlab / 247-pickling

Contains code to create pickles from raw/processed data

Better way to include models that don't belong to a particular category #95

Closed by hvgazula 1 year ago

hvgazula commented 1 year ago

https://github.com/hassonlab/247-pickling/blob/b86d60a5441bf7581b420057b3f1ded6f4eaa051/scripts/tfsemb_download.py#L27-L28

These lists are used exclusively to download the models for each class, so gpt* models will obviously fail with MLM. I know this was done to support some checks in tfsemb_main.py. I could also catch this in the download script itself, but I prefer keeping the two separate for clarity: looking at the download script, we should be able to tell which models from each class are being analyzed. The cross-talk (the causal model in the masked model list) is an analysis decision and thus should be kept separate.
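A minimal sketch of the separation described above, to make the idea concrete. All names here (`CAUSAL_MODELS`, `MLM_MODELS`, `CAUSAL_MODELS_ANALYZED_AS_MLM`, `download_class`, `analysis_classes`) and the model entries are hypothetical, chosen for illustration; they are not taken from tfsemb_download.py or tfsemb_main.py.

```python
# Hypothetical sketch: names and list contents are illustrative assumptions,
# not the repository's actual code.

# Download lists: each model appears only under the class it is downloaded with.
CAUSAL_MODELS = ["gpt2", "gpt2-xl"]
MLM_MODELS = ["bert-base-uncased", "roberta-base"]

# Analysis decision, kept apart from the download lists: causal models that
# an analysis step may additionally treat like masked models.
CAUSAL_MODELS_ANALYZED_AS_MLM = ["gpt2-xl"]


def download_class(model_name):
    """Return the one model class used to download `model_name`."""
    if model_name in CAUSAL_MODELS:
        return "causal"
    if model_name in MLM_MODELS:
        return "mlm"
    raise ValueError(f"Unknown model: {model_name}")


def analysis_classes(model_name):
    """Return every class `model_name` is analyzed under."""
    classes = {download_class(model_name)}
    if model_name in CAUSAL_MODELS_ANALYZED_AS_MLM:
        classes.add("mlm")
    return classes
```

With this layout, the download script would only ever consult `download_class`, so a gpt* model is never fetched with a masked-LM class, while any check that needs the cross-talk can call `analysis_classes` instead.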

VeritasJoker commented 1 year ago

I think we should just remove these lines for now. They are hacks used to generate embeddings for gpt2-xl at the full-utterance level. I don't think we need them at the moment.

hvgazula commented 1 year ago

@VeritasJoker Closing this issue for the time being.