Open VeritasJoker opened 9 months ago
Use this script: https://github.com/hassonlab/247-pickling/blob/dev/scripts/tfsemb_perplexity.py
and this command:

    perp-embeddings:
        mkdir -p logs
        for conv_id in $(CONV_IDS); do \
            python scripts/tfsemb_perplexity.py \
                --project-id $(PRJCT_ID) \
                --pkl-identifier $(PKL_IDENTIFIER) \
                --subject $(SID) \
                --conversation-id $$conv_id \
                --embedding-type $(EMB_TYPE); \
        done;
It should take less than a minute to run for each of those if we have the model downloaded
With stride 512, 1024, 2048, 4096:
Actually, can you just do the four strides for all of the models? I'm making a table for all the values here: https://docs.google.com/spreadsheets/d/1E3k9gCvqsWERyPmvXvo-0yfYIyFt5XJQhYNS2ykKooo/edit?usp=sharing
where did you find this script?
HuggingFace lol
Here: https://huggingface.co/docs/transformers/en/perplexity
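For anyone following along, here is a rough sketch of how the strided (sliding-window) evaluation in that HuggingFace guide splits up a token sequence. It is not the code in tfsemb_perplexity.py. The names `strided_windows`, `strided_perplexity`, and `window_nll` are made up for illustration, and `window_nll` stands in for the model call: it is assumed to return the summed negative log-likelihood over the last `n_targets` tokens of a window. With a real causal LM the very first token has no prediction, so the token count would differ slightly.

```python
import math
from typing import Callable, Iterator, Sequence, Tuple


def strided_windows(seq_len: int, max_length: int, stride: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (begin, end, n_targets) for each evaluation window."""
    # Assumption of this sketch: a stride larger than the window would
    # leave some tokens unscored, so it is rejected here.
    if not 0 < stride <= max_length:
        raise ValueError("stride must be between 1 and max_length")
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        # Only tokens past the previous window's end are newly scored;
        # earlier tokens in the window serve as context.
        yield begin, end, end - prev_end
        prev_end = end
        if end == seq_len:
            break


def strided_perplexity(
    token_ids: Sequence[int],
    max_length: int,
    stride: int,
    window_nll: Callable[[Sequence[int], int], float],  # hypothetical model wrapper
) -> float:
    """Perplexity = exp(total NLL / number of scored tokens)."""
    total_nll = 0.0
    total_targets = 0
    for begin, end, n_targets in strided_windows(len(token_ids), max_length, stride):
        total_nll += window_nll(token_ids[begin:end], n_targets)
        total_targets += n_targets
    if total_targets == 0:
        raise ValueError("cannot compute perplexity of an empty sequence")
    return math.exp(total_nll / total_targets)
```

A smaller stride gives each scored token more context (and costs more forward passes), which is why the numbers can change across 512, 1024, 2048, and 4096.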
Does it matter if the models are quantized or not?