kanishkamisra / minicons

Utility for behavioral and representational analyses of Language Models
https://minicons.kanishka.website
MIT License

support customized models and tokenizers #33

Closed wwt17 closed 1 year ago

wwt17 commented 1 year ago

These are the minimal changes I made in order to use the scorers in Najoung's category abstraction code. They should not affect existing uses of the API.

The documentation still needs to be expanded. Please check the docs and type annotations.

By getting word_ids only when PLL_metric="within_word_l2r", I also added support for non-fast tokenizers in MaskedLMScorer.
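
To illustrate the idea, here is a minimal sketch rather than the actual minicons code. `SlowEncoding`, `FastEncoding`, and `get_word_ids` are hypothetical stand-ins I made up for this example; the only assumptions are that a non-fast tokenizer's encoding raises an error when asked for word ids, and that "original" is the name of the default metric.

```python
from typing import List, Optional


class SlowEncoding:
    """Hypothetical stand-in for a non-fast tokenizer's output (no word alignment)."""

    def __init__(self, tokens: List[str]):
        self.tokens = tokens

    def word_ids(self) -> List[Optional[int]]:
        # Assumption: non-fast tokenizers cannot map tokens back to words.
        raise ValueError("word_ids() is not available for non-fast tokenizers")


class FastEncoding(SlowEncoding):
    """Hypothetical stand-in for a fast tokenizer's output (word alignment known)."""

    def __init__(self, tokens: List[str], word_ids: List[Optional[int]]):
        super().__init__(tokens)
        self._word_ids = word_ids

    def word_ids(self) -> List[Optional[int]]:
        return self._word_ids


def get_word_ids(encoding: SlowEncoding, PLL_metric: str = "original") -> Optional[List[Optional[int]]]:
    """Ask for word ids only when the metric actually needs them."""
    if PLL_metric == "within_word_l2r":
        return encoding.word_ids()
    # Any other metric never touches word_ids(), so slow tokenizers work too.
    return None


slow = SlowEncoding(["un", "##happy"])
fast = FastEncoding(["un", "##happy"], [0, 0])

print(get_word_ids(slow, "original"))
print(get_word_ids(fast, "within_word_l2r"))
```

With this arrangement a non-fast tokenizer only fails if the caller explicitly requests the "within_word_l2r" metric, which genuinely needs the token-to-word alignment.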

netlify[bot] commented 1 year ago

Deploy Preview for pyminicons canceled.

Latest commit: ee44c156c1dc8627c27abaafd3931e22738efb41
Latest deploy log: https://app.netlify.com/sites/pyminicons/deploys/64bfd387add64d0008c58ef9

kanishkamisra commented 1 year ago

Hi Wentao, thanks for this very useful PR! Would you mind adding examples of how the new code works with a custom tokenizer? That would be useful for older users! Just modifying the README would work, or you can add your own Markdown document!