Open Bachstelze opened 3 months ago
Good question: The library builds upon nanoGPT, so instruction-tuned GPT-2 models shouldn't be a problem (probably just a matter of updating the list of available models). Which model are you interested in specifically?
LLAMA-2 series models
LLaMA-2 models are not GPT-2 models, AFAIK, so they are not supported at the moment.
How does the attention look in instruction-tuned GPTs? reference: https://github.com/jessevig/bertviz/issues/128
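Since instruction tuning doesn't change the architecture, attention in an instruction-tuned GPT-2 can be inspected the same way as in the base model. Here is a minimal sketch using Hugging Face `transformers` with a tiny randomly initialised GPT-2 config (no checkpoint download), just to show where the per-layer attention tensors come from; for a real inspection you would load an actual instruction-tuned GPT-2 checkpoint instead.

```python
import torch
from transformers import GPT2Config, GPT2Model

# Tiny random GPT-2 purely to illustrate the attention output shapes;
# replace with a real instruction-tuned GPT-2 checkpoint for actual analysis.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=100)
model = GPT2Model(config)
model.eval()

input_ids = torch.randint(0, 100, (1, 8))  # batch of 1, sequence length 8
with torch.no_grad():
    out = model(input_ids, output_attentions=True)

# out.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len)
attentions = out.attentions
print(len(attentions), tuple(attentions[0].shape))
```

These per-layer tensors are exactly what tools like bertviz visualise, so the view for an instruction-tuned GPT-2 looks structurally identical to the base model's.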