huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

TinyModel addition #31804

Open noanabeshima opened 1 month ago

noanabeshima commented 1 month ago

Model description

https://github.com/noanabeshima/tiny_model

It's a small language model trained on TinyStories for interpretability, with sparse autoencoders and transcoders added. It has no layernorms (this helps with interpretability), which means it doesn't fit any existing model architecture in the transformers library. Its architecture is essentially GPT-2's, except that it has no layernorms and uses untied embed/deembed (i.e., the embedding and unembedding matrices are not shared).
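To make the structural difference concrete, here is a minimal sketch in plain Python. The `attn_stub` and `mlp_stub` functions are hypothetical placeholders (no real attention or MLP math); the only point illustrated is that each sublayer writes directly into the residual stream, with no LayerNorm applied before it as standard GPT-2 would do.

```python
def attn_stub(x):
    # Hypothetical stand-in for multi-head self-attention.
    return [v * 0.5 for v in x]

def mlp_stub(x):
    # Hypothetical stand-in for the feed-forward sublayer.
    return [v * 0.25 for v in x]

def block(x):
    # GPT-2 would compute x + attn(layernorm(x)); with layernorms removed,
    # each sublayer reads the raw residual stream and adds its output back.
    x = [a + b for a, b in zip(x, attn_stub(x))]
    x = [a + b for a, b in zip(x, mlp_stub(x))]
    return x

def forward(x, n_layers=4):
    # Stack of identical blocks, matching the 4-layer default config.
    for _ in range(n_layers):
        x = block(x)
    return x
```

Because every sublayer sees the unnormalized residual stream, feature directions found by sparse autoencoders compose linearly across layers, which is the interpretability benefit the author mentions.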

Open source status

Provide useful links for the implementation

The implementation is here: https://github.com/noanabeshima/tiny_model/blob/main/tiny_model/lm.py

The weights are here: https://huggingface.co/noanabeshima/tiny_model/blob/main/tiny_model.pt

The default config corresponding to the weights is:

    d_model=768,
    n_layers=4,
    n_heads=16,
    max_seq_len=256,
    vocab_size=10_000
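For a port into transformers, that default config could be captured in a config class along these lines. This is a hypothetical sketch (the class name, and the `d_head` helper, are assumptions, not the actual transformers config); the field names and values come from the issue.

```python
from dataclasses import dataclass

@dataclass
class TinyModelConfig:
    # Defaults mirror the config stated in the issue.
    d_model: int = 768
    n_layers: int = 4
    n_heads: int = 16
    max_seq_len: int = 256
    vocab_size: int = 10_000

    @property
    def d_head(self) -> int:
        # Conventional per-head width: d_model split evenly across heads.
        return self.d_model // self.n_heads
```

Under these defaults each attention head would have width 768 / 16 = 48.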

I am the author.

LysandreJik commented 1 month ago

It would be quite nice to add this using the new model adder that @ArthurZucker has contributed; @ArthurZucker, when back from leave (next week), do you mind sharing with @noanabeshima how to get this done the best way?

ArthurZucker commented 1 month ago

Hey! Sorry for the delay! Yep, my recommendation is to use the #30868 tool to isolate the changes as much as possible 🤗

vishwas-sharma2480 commented 1 month ago

Hi @ArthurZucker, I am new to open-source contribution and would like to help add this new model to the transformers library. Could you please point me to any references or previous PRs that were similar to this?

ArthurZucker commented 1 month ago

#29622 or #31659 are quite similar, and there is also https://huggingface.co/docs/transformers/en/add_new_model, which should help!