Closed — samsja closed this issue 1 year ago
Context
We use llama-7b in the current codebase to run training: https://github.com/jina-ai/jerboa/blob/6b1ea8bf14a75cfc6d7fb095f7d1b60258afb52f/alpaca-lora/finetune.py#L113

However, the 7b model is quite big and is not suited to running tests against the codebase. It would be nice to have a tiny, untrained LLaMA model that can serve debugging purposes. Ideally, we would just take the llama-7b architecture and keep only 2 attention heads, and/or reduce the embedding size. The goal is to be able to run the script (in debug mode) on CPU in a few dozen seconds with this tiny model.
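A minimal sketch of how such a tiny model could be built with the `transformers` library, assuming we keep the LLaMA architecture but shrink the config. The exact sizes below (2 layers, 2 heads, hidden size 64) are illustrative assumptions, not values from this issue:

```python
# Sketch: a tiny, randomly initialized LLaMA for debugging.
# All sizes are illustrative assumptions; only the architecture matches llama-7b.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,       # keep the real tokenizer vocabulary
    hidden_size=64,         # down from 4096 in llama-7b
    intermediate_size=128,  # down from 11008
    num_hidden_layers=2,    # down from 32
    num_attention_heads=2,  # down from 32
)
tiny_model = LlamaForCausalLM(config)  # random weights, never trained

n_params = sum(p.numel() for p in tiny_model.parameters())
print(f"tiny model parameters: {n_params:,}")  # a few million vs ~7 billion
```

Since the weights are random, the model is useless for generation, but it exercises the same code paths (tokenization, forward pass, LoRA injection) fast enough to run in debug mode on CPU.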