keldenl / gpt-llama.cpp

A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI.
MIT License

Are there different specific instructions for running Red Pajama? #48

Open Bloob-beep opened 1 year ago

Bloob-beep commented 1 year ago

I've followed the prerequisites, but I can't run Red Pajama 3B with llama.cpp. I think it's only available inside the ggml repo, right? I went ahead anyway, assuming gpt-llama.cpp does something to enable it, and placed the model at ../llama.cpp/models/ggml/gpt-neox/rp-instruct-3b-v1-ggml-model-q4_0.bin

Running http://localhost:443/v1/models returns Missing API_KEY. Please set up your API_KEY (in this case path to model .bin in your ./llama.cpp folder). I'm not sure where to put this path. I tried API_KEY=<path to model> npm start, and I tried entering <path to model> as Swagger's Bearer token. Where do I set this API_KEY?
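For context, the error message itself hints that the "API key" is not a server-side setting: OpenAI-compatible wrappers like this one typically read the model path from the same place an OpenAI client would put its key, the Authorization: Bearer header of each request. A minimal sketch of such a request, assuming that convention (the port and model path below are just the ones from this issue, not verified against the server):

```python
# Hedged sketch: build a GET /v1/models request that carries the model path
# in the Authorization header, where an OpenAI client normally puts its key.
# The base URL and model path are illustrative assumptions.
from urllib import request

MODEL_PATH = "../llama.cpp/models/ggml/gpt-neox/rp-instruct-3b-v1-ggml-model-q4_0.bin"

def build_models_request(base_url: str, model_path: str) -> request.Request:
    """Return a request whose bearer token is the model path, not a real API key."""
    return request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {model_path}"},
    )

req = build_models_request("http://localhost:443", MODEL_PATH)
# Sending this with urllib.request.urlopen(req) should let the server see the
# path instead of responding with "Missing API_KEY".
```

If this is right, setting API_KEY as an environment variable before npm start would have no effect; the path has to travel with each HTTP request.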

Edit: I also tried the ggml repo directly, but it's not working either. I'm confused about how to run Red Pajama:

./bin/gpt-neox -m ../../models/rp-instruct-3b-v1-ggml-model-q4_0.bin -p "How do I build a website?"
main: seed = 1684913741
gpt_neox_model_load: loading model from '../../models/rp-instruct-3b-v1-ggml-model-q4_0.bin' - please wait ...
gpt_neox_model_load: n_vocab = 50432
gpt_neox_model_load: n_ctx   = 2048
gpt_neox_model_load: n_embd  = 2560
gpt_neox_model_load: n_head  = 32
gpt_neox_model_load: n_layer = 32
gpt_neox_model_load: n_rot   = 80
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype   = 2
gpt_neox_model_load: qntvr   = 0
gpt_neox_model_load: ggml ctx size = 3572.54 MB
gpt_neox_model_load: memory_size =   640.00 MB, n_mem = 65536
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_M_create
Aborted
[fedorauser@W10JB1S9K3 build]$
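A note on the crash above: a std::length_error thrown while loading is the kind of failure you typically see when a binary parses a model file whose format or version it does not understand, so it reads a garbage string length. llama.cpp of that era only loaded LLaMA-architecture models, while Red Pajama is GPT-NeoX and is handled by the separate gpt-neox example in the ggml repo. One quick sanity check, sketched here under the assumption that the file uses one of the classic single-file ggml containers, is to read the 4-byte magic at the start of the file:

```python
# Hedged sketch: inspect the 4-byte magic number at the start of a model
# file to see which ggml-era container it claims to be. The magic values
# below are the historical ones ('ggml', 'ggmf', 'ggjt'); treat both the
# list and the descriptions as assumptions, not authoritative.
import struct

MAGICS = {
    0x67676D6C: "ggml (unversioned container)",
    0x67676D66: "ggmf (versioned container)",
    0x67676A74: "ggjt (mmap-able container used by llama.cpp at the time)",
}

def read_magic(path: str) -> str:
    """Return a human-readable guess at the file's container format."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown magic 0x{magic:08x}")
```

If the magic is recognized but the binary still aborts, the mismatch is likely the model architecture (NeoX vs. LLaMA) or a converter version difference rather than a corrupt download.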