slavag closed this issue 9 months ago
Reverting to commit c38c19e507120eee8b06a2cfe4042c03f5610735 didn't solve the issue. Reverting to commit cb09dab183b36c1c49d21a3efc5ad7dc427278bc solved the issue.
I suspect you are running out of memory after this change: https://github.com/h2oai/h2ogpt/commit/c38c19e507120eee8b06a2cfe4042c03f5610735
You can pass --max_seq_len=2048 if you want to go back to the old (wrong) behavior in order to use less memory.
@pseudotensor why did reverting the code to commit cb09dab183b36c1c49d21a3efc5ad7dc427278bc solve the issue?
@slavag That can't be true, because that commit only touches OpenAI code and is very well isolated.
@pseudotensor But it's a fact. I can take the recent code again, check, and then revert to c38c19e507120eee8b06a2cfe4042c03f5610735, but I already did this: reverting back solves the issue.
Sorry, I can't follow what you are saying. https://github.com/h2oai/h2ogpt/commit/c38c19e507120eee8b06a2cfe4042c03f5610735 can matter and is likely the issue, but https://github.com/h2oai/h2ogpt/commit/cb09dab183b36c1c49d21a3efc5ad7dc427278bc cannot matter. You must be making an error when checking things. You can see that that code only has to do with OpenAI.
But again, it's not just about reverting some commit. The commit https://github.com/h2oai/h2ogpt/commit/c38c19e507120eee8b06a2cfe4042c03f5610735 is not wrong; one needs to reduce max_seq_len.
@pseudotensor Sorry for confusing you, but with HEAD is now at cb09dab1 Fixes #928, the command
h2ogpt]$ python generate.py --base_model=llama --prompt_type=llama2 --model_path_llama=/mnt/AI/models/TheBloke/Llama-2-13B-chat-GGUF/llama-2-13b-chat.Q6_K.gguf --allow_upload_to_user_data=False --langchain_modes="[ZenDeskTicketsWithDocs, MyData]" --langchain_mode=ZenDeskTicketsWithDocs --langchain_mode_types="{'ZenDeskTicketsWithDocs':'shared'}" --visible_side_bar=False --visible_doc_selection_tab=False --h2ocolors=False --use_llm_if_no_docs=False --max_seq_len=4096 --top_k_docs=-1 --temperature=0 --top_p=0.01 --top_k=2
is working.
And when I do HEAD is now at c38c19e5 Fixes #915, it's failing with the same execution command.
@pseudotensor Changed to 2048, same issue, with the latest code. Reverting back to cb09dab183b36c1c49d21a3efc5ad7dc427278bc - no issue. And, btw, there are 10GB of VRAM available after starting the chat.
Probably a language issue. When you say "revert", that normally means to do:
git revert <hash>
I think instead you are doing:
git checkout <hash>
?
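The distinction matters for interpreting the test results, and it can be shown with a throwaway repo (a minimal sketch; the repo, file names, and commit messages here are made up for illustration, not h2ogpt's):

```shell
set -e
# Build a tiny repo with two commits
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com && git config user.name demo
echo v1 > app.txt && git add app.txt && git commit -qm "first"
old=$(git rev-parse HEAD)
echo v2 > app.txt && git commit -qam "second"

# git revert: stays on the branch and ADDS a new commit that undoes "second";
# history now has three commits, later work is kept
git revert --no-edit HEAD
cat app.txt                 # back to v1
git rev-list --count HEAD   # 3 commits

# git checkout <hash>: moves HEAD to the old commit itself (detached HEAD),
# so the working tree is exactly as it was at that commit
git checkout -q "$old"
git rev-list --count HEAD   # 1 commit
```

So "git checkout cb09dab1" runs the code exactly as it was at that commit, discarding every later change, while "git revert cb09dab1" would only undo that one commit's diff on top of current main.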
FYI, if I run your command (just changing to use a URL instead of a path and no score model) on main, I see no issues:
(h2ogpt) jon@pseudotensor:~/h2ogpt$ CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=llama --prompt_type=llama2 --model_path_llama=https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q6_K.gguf --allow_upload_to_user_data=False --langchain_modes="[ZenDeskTicketsWithDocs, MyData]" --langchain_mode=ZenDeskTicketsWithDocs --langchain_mode_types="{'ZenDeskTicketsWithDocs':'shared'}" --visible_side_bar=False --visible_doc_selection_tab=False --h2ocolors=False --use_llm_if_no_docs=False --max_seq_len=4096 --top_k_docs=-1 --temperature=0 --top_p=0.01 --top_k=2 --score_model=None
Using Model llama
Starting get_model: llama
/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1006: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from llama-2-13b-chat.Q6_K.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor 0: token_embd.weight q6_K [ 5120, 32000, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 10: blk.1.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 11: blk.1.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 19: blk.10.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 20: blk.10.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 21: blk.10.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 22: blk.10.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 23: blk.10.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 24: blk.10.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 25: blk.10.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 26: blk.10.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 27: blk.10.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 28: blk.11.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 29: blk.11.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 30: blk.11.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 31: blk.11.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 32: blk.11.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 33: blk.11.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 34: blk.11.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 35: blk.11.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 36: blk.11.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 37: blk.12.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 38: blk.12.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 39: blk.12.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 40: blk.12.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 41: blk.12.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 42: blk.12.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 43: blk.12.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 44: blk.12.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 45: blk.12.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 46: blk.13.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 47: blk.13.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 48: blk.13.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 49: blk.13.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 50: blk.13.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 51: blk.13.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 52: blk.13.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 53: blk.13.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 54: blk.13.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 55: blk.14.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 56: blk.14.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 57: blk.14.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 58: blk.14.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 59: blk.14.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 60: blk.14.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 61: blk.14.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 62: blk.14.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 63: blk.14.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 64: blk.15.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 65: blk.15.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 66: blk.2.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 67: blk.2.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 68: blk.2.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 69: blk.2.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 70: blk.2.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 71: blk.2.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 72: blk.2.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 73: blk.2.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 74: blk.2.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 75: blk.3.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 76: blk.3.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 77: blk.3.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 78: blk.3.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 79: blk.3.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 80: blk.3.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 81: blk.3.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 82: blk.3.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 83: blk.3.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 84: blk.4.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 85: blk.4.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 86: blk.4.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 87: blk.4.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 88: blk.4.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 89: blk.4.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 90: blk.4.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 91: blk.4.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 92: blk.4.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 93: blk.5.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 94: blk.5.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 95: blk.5.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 96: blk.5.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 97: blk.5.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 98: blk.5.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 99: blk.5.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 100: blk.5.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 101: blk.5.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 102: blk.6.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 103: blk.6.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 104: blk.6.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 105: blk.6.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 106: blk.6.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 107: blk.6.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 108: blk.6.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 109: blk.6.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 110: blk.6.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 111: blk.7.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 112: blk.7.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 113: blk.7.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 114: blk.7.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 115: blk.7.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 116: blk.7.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 117: blk.7.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 118: blk.7.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 119: blk.7.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 120: blk.8.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 121: blk.8.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 122: blk.8.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 123: blk.8.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 124: blk.8.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 125: blk.8.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 126: blk.8.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 127: blk.8.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 128: blk.8.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 129: blk.9.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 130: blk.9.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 131: blk.9.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 132: blk.9.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 133: blk.9.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 134: blk.9.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 135: blk.9.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 136: blk.9.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 137: blk.9.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 143: blk.15.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 144: blk.15.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 145: blk.16.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 146: blk.16.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 147: blk.16.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 148: blk.16.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 149: blk.16.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 150: blk.16.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 151: blk.16.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 152: blk.16.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 153: blk.16.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 154: blk.17.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 155: blk.17.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 156: blk.17.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 157: blk.17.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 158: blk.17.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 159: blk.17.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 160: blk.17.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 161: blk.17.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 162: blk.17.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 163: blk.18.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 164: blk.18.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 165: blk.18.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 166: blk.18.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 167: blk.18.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 168: blk.18.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 169: blk.18.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 170: blk.18.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 171: blk.18.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 172: blk.19.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 173: blk.19.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 174: blk.19.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 175: blk.19.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 176: blk.19.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 177: blk.19.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 178: blk.19.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 179: blk.19.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 180: blk.19.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 181: blk.20.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 182: blk.20.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 183: blk.20.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 184: blk.20.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 185: blk.20.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 186: blk.20.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 187: blk.20.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 188: blk.20.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 189: blk.20.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 190: blk.21.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 191: blk.21.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 192: blk.21.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 193: blk.21.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 194: blk.21.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 195: blk.21.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 196: blk.21.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 197: blk.21.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 198: blk.21.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 199: blk.22.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 200: blk.22.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 201: blk.22.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 202: blk.22.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 203: blk.22.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 204: blk.22.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 205: blk.22.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 206: blk.22.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 207: blk.22.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 208: blk.23.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 209: blk.23.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 210: blk.23.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 211: blk.23.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 212: blk.23.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 213: blk.23.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 214: blk.23.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 215: blk.23.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 216: blk.23.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 217: blk.24.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 218: blk.24.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 219: blk.24.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 220: blk.24.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 221: blk.24.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 222: blk.24.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 223: blk.24.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 224: blk.24.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 225: blk.24.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 226: blk.25.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 227: blk.25.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 228: blk.25.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 229: blk.25.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 230: blk.25.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 231: blk.25.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 232: blk.25.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 233: blk.25.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 234: blk.25.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 235: blk.26.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 236: blk.26.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 237: blk.26.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 238: blk.26.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 239: blk.26.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 240: blk.26.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 241: blk.26.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 242: blk.26.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 243: blk.26.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 244: blk.27.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 245: blk.27.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 246: blk.27.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 247: blk.27.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 248: blk.27.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 249: blk.27.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 250: blk.27.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 251: blk.27.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 252: blk.27.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 253: blk.28.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 254: blk.28.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 255: blk.28.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 256: blk.28.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 257: blk.28.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 258: blk.28.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 259: blk.28.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 260: blk.28.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 261: blk.28.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 262: blk.29.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 263: blk.29.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 264: blk.29.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 265: blk.29.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 266: blk.29.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 267: blk.29.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 268: blk.29.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 269: blk.29.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 270: blk.29.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 271: blk.30.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 272: blk.30.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 273: blk.30.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 274: blk.30.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 275: blk.30.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 276: blk.30.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 277: output.weight q6_K [ 5120, 32000, 1, 1 ]
llama_model_loader: - tensor 278: blk.30.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 279: blk.30.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 280: blk.30.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 281: blk.31.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 282: blk.31.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 283: blk.31.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 284: blk.31.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 285: blk.31.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 286: blk.31.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 287: blk.31.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 288: blk.31.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 289: blk.31.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 290: blk.32.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 291: blk.32.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 292: blk.32.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 293: blk.32.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 294: blk.32.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 295: blk.32.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 296: blk.32.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 297: blk.32.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 298: blk.32.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 299: blk.33.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 300: blk.33.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 301: blk.33.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 302: blk.33.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 303: blk.33.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 304: blk.33.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 305: blk.33.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 306: blk.33.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 307: blk.33.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 308: blk.34.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 309: blk.34.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 310: blk.34.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 311: blk.34.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 312: blk.34.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 313: blk.34.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 314: blk.34.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 315: blk.34.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 316: blk.34.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 317: blk.35.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 318: blk.35.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 319: blk.35.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 320: blk.35.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 321: blk.35.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 322: blk.35.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 323: blk.35.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 324: blk.35.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 325: blk.35.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 326: blk.36.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 327: blk.36.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 328: blk.36.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 329: blk.36.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 330: blk.36.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 331: blk.36.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 332: blk.36.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 333: blk.36.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 334: blk.36.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 335: blk.37.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 336: blk.37.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 337: blk.37.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 338: blk.37.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 339: blk.37.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 340: blk.37.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 341: blk.37.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 342: blk.37.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 343: blk.37.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 344: blk.38.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 345: blk.38.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 346: blk.38.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 347: blk.38.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 348: blk.38.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 349: blk.38.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 350: blk.38.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 351: blk.38.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 352: blk.38.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 353: blk.39.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 354: blk.39.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 355: blk.39.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 356: blk.39.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 357: blk.39.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 358: blk.39.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 359: blk.39.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 360: blk.39.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 361: blk.39.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 362: output_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - kv 18: general.quantization_version u32
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q6_K: 282 tensors
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_ctx = 4096
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = mostly Q6_K
llm_load_print_meta: model size = 13.02 B
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.12 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 128.29 MB (+ 3200.00 MB per state)
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: VRAM used: 13256 MB
warning: failed to mlock 134402048-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
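The mlock warning above comes from `use_mlock: True` in the llamacpp_dict: the process's locked-memory limit (RLIMIT_MLOCK) is too small to pin the buffer, so llama.cpp falls back to unpinned memory and continues. A minimal sketch of checking and raising the limit, assuming a typical Linux default of 64 kB and that you can edit limits.conf:

```shell
# Check the current locked-memory limit in kB ("64" is a common default,
# "unlimited" means mlock will not fail for this reason).
ulimit -l

# To raise it persistently, add lines like these to /etc/security/limits.conf
# (replace "ec2-user" with the user running h2ogpt), then log in again:
#   ec2-user  soft  memlock  unlimited
#   ec2-user  hard  memlock  unlimited
ulimit -l   # verify after re-login
```

The warning is cosmetic here; it does not by itself explain an out-of-memory failure on the GPU.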
....................................................................................................
llama_new_context_with_model: kv self size = 3200.00 MB
llama_new_context_with_model: compute buffer total size = 351.47 MB
llama_new_context_with_model: VRAM scratch buffer: 350.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
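The 3200 MB "kv self size" in the load log is exactly the f16 K/V cache implied by the printed model dimensions; a quick sanity check, taking the numbers from the llm_load_print_meta output above and assuming 2 bytes per f16 element:

```python
# KV cache bytes = 2 (K and V) * n_layer * n_ctx * n_embd * bytes_per_elem
n_layer = 40    # llm_load_print_meta: n_layer
n_ctx = 4096    # llm_load_print_meta: n_ctx (set by --max_seq_len)
n_embd = 5120   # llm_load_print_meta: n_embd
kv_bytes = 2 * n_layer * n_ctx * n_embd * 2
print(kv_bytes / 1024**2)  # -> 3200.0 MB, matching "kv self size = 3200.00 MB"
```

The same formula with n_ctx = 2048 gives 1600 MB, which is why halving --max_seq_len halves the KV-cache portion of VRAM but leaves the ~13 GB of weights untouched.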
Model {'base_model': 'llama', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'llama2', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': '<s>[INST] ', 'PreInput': None, 'PreResponse': '[/INST]', 'terminate_response': ['[INST]', '</s>'], 'chat_sep': ' ', 'chat_turn_sep': ' </s>', 'humanstr': '[INST]', 'botstr': '[/INST]', 'generates_leading_space': False, 'system_prompt': ''}, 'visible_models': None, 'h2ogpt_key': None, 'load_8bit': False, 'load_4bit': False, 'low_bit_mode': 1, 'load_half': True, 'load_gptq': '', 'load_exllama': False, 'use_safetensors': False, 'revision': None, 'use_gpu_id': True, 'gpu_id': 0, 'compile_model': True, 'use_cache': None, 'llamacpp_dict': {'n_gpu_layers': 100, 'use_mlock': True, 'n_batch': 1024, 'n_gqa': 0, 'model_path_llama': 'https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q6_K.gguf', 'model_name_gptj': 'ggml-gpt4all-j-v1.3-groovy.bin', 'model_name_gpt4all_llama': 'ggml-wizardLM-7B.q4_2.bin', 'model_name_exllama_if_no_config': 'TheBloke/Nous-Hermes-Llama2-GPTQ'}, 'model_path_llama': 'https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q6_K.gguf', 'model_name_gptj': 'ggml-gpt4all-j-v1.3-groovy.bin', 'model_name_gpt4all_llama': 'ggml-wizardLM-7B.q4_2.bin', 'model_name_exllama_if_no_config': 'TheBloke/Nous-Hermes-Llama2-GPTQ'}
load INSTRUCTOR_Transformer
max_seq_length 512
Running on local URL: http://0.0.0.0:7861
To create a public link, set `share=True` in `launch()`.
Started Gradio Server and/or GUI: server_name: 0.0.0.0 port: None
memory usage just after startup is 15GB:
jon@pseudotensor:~/h2ogpt$ nvidia-smi
Sat Oct 7 13:05:51 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Ti On | 00000000:01:00.0 Off | Off |
| 31% 57C P2 114W / 480W| 16611MiB / 24564MiB | 7% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 2080 On | 00000000:03:00.0 Off | N/A |
| 40% 47C P8 7W / 215W| 10MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1615 G /usr/lib/xorg/Xorg 156MiB |
| 0 N/A N/A 2258 G /usr/lib/xorg/Xorg 1190MiB |
| 0 N/A N/A 2391 G /usr/bin/gnome-shell 145MiB |
| 0 N/A N/A 5455 G /usr/bin/nvidia-settings 0MiB |
| 0 N/A N/A 5940 G ...2605455,14348786443269456662,262144 301MiB |
| 0 N/A N/A 6976 G gnome-control-center 4MiB |
| 0 N/A N/A 8291 G ...ures=SpareRendererForSitePerProcess 49MiB |
| 0 N/A N/A 37837 C python 14738MiB |
| 1 N/A N/A 1615 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 2258 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
If you "revert to" (i.e. checkout) https://github.com/h2oai/h2ogpt/commit/cb09dab183b36c1c49d21a3efc5ad7dc427278bc, you land just before https://github.com/h2oai/h2ogpt/commit/c38c19e507120eee8b06a2cfe4042c03f5610735 was added, which is the commit I said is the likely issue.
However, if you can easily reproduce the problem, it's better to isolate exactly which commit caused it. This is easy with git bisect:
git checkout main # ensure at top of main
git bisect start
# run command and confirm bad
git bisect bad
# git checkout cb09dab183b36c1c49d21a3efc5ad7dc427278bc, then run the command again
# assuming it is still good, do:
git bisect good
# run the command on each commit bisect checks out, marking each good or bad
# repeat until it reports the first bad commit
Thanks
@pseudotensor will try to isolate it, but to be clear, when I say revert I mean I did git reset --hard <hash>.
Thanks
@pseudotensor this is what I got as the bad commit:
c38c19e507120eee8b06a2cfe4042c03f5610735 is the first bad commit
commit c38c19e507120eee8b06a2cfe4042c03f5610735
Author: Jonathan C. McKinney <pseudotensor@gmail.com>
Date: Fri Oct 6 13:58:17 2023 -0700
Fixes #915
src/gpt4all_llm.py | 2 +-
src/utils.py | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
@slavag Thanks, yes, that's the exact commit I mentioned at first. Looking at the code, I don't understand why passing --max_seq_len=2048 wouldn't restore exactly the old behavior.
I confirmed manually with a breakpoint at that point in the code that passing --max_seq_len=2048 results in model_max_length=2048, which was the default before.
@pseudotensor Well, I have no idea, but changing to 2048 also fails; you can see llm_load_print_meta: n_ctx = 2048
Execution command:
(python311) [ec2-user@ip-10-0-204-192 h2ogpt]$ python generate.py --base_model=llama --prompt_type=llama2 --model_path_llama=/mnt/AI/models/TheBloke/Llama-2-13B-chat-GGUF/llama-2-13b-chat.Q6_K.gguf --allow_upload_to_user_data=False --langchain_modes="[ZenDeskTicketsWithDocs, MyData]" --langchain_mode=ZenDeskTicketsWithDocs --langchain_mode_types="{'ZenDeskTicketsWithDocs':'shared'}" --visible_side_bar=False --visible_doc_selection_tab=False --h2ocolors=False --use_llm_if_no_docs=False --max_seq_len=2048 --top_k_docs=-1 --temperature=0 --top_p=0.01 --top_k=2
Using Model llama
load INSTRUCTOR_Transformer
max_seq_length 512
Starting get_model: llama
/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py:1006: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA A10G, compute capability 8.6
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /mnt/AI/models/TheBloke/Llama-2-13B-chat-GGUF/llama-2-13b-chat.Q6_K.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor 0: token_embd.weight q6_K [ 5120, 32000, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 10: blk.1.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 11: blk.1.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 19: blk.10.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 20: blk.10.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 21: blk.10.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 22: blk.10.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 23: blk.10.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 24: blk.10.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 25: blk.10.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 26: blk.10.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 27: blk.10.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 28: blk.11.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 29: blk.11.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 30: blk.11.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 31: blk.11.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 32: blk.11.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 33: blk.11.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 34: blk.11.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 35: blk.11.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 36: blk.11.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 37: blk.12.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 38: blk.12.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 39: blk.12.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 40: blk.12.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 41: blk.12.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 42: blk.12.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 43: blk.12.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 44: blk.12.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 45: blk.12.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 46: blk.13.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 47: blk.13.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 48: blk.13.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 49: blk.13.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 50: blk.13.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 51: blk.13.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 52: blk.13.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 53: blk.13.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 54: blk.13.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 55: blk.14.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 56: blk.14.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 57: blk.14.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 58: blk.14.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 59: blk.14.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 60: blk.14.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 61: blk.14.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 62: blk.14.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 63: blk.14.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 64: blk.15.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 65: blk.15.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 66: blk.2.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 67: blk.2.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 68: blk.2.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 69: blk.2.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 70: blk.2.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 71: blk.2.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 72: blk.2.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 73: blk.2.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 74: blk.2.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 75: blk.3.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 76: blk.3.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 77: blk.3.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 78: blk.3.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 79: blk.3.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 80: blk.3.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 81: blk.3.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 82: blk.3.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 83: blk.3.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 84: blk.4.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 85: blk.4.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 86: blk.4.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 87: blk.4.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 88: blk.4.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 89: blk.4.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 90: blk.4.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 91: blk.4.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 92: blk.4.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 93: blk.5.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 94: blk.5.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 95: blk.5.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 96: blk.5.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 97: blk.5.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 98: blk.5.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 99: blk.5.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 100: blk.5.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 101: blk.5.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 102: blk.6.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 103: blk.6.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 104: blk.6.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 105: blk.6.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 106: blk.6.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 107: blk.6.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 108: blk.6.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 109: blk.6.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 110: blk.6.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 111: blk.7.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 112: blk.7.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 113: blk.7.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 114: blk.7.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 115: blk.7.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 116: blk.7.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 117: blk.7.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 118: blk.7.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 119: blk.7.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 120: blk.8.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 121: blk.8.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 122: blk.8.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 123: blk.8.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 124: blk.8.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 125: blk.8.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 126: blk.8.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 127: blk.8.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 128: blk.8.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 129: blk.9.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 130: blk.9.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 131: blk.9.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 132: blk.9.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 133: blk.9.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 134: blk.9.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 135: blk.9.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 136: blk.9.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 137: blk.9.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 143: blk.15.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 144: blk.15.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 145: blk.16.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 146: blk.16.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 147: blk.16.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 148: blk.16.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 149: blk.16.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 150: blk.16.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 151: blk.16.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 152: blk.16.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 153: blk.16.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 154: blk.17.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 155: blk.17.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 156: blk.17.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 157: blk.17.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 158: blk.17.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 159: blk.17.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 160: blk.17.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 161: blk.17.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 162: blk.17.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 163: blk.18.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 164: blk.18.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 165: blk.18.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 166: blk.18.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 167: blk.18.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 168: blk.18.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 169: blk.18.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 170: blk.18.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 171: blk.18.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 172: blk.19.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 173: blk.19.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 174: blk.19.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 175: blk.19.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 176: blk.19.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 177: blk.19.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 178: blk.19.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 179: blk.19.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 180: blk.19.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 181: blk.20.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 182: blk.20.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 183: blk.20.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 184: blk.20.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 185: blk.20.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 186: blk.20.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 187: blk.20.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 188: blk.20.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 189: blk.20.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 190: blk.21.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 191: blk.21.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 192: blk.21.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 193: blk.21.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 194: blk.21.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 195: blk.21.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 196: blk.21.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 197: blk.21.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 198: blk.21.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 199: blk.22.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 200: blk.22.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 201: blk.22.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 202: blk.22.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 203: blk.22.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 204: blk.22.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 205: blk.22.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 206: blk.22.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 207: blk.22.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 208: blk.23.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 209: blk.23.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 210: blk.23.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 211: blk.23.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 212: blk.23.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 213: blk.23.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 214: blk.23.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 215: blk.23.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 216: blk.23.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 217: blk.24.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 218: blk.24.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 219: blk.24.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 220: blk.24.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 221: blk.24.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 222: blk.24.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 223: blk.24.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 224: blk.24.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 225: blk.24.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 226: blk.25.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 227: blk.25.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 228: blk.25.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 229: blk.25.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 230: blk.25.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 231: blk.25.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 232: blk.25.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 233: blk.25.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 234: blk.25.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 235: blk.26.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 236: blk.26.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 237: blk.26.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 238: blk.26.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 239: blk.26.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 240: blk.26.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 241: blk.26.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 242: blk.26.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 243: blk.26.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 244: blk.27.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 245: blk.27.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 246: blk.27.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 247: blk.27.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 248: blk.27.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 249: blk.27.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 250: blk.27.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 251: blk.27.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 252: blk.27.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 253: blk.28.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 254: blk.28.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 255: blk.28.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 256: blk.28.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 257: blk.28.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 258: blk.28.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 259: blk.28.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 260: blk.28.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 261: blk.28.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 262: blk.29.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 263: blk.29.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 264: blk.29.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 265: blk.29.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 266: blk.29.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 267: blk.29.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 268: blk.29.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 269: blk.29.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 270: blk.29.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 271: blk.30.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 272: blk.30.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 273: blk.30.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 274: blk.30.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 275: blk.30.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 276: blk.30.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 277: output.weight q6_K [ 5120, 32000, 1, 1 ]
llama_model_loader: - tensor 278: blk.30.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 279: blk.30.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 280: blk.30.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 281: blk.31.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 282: blk.31.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 283: blk.31.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 284: blk.31.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 285: blk.31.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 286: blk.31.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 287: blk.31.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 288: blk.31.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 289: blk.31.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 290: blk.32.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 291: blk.32.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 292: blk.32.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 293: blk.32.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 294: blk.32.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 295: blk.32.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 296: blk.32.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 297: blk.32.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 298: blk.32.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 299: blk.33.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 300: blk.33.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 301: blk.33.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 302: blk.33.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 303: blk.33.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 304: blk.33.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 305: blk.33.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 306: blk.33.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 307: blk.33.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 308: blk.34.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 309: blk.34.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 310: blk.34.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 311: blk.34.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 312: blk.34.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 313: blk.34.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 314: blk.34.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 315: blk.34.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 316: blk.34.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 317: blk.35.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 318: blk.35.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 319: blk.35.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 320: blk.35.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 321: blk.35.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 322: blk.35.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 323: blk.35.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 324: blk.35.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 325: blk.35.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 326: blk.36.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 327: blk.36.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 328: blk.36.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 329: blk.36.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 330: blk.36.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 331: blk.36.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 332: blk.36.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 333: blk.36.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 334: blk.36.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 335: blk.37.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 336: blk.37.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 337: blk.37.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 338: blk.37.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 339: blk.37.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 340: blk.37.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 341: blk.37.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 342: blk.37.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 343: blk.37.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 344: blk.38.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 345: blk.38.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 346: blk.38.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 347: blk.38.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 348: blk.38.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 349: blk.38.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 350: blk.38.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 351: blk.38.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 352: blk.38.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 353: blk.39.attn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 354: blk.39.ffn_down.weight q6_K [ 13824, 5120, 1, 1 ]
llama_model_loader: - tensor 355: blk.39.ffn_gate.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 356: blk.39.ffn_up.weight q6_K [ 5120, 13824, 1, 1 ]
llama_model_loader: - tensor 357: blk.39.ffn_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - tensor 358: blk.39.attn_k.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 359: blk.39.attn_output.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 360: blk.39.attn_q.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 361: blk.39.attn_v.weight q6_K [ 5120, 5120, 1, 1 ]
llama_model_loader: - tensor 362: output_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - kv 18: general.quantization_version u32
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q6_K: 282 tensors
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_ctx = 2048
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = mostly Q6_K
llm_load_print_meta: model size = 13.02 B
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.12 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 128.29 MB (+ 1600.00 MB per state)
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: VRAM used: 11656 MB
....................................................................................................
llama_new_context_with_model: kv self size = 1600.00 MB
llama_new_context_with_model: compute buffer total size = 191.47 MB
llama_new_context_with_model: VRAM scratch buffer: 190.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Model {'base_model': 'llama', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'llama2', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': '<s>[INST] ', 'PreInput': None, 'PreResponse': '[/INST]', 'terminate_response': ['[INST]', '</s>'], 'chat_sep': ' ', 'chat_turn_sep': ' </s>', 'humanstr': '[INST]', 'botstr': '[/INST]', 'generates_leading_space': False, 'system_prompt': ''}, 'visible_models': None, 'h2ogpt_key': None, 'load_8bit': False, 'load_4bit': False, 'low_bit_mode': 1, 'load_half': True, 'load_gptq': '', 'load_exllama': False, 'use_safetensors': False, 'revision': None, 'use_gpu_id': True, 'gpu_id': 0, 'compile_model': True, 'use_cache': None, 'llamacpp_dict': {'n_gpu_layers': 100, 'use_mlock': True, 'n_batch': 1024, 'n_gqa': 0, 'model_path_llama': '/mnt/AI/models/TheBloke/Llama-2-13B-chat-GGUF/llama-2-13b-chat.Q6_K.gguf', 'model_name_gptj': 'ggml-gpt4all-j-v1.3-groovy.bin', 'model_name_gpt4all_llama': 'ggml-wizardLM-7B.q4_2.bin', 'model_name_exllama_if_no_config': 'TheBloke/Nous-Hermes-Llama2-GPTQ'}, 'model_path_llama': '/mnt/AI/models/TheBloke/Llama-2-13B-chat-GGUF/llama-2-13b-chat.Q6_K.gguf', 'model_name_gptj': 'ggml-gpt4all-j-v1.3-groovy.bin', 'model_name_gpt4all_llama': 'ggml-wizardLM-7B.q4_2.bin', 'model_name_exllama_if_no_config': 'TheBloke/Nous-Hermes-Llama2-GPTQ'}
load INSTRUCTOR_Transformer
max_seq_length 512
/mnt/AI/h2ogpt/src/gradio_runner.py:732: GradioUnusedKwargWarning: You have unused kwarg parameters in Dropdown, please remove them: {'filterable': False}
langchain_agents = gr.Dropdown(
/mnt/AI/h2ogpt/src/gradio_runner.py:836: GradioUnusedKwargWarning: You have unused kwarg parameters in Dropdown, please remove them: {'filterable': False}
visible_models = gr.Dropdown(kwargs['all_models'],
Running on local URL: http://0.0.0.0:7860
And it fails, while there is plenty of available VRAM:
ggml_allocr_alloc: not enough space in the buffer (needed 167772160, largest block available 136312864)
GGML_ASSERT: /tmp/pip-install-lq9gmmv7/llama-cpp-python_511fca8a040049ac9ab4298ec532185f/vendor/llama.cpp/ggml-alloc.c:173: !"not enough space in the buffer"
Fatal Python error: Aborted
Current thread 0x00007ffa5ef82700 (most recent call first):
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/llama_cpp/llama_cpp.py", line 808 in llama_eval
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/llama_cpp/llama.py", line 500 in eval
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/llama_cpp/llama.py", line 777 in generate
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/llama_cpp/llama.py", line 957 in _create_completion
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/llms/llamacpp.py", line 288 in _stream
File "/mnt/AI/h2ogpt/src/gpt4all_llm.py", line 399 in _stream
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/llms/base.py", line 341 in stream
File "/mnt/AI/h2ogpt/src/gpt4all_llm.py", line 365 in _call
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/llms/base.py", line 961 in _generate
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/llms/base.py", line 475 in _generate_helper
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/llms/base.py", line 582 in generate
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/llms/base.py", line 451 in generate_prompt
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/chains/llm.py", line 102 in generate
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/chains/llm.py", line 92 in _call
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/chains/base.py", line 252 in __call__
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/chains/llm.py", line 252 in predict
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/chains/combine_documents/stuff.py", line 165 in combine_docs
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/chains/combine_documents/base.py", line 106 in _call
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/chains/base.py", line 252 in __call__
File "/mnt/AI/h2ogpt/src/utils.py", line 412 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap
Thread 0x00007ffa5f983700 (most recent call first):
File "/mnt/miniconda3/envs/python311/lib/python3.11/concurrent/futures/thread.py", line 81 in _worker
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 975 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap
Thread 0x00007ffa60db7700 (most recent call first):
File "/mnt/AI/h2ogpt/src/utils_langchain.py", line 67 in __next__
File "/mnt/AI/h2ogpt/src/gpt_langchain.py", line 3808 in _run_qa_db
File "/mnt/AI/h2ogpt/src/gen.py", line 2382 in evaluate
File "/mnt/AI/h2ogpt/src/gradio_runner.py", line 3015 in get_response
File "/mnt/AI/h2ogpt/src/gradio_runner.py", line 3066 in bot
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/gradio/utils.py", line 695 in gen_wrapper
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/gradio/utils.py", line 326 in run_sync_iterator_async
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap
Thread 0x00007ffab2d5a700 (most recent call first):
File "/mnt/miniconda3/envs/python311/lib/python3.11/asyncio/runners.py", line 118 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/asyncio/runners.py", line 190 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/uvicorn/server.py", line 61 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 975 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap
Thread 0x00007ffab375b700 (most recent call first):
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 324 in wait
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 622 in wait
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/apscheduler/schedulers/blocking.py", line 30 in _main_loop
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 975 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap
Thread 0x00007ffaca3b0700 (most recent call first):
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 324 in wait
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 622 in wait
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap
Thread 0x00007ffacc654700 (most recent call first):
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 324 in wait
File "/mnt/miniconda3/envs/python311/lib/python3.11/queue.py", line 180 in get
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/posthog/consumer.py", line 104 in next
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/posthog/consumer.py", line 73 in upload
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/posthog/consumer.py", line 62 in run
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap
Thread 0x00007ffbdaaa5740 (most recent call first):
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/gradio/blocks.py", line 2202 in block_thread
File "/mnt/AI/h2ogpt/src/gradio_runner.py", line 4183 in go_gradio
File "/mnt/AI/h2ogpt/src/gen.py", line 1271 in main
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/fire/core.py", line 691 in _CallAndUpdateTrace
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/fire/core.py", line 475 in _Fire
File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/fire/core.py", line 141 in Fire
File "/mnt/AI/h2ogpt/src/utils.py", line 64 in H2O_Fire
File "/mnt/AI/h2ogpt/generate.py", line 12 in entrypoint_main
File "/mnt/AI/h2ogpt/generate.py", line 16 in <module>
Extension modules: simplejson._speedups, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslib, pandas._libs.ops, numexpr.interpreter, pyarrow._compute, pandas._libs.arrays, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, lz4._version, lz4.frame._frame, psutil._psutil_linux, psutil._psutil_posix, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, matplotlib._image, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, yaml._yaml, sentencepiece._sentencepiece, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, 
pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, greenlet._greenlet, scipy._lib._ccallback_c, numpy.linalg.lapack_lite, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, 
scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._statlib, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, regex._regex, sklearn.__check_build._check_build, sklearn.utils.murmurhash, sklearn.utils._isfinite, sklearn.utils._openmp_helpers, sklearn.utils._vector_sentinel, sklearn.feature_extraction._hashing_fast, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.utils._cython_blas, sklearn.svm._libsvm, sklearn.svm._liblinear, sklearn.svm._libsvm_sparse, sklearn.utils._random, sklearn.utils._seq_dataset, sklearn.utils.arrayfuncs, 
sklearn.utils._typedefs, sklearn.utils._readonly_array_wrapper, sklearn.metrics._dist_metrics, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_fast, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, sklearn.datasets._svmlight_format_fast, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, google._upb._message, zstandard.backend_c, websockets.speedups, ujson, markupsafe._speedups, PIL._webp, uvloop.loop, httptools.parser.parser, httptools.parser.url_parser (total: 262)
Aborted
Hi @slavag, I'd like to solve the problem, but it's hard to debug since it doesn't fail for me. Are you able to edit the code to understand why --max_seq_len=2048 is not enough? That is, I presume the following makes things work, since you said that was the bad commit:
git checkout main
git revert c38c19e507120eee8b06a2cfe4042c03f5610735
# run your command
If this does work, then try to understand how FakeTokenizer is being set differently when you pass --max_seq_len=2048. I don't understand how that commit can lead to the failure if 2048 is used. Thanks for the help!
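For reference, the extra memory from a larger --max_seq_len is dominated by the llama.cpp KV cache, whose size can be estimated from the model parameters printed in the startup log. This is a rough sketch, not h2ogpt's actual code; it assumes an fp16 cache and n_head_kv == n_head (true for this 13B model per the log):

```python
def kv_cache_bytes(n_ctx, n_layer, n_embd, bytes_per_elem=2):
    """Estimate llama.cpp KV-cache size: K and V caches each hold
    n_ctx * n_embd fp16 values per layer."""
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem

# The startup log shows n_layer = 40, n_embd = 5120, and
# "kv self size = 1600.00 MB" at n_ctx = 2048:
print(kv_cache_bytes(2048, 40, 5120) / (1024 * 1024))  # 1600.0

# Doubling --max_seq_len to 4096 doubles this to 3200 MB, which is a
# plausible source of the extra VRAM pressure discussed in this thread.
print(kv_cache_bytes(4096, 40, 5120) / (1024 * 1024))  # 3200.0
```

This matches the 1600.00 MB figure in the log exactly, so going from 2048 to 4096 costs roughly another 1.6 GB of VRAM on top of the ~11.6 GB of weights.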
@pseudotensor I also tried to figure it out and made a new clean environment, and now it's working fine with both 4096 and 2048. I don't know what caused the issue, but creating an entirely new, clean Python env solved it. Thanks for your help.
Ok, if it comes back let me know...
Hi, after I updated the h2ogpt code to the latest (commit 6a7283eb66096d188f796760f58680c1d9c16dbc), I started to get ggml_allocr_alloc: not enough space in the buffer on an Nvidia A10 with 24GB VRAM. It was working fine when I updated the h2ogpt code yesterday; after starting, VRAM was only 67% utilized.
Startup log :
Error log :