[Question]: The test case is not responding

denghao1120 commented 11 months ago

Question Validation

[X] I have searched both the documentation and discord for an answer.

Question

python version: 3.11.4
llama_index: 0.8.43

### code:
`
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
`

### console result:
`
PS D:\Desktop\llama> & D:/Python/python.exe d:/Desktop/llama/test01.py

Could not load OpenAI model. Using default LlamaCPP=llama2-13b-chat. If you intended to use OpenAI, please check your OPENAI_API_KEY. Original error: No API key found for OpenAI. Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization. API keys can be found or created at https://platform.openai.com/account/api-keys

llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from C:\Users\dengh\AppData\Local\llama_index\models\llama-2-13b-chat.Q4_0.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor 0: token_embd.weight q4_0 [ 5120, 32000, 1, 1 ]
[tensors 1 through 361 omitted: per-layer attn/ffn weights for blk.0 through blk.39, plus output.weight q6_K [ 5120, 32000, 1, 1 ]]
llama_model_loader: - tensor 362: output_norm.weight f32 [ 5120, 1, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - kv 18: general.quantization_version u32
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q4_0: 281 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = mostly Q4_0
llm_load_print_meta: model params = 13.02 B
llm_load_print_meta: model size = 6.86 GiB (4.53 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.12 MB
llm_load_tensors: mem required = 7024.01 MB
...................................................................................................
llama_new_context_with_model: n_ctx = 3900
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 3046.88 MB
llama_new_context_with_model: compute buffer total size = 348.18 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |

Could not load OpenAIEmbedding. Using HuggingFaceBgeEmbeddings with model_name=BAAI/bge-small-en. If you intended to use OpenAI, please check your OPENAI_API_KEY. Original error: No API key found for OpenAI. Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization. API keys can be found or created at https://platform.openai.com/account/api-keys
PS D:\Desktop\llama>
`

When I ran the test case, I did not get the expected return value. Instead, the program terminated after a while with no return value and no exception message.
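
One way to narrow down a silent exit like this is to launch the script from a small wrapper and look at the process exit code. The sketch below is only an illustration: `run_and_report` is a hypothetical helper, and `test01.py` is the script name from the console output above. A non-zero exit code with no Python traceback would suggest the interpreter was killed or crashed in native code rather than raising an exception.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run a command, echo its output, and return its exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.stdout:
        print(result.stdout, end="")
    if result.stderr:
        print(result.stderr, end="", file=sys.stderr)
    print(f"exit code: {result.returncode}")
    return result.returncode


# Usage (script name taken from the console output above):
#   run_and_report([sys.executable, "test01.py"])
```

An exit code of 0 would mean the script finished normally and simply printed nothing, which points to a different problem than a crash.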

dosubot[bot] commented 11 months ago

🤖

Hello,

Based on the information you've provided, it seems like the issue you're encountering is due to a missing OpenAI API key. The OpenAI API key is required to load the OpenAI model. If the key is not found, the system defaults to using LlamaCPP=llama2-13b-chat.

To resolve this issue, you need to set the OpenAI API key. You can do this in two ways:

Set the OPENAI_API_KEY environment variable:

import os
os.environ["OPENAI_API_KEY"] = "your-api-key"

Set the openai.api_key prior to initialization:

import openai
openai.api_key = 'your-api-key'

Replace "your-api-key" with your actual OpenAI API key. You can find or create your API keys at https://platform.openai.com/account/api-keys.
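
Before running the demo, it can help to confirm whether the key is actually visible to the Python process, since that decides whether the OpenAI model or the local fallback is used. This is a minimal sketch using only the standard library; `openai_key_is_set` is a hypothetical helper name, not part of llama_index.

```python
import os


def openai_key_is_set(environ=os.environ):
    """Return True if OPENAI_API_KEY is present and non-empty."""
    return bool(environ.get("OPENAI_API_KEY", "").strip())


if openai_key_is_set():
    print("OPENAI_API_KEY found; the OpenAI model should be used.")
else:
    print("OPENAI_API_KEY missing; expect the local fallback described above.")
```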

In addition, you need to pass the service context to the GPTVectorStoreIndex. Here's the corrected code:

#  /server/llama_index_server.py

from flask import Flask, request, Response
from flask_cors import CORS
from dotenv import load_dotenv
import os
import pandas as pd
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index import Document
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor
from llama_index import ServiceContext

app = Flask(__name__)
CORS(app)

load_dotenv()

# Get the API key from the environment variable
import openai
api_key = os.getenv('OPENAI_API_KEY')
openai.api_key = api_key

# Loading documents from an Excel file
df = pd.read_excel('data/SupplierSample.xlsx')

# Convert DataFrame rows into documents
# Concatenate all column values into a single string for each row
documents = [Document(text=' '.join(map(str, row.values))) for _, row in df.iterrows()]

# LLM that supports streaming
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", streaming=True))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Construct a simple vector index
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

# Configure query engine to use streaming
query_engine = index.as_query_engine(streaming=True, similarity_top_k=1)

@app.route('/api/query', methods=['POST'])
def query():
    # Get the payload from the request
    payload = request.json

    # Get the question from the messages in the payload
    question = [m['content'] for m in payload['messages'] if m['role'] == 'user'][-1]

    # Now, query returns a StreamingResponse object
    streaming_response = query_engine.query(question)

    def response_stream():
        for text in streaming_response.response_gen:
            yield text + "\n"

    return Response(response_stream(), mimetype="text/event-stream")

if __name__ == '__main__':
    # Start the server, to run this script use "python llama_index_server.py" in terminal
    app.run(port=5000)

This solution was suggested in a similar issue: [Question]: AuthenticationError: No API key provided.

If you're still encountering issues, please provide more details so I can assist you further.

Sources

[Question]: AuthenticationError: No API key provided

[Bug]: The _get_llm_metadata function reports an error: "Did not find openai_api_key, please add an environment variable OPENAI_API_KEY which contains it, or pass openai_api_key as a named parameter." (type=value_error)

llama_index/llms/openai_utils.py

tests/llms/test_openai.py

docs/examples/llm/llama_2_llama_cpp.ipynb

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

denghao1120 commented 11 months ago

Thank you for your reply. But now I want to test this demo without using an OpenAI key. How do I do it? (I can't get an OpenAI key right now.)

dosubot[bot] commented 11 months ago

🤖

Hello,

Based on the information you've provided and the similar issues I found in the LlamaIndex repository, it seems like you've correctly set up the OpenAI API key and passed the service context to the GPTVectorStoreIndex. However, the error might be due to a few reasons:

Incorrect API key: Ensure that the API key you have provided is correct and has the necessary permissions.
Incorrect usage of OpenAIEmbedding: Make sure that you are using the OpenAIEmbedding correctly.
Global service context: You have set a global service context using ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo-0613")). Ensure that this is done correctly and the service context is accessible where it's needed.
Missing model deployment: The OpenAIEmbedding requires a deployed model. Make sure you have deployed your model correctly and it's accessible.
Incorrect chat mode: In your code, you have set the chat mode to "openai". Ensure that this mode is supported and correctly implemented in your application.
Incorrect function call: You have used the function call "query_engine_tool" in your chat request. Make sure that this function call is correctly implemented and supported by your application.

If you have checked all these points and the error still persists, please provide the exact error message for further assistance.

Also, I noticed that you mentioned you're using LlamaIndex version 0.8.43. However, the latest version I'm aware of is 0.8.42. Could you please confirm the version you're using? This might help in diagnosing the issue.

For more information, you can refer to the following examples in the LlamaIndex repository:

I hope this helps! Let me know if you have any other questions or if there's anything else I can assist you with.

Best, Dosu

logan-markewich commented 11 months ago

@denghao1120 you likely don't have enough system RAM to run the local fallback LLM.
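
As a rough sanity check, the sizes printed in the console output above can be added up. This is a back-of-the-envelope sketch, not an exact accounting: the labels for 7024.01 MB and 3046.88 MB are missing from the pasted log, so reading them as the memory required for the weights and the KV cache is an assumption. `estimate_total_mb` is a hypothetical helper.

```python
def estimate_total_mb(weights_mb, kv_cache_mb, compute_buffer_mb):
    """Sum the main allocations reported by the loader, in MB."""
    return weights_mb + kv_cache_mb + compute_buffer_mb


# Values copied from the console output above. The labels for the first two
# are missing in the paste, so their meaning is an assumption.
total_mb = estimate_total_mb(7024.01, 3046.88, 348.18)
print(f"approx. {total_mb:.2f} MB ({total_mb / 1024:.2f} GiB)")
```

Under those assumptions the total is roughly 10 GiB, before the embedding model and the Python process itself are counted.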

Do you have access to OpenAI? Or maybe try using another LLM.
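
For the "another LLM" route, here is a hedged sketch of pointing llama_index at a smaller local model instead of the 13B fallback. It assumes the 0.8.x API (`LlamaCPP` in `llama_index.llms`, `ServiceContext.from_defaults` accepting `llm` and `embed_model="local"`); the model path is hypothetical, and `llamacpp_settings` and `build_query_engine` are illustrative names, not library functions.

```python
def llamacpp_settings(model_path, context_window=2048, max_new_tokens=256):
    """Collect keyword arguments for a smaller local model (pure Python)."""
    return {
        "model_path": model_path,
        "context_window": context_window,
        "max_new_tokens": max_new_tokens,
        "temperature": 0.1,
        "verbose": False,
    }


def build_query_engine(data_dir, model_path):
    """Build a query engine that does not need an OpenAI key (assumed 0.8.x API)."""
    # Imported lazily so the helper above works without llama_index installed.
    from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
    from llama_index.llms import LlamaCPP

    llm = LlamaCPP(**llamacpp_settings(model_path))
    # embed_model="local" is assumed to select a local HuggingFace embedding.
    service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
    documents = SimpleDirectoryReader(data_dir).load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    return index.as_query_engine()


# Hypothetical path; point it at a 7B (or smaller) GGUF file you have downloaded.
settings = llamacpp_settings("models/llama-2-7b-chat.Q4_0.gguf")
print(settings)
```

With a model file in place, `build_query_engine("data", settings["model_path"])` would return a query engine to use in place of the one in the original snippet. A smaller model and a shorter context window both reduce memory use, at some cost in answer quality.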
