wafflecomposite / langchain-ask-pdf-local

An AI app that lets you upload a PDF and ask questions about it. It uses StableVicuna 13B and runs locally.

Using another PDF raises an error #5

[Open] alexhmyang opened this issue 1 year ago

alexhmyang commented 1 year ago

File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_cpp/llama.py", line 506, in _create_completion prompt_tokens: List[llama_cpp.llama_token] = self.tokenize( File "/home/ubuntu/.local/lib/python3.8/site-packages/llama_cpp/llama.py", line 189, in tokenize raise RuntimeError(f'Failed to tokenize: text="{text}" n_tokens={n_tokens}') RuntimeError: Failed to tokenize: text="b" ### Human:Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n\xe6\xa8\xaa\xe5\xba\x97\xe9\x9b\x86\xe5\x9b\xa2\xe4\xb8\x9c\xe7\xa3\x81\xe8\x82\xa1\xe4\xbb\xbd\xe6\x9c\x89\xe9\x99\x90\xe5\x85\xac\xe5\x8f\xb8 \n \n \n \n1

And with your sample PDF, it either cannot generate an answer or is too slow to generate one:

AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
llama.cpp: loading model from ./ggml-vicuna-13b-1.1-q4_2.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 85.08 KB
llama_model_load_internal: mem required = 9807.48 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 1600.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Token indices sequence length is longer than the specified maximum sequence length for this model (1104 > 1024). Running this sequence through the model will result in indexing errors

It always hangs here.

wafflecomposite commented 1 year ago

It seems the main problem is that the context length is exceeded. First, try editing these lines in app.py:

Line 59: try lower values for chunk_size and chunk_overlap, e.g. 800 and 150.

If that doesn't help:

Line 78: lower the k value from 4 to 3 (this is the number of retrieved text chunks); see the sketch below.
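
For reference, here is a minimal sketch of what those two edits might look like, assuming app.py splits the PDF text with LangChain's CharacterTextSplitter and queries a FAISS knowledge base; the function and variable names below are placeholders, not necessarily the ones used in app.py:

```python
# Sketch of the two suggested edits (placeholder names, not the exact app.py code).
from langchain.text_splitter import CharacterTextSplitter

def split_pdf_text(pdf_text: str) -> list[str]:
    # Around line 59: smaller chunks and overlap leave more room
    # in the model's 2048-token context window.
    splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=800,     # lowered from a larger value (e.g. 1000)
        chunk_overlap=150,  # lowered from a larger value (e.g. 200)
        length_function=len,
    )
    return splitter.split_text(pdf_text)

def retrieve_chunks(knowledge_base, user_question: str):
    # Around line 78: retrieve 3 chunks instead of 4 so the assembled
    # prompt stays under the context limit.
    return knowledge_base.similarity_search(user_question, k=3)
```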

From those logs I'm also assuming you are using Chinese. I haven't tested whether that even works; I expect the model to be even slower than usual with it, and the quality of the results will probably be poor. Another LLM with more emphasis on multilingual or specifically Chinese text would probably be better suited for this.