Closed rohanrichards closed 7 months ago
Hello, yes, I am getting the same issue. Python 3.10.11, Windows 10 Pro.
In the .env file my model type is MODEL_TYPE=GPT4All
After running the ingest.py file, I run the privateGPT.py script. At the prompt I enter the text: what can you tell me about the state of the union address, and I get the following:
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
gpt_tokenize: unknown token ''
Help is appreciated. Thank you
To be clear, I do eventually get output; it just takes an extremely long time. I think these messages are only warnings and it is working as intended, albeit extremely slowly for some reason.
Let me give it more time. I have been waiting 10 minutes and nothing has happened after those warning messages. I will wait a bit longer and see what happens.
You are right, I get the response, but it is very slow and uses around 18-19 GB of memory. Running another query gives me this memory-related error:
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 8264657744, available 8257513008)
Process finished with exit code -1073741819 (0xC0000005)
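The two figures in the error above can be read off with a little arithmetic. This is only a sketch that restates the logged numbers; it does not diagnose the cause. The variable names are made up for illustration. The hex value 0xC0000005 is the Windows access-violation status code, which is the unsigned 32-bit form of the negative exit code shown.

```python
# Numbers copied from the error message above.
needed = 8264657744      # bytes requested from the ggml memory pool
available = 8257513008   # bytes the pool actually had

shortfall = needed - available
print(f"pool size needed: {needed / 2**30:.2f} GiB")
print(f"shortfall: {shortfall} bytes ({shortfall / 2**20:.1f} MiB)")

# The process exit code is printed as a signed 32-bit integer.
# Masking to 32 bits gives the usual unsigned hex form.
exit_code = -1073741819
as_unsigned = exit_code & 0xFFFFFFFF
print(hex(as_unsigned))
```

So the pool was short by only a few MiB out of roughly 7.7 GiB, and the crash that followed was reported by Windows as an access violation.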
I have a very similar issue, but I get: gpt_tokenize: unknown token '?' (that just keeps repeating).
The first time I did use a question mark, so I exited and tried again without it. Same error.
Intel iMac, Python 3. Tried with the provided State of the Union text, so not my own file.
Here is my output on Windows with default data
gpt_tokenize: unknown token 'Ô'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Ô'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Ô'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
I am getting similar results to @ernestp
Enter a query: who gave the state of the union address speech in 2023?
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
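One possible reading of the repeating three-character pattern, not confirmed anywhere in this thread: the three tokens are the three UTF-8 bytes of a single non-ASCII character, printed one byte at a time and rendered by the Windows console code page. The sketch below assumes the character is a curly apostrophe (U+2019) and that the consoles involved use code pages 437 and 850. Both are guesses that happen to reproduce the two outputs above.

```python
# Assumption: the source text contains a right single quotation mark.
apostrophe = "\u2019"

# Its UTF-8 encoding is three bytes.
raw = apostrophe.encode("utf-8")
print(raw)  # the three bytes, one per "unknown token" line

# Decoding those same bytes with legacy console code pages gives
# three separate characters instead of one apostrophe.
as_cp437 = raw.decode("cp437")
as_cp850 = raw.decode("cp850")
print(as_cp437)
print(as_cp850)
```

If this reading is right, the warnings come from non-ASCII punctuation in the ingested text rather than from the query itself, which would fit the report that removing the question mark made no difference.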
Same issue. Has anyone managed to fix that?
Same issue. After 5-10 minutes I get the response for a simple query.
PC: AMD Ryzen 5 5600X, 32GB RAM, GPU: Nvidia GTX 1060 6GB
This seems the same as https://github.com/imartinez/privateGPT/issues/13 and https://github.com/imartinez/privateGPT/issues/107
I think it would be better to keep only one issue open, otherwise it just makes it harder to debug with info spread all over multiple issues.
Has a root cause or solution been found? Just tried it last night and I get the string of '?' Unknown symbol errors. :(
You can ignore them. There will be output.
The process ends with "Killed" every time.
type GPT4All
path ggml-gpt4all-j-v1.3-groovy.bin
ctx 1000
Windows 10
python --version
Python 3.10.6
which python
~/miniconda3/envs/privateGPT/bin/python
pip list
Package Version
aiohttp 3.8.4
aiosignal 1.3.1
alacritty-colorscheme 1.0.1
anyio 3.6.2
argilla 1.7.0
async-timeout 4.0.2
asyncio 3.4.3
attrs 23.1.0
backoff 2.2.1
beautifulsoup4 4.12.2
bracex 2.3.post1
certifi 2023.5.7
cffi 1.15.1
chardet 5.1.0
charset-normalizer 3.1.0
chromadb 0.3.23
click 8.1.3
clickhouse-connect 0.5.25
cmake 3.26.3
colorclass 2.2.2
commonmark 0.9.1
compressed-rtf 1.0.6
cryptography 40.0.2
dataclasses-json 0.5.7
Deprecated 1.2.13
duckdb 0.8.0
easygui 0.98.3
ebcdic 1.1.1
et-xmlfile 1.1.0
extract-msg 0.41.1
fastapi 0.95.2
filelock 3.12.0
frozenlist 1.3.3
fsspec 2023.5.0
ghp-import 2.1.0
gpt4all 0.2.3
greenlet 1.1.3.post0
h11 0.14.0
hnswlib 0.7.0
httpcore 0.16.3
httptools 0.5.0
httpx 0.23.3
huggingface-hub 0.14.1
idna 3.4
IMAPClient 2.3.1
Jinja2 3.1.2
joblib 1.2.0
jq 1.4.1
langchain 0.0.171
lark-parser 0.12.0
lit 16.0.5
llama-cpp-python 0.1.49
lxml 4.9.2
lz4 4.3.2
Markdown 3.3.7
MarkupSafe 2.1.2
marshmallow 3.19.0
marshmallow-enum 1.5.1
mergedeep 1.3.4
monotonic 1.6
mpmath 1.3.0
msg-parser 1.2.0
msgpack 1.0.4
msoffcrypto-tool 5.0.1
multidict 6.0.4
mypy-extensions 0.4.3
natsort 8.1.0
networkx 3.1
nltk 3.8.1
numexpr 2.8.4
numpy 1.23.5
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
olefile 0.46
oletools 0.60.1
openapi-schema-pydantic 1.2.4
openpyxl 3.1.2
packaging 23.1
pandas 1.5.3
pandoc 2.3
pcodedmp 1.2.6
pdfminer.six 20221105
Pillow 9.5.0
pip 23.0.1
plumbum 1.8.1
ply 3.11
posthog 3.0.1
pycparser 2.21
pydantic 1.10.8
Pygments 2.12.0
pygpt4all 1.1.0
pygptj 2.0.3
pyllamacpp 2.3.0
pymdown-extensions 9.5
pynvim 0.4.3
pypandoc 1.11
pyparsing 2.4.7
python-dateutil 2.8.2
python-docx 0.8.11
python-dotenv 1.0.0
python-magic 0.4.27
python-pptx 0.6.21
python-slugify 6.1.2
pytz 2023.3
pytz-deprecation-shim 0.1.0.post0
PyYAML 6.0
pyyaml_env_tag 0.1
red-black-tree-mod 1.20
regex 2023.5.5
requests 2.31.0
rfc3986 1.5.0
rich 13.0.1
RTFDE 0.0.2
ruamel.yaml 0.16.13
scikit-learn 1.2.2
scipy 1.10.1
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 66.0.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.4.1
SQLAlchemy 2.0.15
starlette 0.27.0
sympy 1.12
tabulate 0.9.0
tenacity 8.2.2
termcolor 1.1.0
text-unidecode 1.3
threadpoolctl 3.1.0
tokenizers 0.13.3
torch 2.0.1
torchvision 0.15.2
tqdm 4.65.0
transformers 4.29.2
triton 2.0.0
typed-argument-parser 1.7.2
typer 0.9.0
typing_extensions 4.6.1
typing-inspect 0.8.0
tzdata 2023.3
tzlocal 4.2
unstructured 0.6.6
urllib3 2.0.2
uvicorn 0.22.0
uvloop 0.17.0
watchdog 2.1.9
watchfiles 0.19.0
wcmatch 8.4
websockets 11.0.3
wheel 0.38.4
wrapt 1.14.1
XlsxWriter 3.1.1
yarl 1.9.2
zstandard 0.21.0
conda list
packages in environment at ~/miniconda3/envs/privateGPT:
Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
aiohttp 3.8.4 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
anyio 3.6.2 pypi_0 pypi
argilla 1.7.0 pypi_0 pypi
async-timeout 4.0.2 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
backoff 2.2.1 pypi_0 pypi
beautifulsoup4 4.12.2 pypi_0 pypi
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.01.10 h06a4308_0
certifi 2023.5.7 pypi_0 pypi
cffi 1.15.1 pypi_0 pypi
chardet 5.1.0 pypi_0 pypi
charset-normalizer 3.1.0 pypi_0 pypi
chromadb 0.3.23 pypi_0 pypi
click 8.1.3 pypi_0 pypi
clickhouse-connect 0.5.25 pypi_0 pypi
cmake 3.26.3 pypi_0 pypi
colorclass 2.2.2 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
compressed-rtf 1.0.6 pypi_0 pypi
cryptography 40.0.2 pypi_0 pypi
dataclasses-json 0.5.7 pypi_0 pypi
deprecated 1.2.13 pypi_0 pypi
duckdb 0.8.0 pypi_0 pypi
easygui 0.98.3 pypi_0 pypi
ebcdic 1.1.1 pypi_0 pypi
et-xmlfile 1.1.0 pypi_0 pypi
extract-msg 0.41.1 pypi_0 pypi
fastapi 0.95.2 pypi_0 pypi
filelock 3.12.0 pypi_0 pypi
frozenlist 1.3.3 pypi_0 pypi
fsspec 2023.5.0 pypi_0 pypi
gpt4all 0.2.3 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
hnswlib 0.7.0 pypi_0 pypi
httpcore 0.16.3 pypi_0 pypi
httptools 0.5.0 pypi_0 pypi
httpx 0.23.3 pypi_0 pypi
huggingface-hub 0.14.1 pypi_0 pypi
idna 3.4 pypi_0 pypi
imapclient 2.3.1 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
joblib 1.2.0 pypi_0 pypi
jq 1.4.1 pypi_0 pypi
langchain 0.0.171 pypi_0 pypi
lark-parser 0.12.0 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
lit 16.0.5 pypi_0 pypi
llama-cpp-python 0.1.49 pypi_0 pypi
lxml 4.9.2 pypi_0 pypi
lz4 4.3.2 pypi_0 pypi
markupsafe 2.1.2 pypi_0 pypi
marshmallow 3.19.0 pypi_0 pypi
marshmallow-enum 1.5.1 pypi_0 pypi
monotonic 1.6 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msg-parser 1.2.0 pypi_0 pypi
msoffcrypto-tool 5.0.1 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.1 pypi_0 pypi
nltk 3.8.1 pypi_0 pypi
numexpr 2.8.4 pypi_0 pypi
numpy 1.23.5 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
olefile 0.46 pypi_0 pypi
oletools 0.60.1 pypi_0 pypi
openapi-schema-pydantic 1.2.4 pypi_0 pypi
openpyxl 3.1.2 pypi_0 pypi
openssl 1.1.1t h7f8727e_0
packaging 23.1 pypi_0 pypi
pandas 1.5.3 pypi_0 pypi
pandoc 2.3 pypi_0 pypi
pcodedmp 1.2.6 pypi_0 pypi
pdfminer-six 20221105 pypi_0 pypi
pillow 9.5.0 pypi_0 pypi
pip 23.0.1 py310h06a4308_0
plumbum 1.8.1 pypi_0 pypi
ply 3.11 pypi_0 pypi
posthog 3.0.1 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pydantic 1.10.8 pypi_0 pypi
pygpt4all 1.1.0 pypi_0 pypi
pygptj 2.0.3 pypi_0 pypi
pyllamacpp 2.3.0 pypi_0 pypi
pypandoc 1.11 pypi_0 pypi
pyparsing 2.4.7 pypi_0 pypi
python 3.10.11 h7a1cb2a_2
python-docx 0.8.11 pypi_0 pypi
python-dotenv 1.0.0 pypi_0 pypi
python-magic 0.4.27 pypi_0 pypi
python-pptx 0.6.21 pypi_0 pypi
pytz 2023.3 pypi_0 pypi
pytz-deprecation-shim 0.1.0.post0 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.2 h5eee18b_0
red-black-tree-mod 1.20 pypi_0 pypi
regex 2023.5.5 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
rfc3986 1.5.0 pypi_0 pypi
rich 13.0.1 pypi_0 pypi
rtfde 0.0.2 pypi_0 pypi
scikit-learn 1.2.2 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
sentence-transformers 2.2.2 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 66.0.0 py310h06a4308_0
six 1.16.0 pypi_0 pypi
sniffio 1.3.0 pypi_0 pypi
soupsieve 2.4.1 pypi_0 pypi
sqlalchemy 2.0.15 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
starlette 0.27.0 pypi_0 pypi
sympy 1.12 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tenacity 8.2.2 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.3 pypi_0 pypi
torch 2.0.1 pypi_0 pypi
torchvision 0.15.2 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
transformers 4.29.2 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typer 0.9.0 pypi_0 pypi
typing-extensions 4.6.1 pypi_0 pypi
tzdata 2023.3 pypi_0 pypi
tzlocal 4.2 pypi_0 pypi
unstructured 0.6.6 pypi_0 pypi
urllib3 2.0.2 pypi_0 pypi
uvicorn 0.22.0 pypi_0 pypi
uvloop 0.17.0 pypi_0 pypi
watchfiles 0.19.0 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
wheel 0.38.4 py310h06a4308_0
wrapt 1.14.1 pypi_0 pypi
xlsxwriter 3.1.1 pypi_0 pypi
xz 5.4.2 h5eee18b_0
yarl 1.9.2 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstandard 0.21.0 pypi_0 pypi
Using embedded DuckDB with persistence: data will be stored in: db
gptj_model_load: loading model from 'ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 4505.45 MB
gptj_model_load: memory_size = 896.00 MB, n_mem = 57344
gptj_model_load: ................................... done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
Enter a query: Is this a test?
gpt_tokenize: unknown token '�'
gpt_tokenize: unknown token '�'
...
gpt_tokenize: unknown token '�'
Killed
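The memory figures in the load log above are consistent with each other, which can be checked by hand. The sketch below rests on assumptions the log does not state: that n_mem is n_layer times n_ctx, and that memory_size is a key and value buffer holding n_embd 2-byte values per slot. These formulas are a guess that happens to reproduce the logged numbers, not something taken from the loader's source.

```python
# Values copied from the gptj_model_load lines above.
n_ctx = 2048
n_embd = 4096
n_layer = 28
model_size_mb = 3609.38
ggml_ctx_size_mb = 4505.45

# Assumed: one slot per layer per context position.
n_mem = n_layer * n_ctx

# Assumed: keys and values (factor 2), n_embd elements per slot, 2 bytes each.
bytes_per_element = 2
memory_bytes = 2 * n_mem * n_embd * bytes_per_element
memory_mb = memory_bytes / 2**20

print(n_mem)       # compare with "n_mem = 57344"
print(memory_mb)   # compare with "memory_size = 896.00 MB"

# The reported ggml ctx size is close to weights plus this buffer.
remainder_mb = ggml_ctx_size_mb - model_size_mb - memory_mb
print(round(remainder_mb, 2))
```

On this reading, about 4.4 GB is allocated before any query runs. That figure covers only what the load log reports and says nothing about memory used later while answering a query.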
Windows 10, Python 3.10. After ingesting and writing my first prompt, "what can you tell me about the state of the union address", I get the following output, followed by an extremely long wait during which it uses ~30% of CPU and RAM usage continues to increase: