su77ungr / CASALIOY

♾️ toolkit for air-gapped LLMs on consumer-grade hardware
Apache License 2.0

Progress stuck on Windows 10 for "python casalioy/ingest.py source_documents/" #108

Closed madeepakkumar1 closed 1 year ago

madeepakkumar1 commented 1 year ago

.env

Generic

TEXT_EMBEDDINGS_MODEL=sentence-transformers/all-MiniLM-L6-v2
TEXT_EMBEDDINGS_MODEL_TYPE=HF  # LlamaCpp or HF
USE_MLOCK=false

Ingestion

PERSIST_DIRECTORY=db
DOCUMENTS_DIRECTORY=source_documents
INGEST_CHUNK_SIZE=500
INGEST_CHUNK_OVERLAP=50
INGEST_N_THREADS=3

Generation

MODEL_TYPE=LlamaCpp  # GPT4All or LlamaCpp
MODEL_PATH=eachadea/ggml-vicuna-7b-1.1/ggml-vic7b-q5_1.bin
MODEL_TEMP=0.8
MODEL_N_CTX=1024  # Max total size of prompt+answer
MODEL_MAX_TOKENS=256  # Max size of answer
MODEL_STOP=[STOP]
CHAIN_TYPE=betterstuff
N_RETRIEVE_DOCUMENTS=100  # How many documents to retrieve from the db
N_FORWARD_DOCUMENTS=100  # How many documents to forward to the LLM, chosen among those retrieved
N_GPU_LAYERS=4

Python version

Python 3.10.11

System

Windows 10

CASALIOY version

main

Information

Related Components

Reproduction

$ python casalioy/ingest.py source_documents/

Expected behavior

It should create the vector embeddings database.

madeepakkumar1 commented 1 year ago

image

su77ungr commented 1 year ago

That's a new one. Did you try rerunning it? Maybe try again with a clean source_documents directory that contains only a single test.pdf.

madeepakkumar1 commented 1 year ago

Thanks @su77ungr, it worked after deleting everything and adding just one .pdf file. My guess is that .docx files are causing the problem on Windows.
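
For anyone hitting the same hang, it may help to see which file types are in the documents directory before removing them one group at a time. The following is a minimal sketch, not part of CASALIOY: `group_by_extension` is a hypothetical helper, and the directory name is assumed to match the DOCUMENTS_DIRECTORY value from the .env above.

```python
from collections import defaultdict
from pathlib import Path


def group_by_extension(directory):
    """Map each lowercase file extension to the file names that use it.

    Hypothetical helper for illustration; CASALIOY does not ship this.
    """
    groups = defaultdict(list)
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            # Files without an extension are grouped under "<none>".
            groups[path.suffix.lower() or "<none>"].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    # Assumes the default documents directory from the .env above.
    for ext, names in sorted(group_by_extension("source_documents").items()):
        print(f"{ext}: {len(names)} file(s)")
        for name in names:
            print(f"  {name}")
```

Moving one extension group (for example the .docx files) out of the directory and rerunning ingestion should narrow down which file type stalls the run.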