su77ungr / CASALIOY

♾️ toolkit for air-gapped LLMs on consumer-grade hardware
Apache License 2.0

Downloading Models Each Run And Error Running GUI #63

Closed neeewwww closed 1 year ago

neeewwww commented 1 year ago

I ingested my new files with "python casalioy/ingest.py". It downloaded sentence-transformers/all-MiniLM-L6-v2 and eachadea/ggml-vicuna-7b-1.1 from HF, processed the files, and finished the routine.

Then I ran "streamlit run casalioy/gui.py" and it downloaded both models again.

Is there a way to check whether the models already exist before downloading? Or am I doing something wrong?

Using Main, without Docker - Python 3.11.3

D:\120hz\CASALIOY>python casalioy/ingest.py
Downloading sentence-transformers/all-MiniLM-L6-v2 from HF
Downloading (…)_Pooling/config.json: 100%|████████████████████████████████████████████████████| 190/190 [00:00<?, ?B/s]
Downloading (…)ce_transformers.json: 100%|████████████████████████████████████████████████████| 116/116 [00:00<?, ?B/s]
Downloading (…)nce_bert_config.json: 100%|██████████████████████████████████████████████████| 53.0/53.0 [00:00<?, ?B/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████| 112/112 [00:00<?, ?B/s]
Downloading (…)55de9125/config.json: 100%|████████████████████████████████████████████████████| 612/612 [00:00<?, ?B/s]
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████| 350/350 [00:00<?, ?B/s]
Downloading (…)5de9125/modules.json: 100%|████████████████████████████████████████████████████| 349/349 [00:00<?, ?B/s]
Downloading (…)e9125/tokenizer.json: 100%|██████████████████████████████████████████| 466k/466k [00:00<00:00, 1.38MB/s]
Downloading (…)125/data_config.json: 100%|█████████████████████████████████████████| 39.3k/39.3k [00:00<00:00, 352kB/s]
Downloading pytorch_model.bin: 100%|██████████████████████████████████████████████| 90.9M/90.9M [00:03<00:00, 27.8MB/s]
Fetching 10 files: 100%|███████████████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.31it/s]
Downloading eachadea/ggml-vicuna-7b-1.1 from HF
Downloading ggml-vic7b-q5_1.bin: 100%|████████████████████████████████████████████| 5.06G/5.06G [02:27<00:00, 34.3MB/s]
Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████| 1/1 [02:56<00:00, 176.97s/it]
Scanning files
Processing ren20211000.pdf
Processing 1828 chunks
Creating a new collection, size=384
Saving 1000 chunks
Saved, the collection now holds 1000 documents.
embedding chunk 1001/1828
Saving 828 chunks
Saved, the collection now holds 1828 documents.
Processed ren20211000.pdf
 100.0% [=======================================================================================>]   1/  1 eta [00:00]
Done

D:\120hz\CASALIOY>streamlit run casalioy/gui.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.15.9:8501

Downloading sentence-transformers/all-MiniLM-L6-v2 from HF
Downloading (…)55de9125/config.json: 100%|████████████████████████████████████████████████████| 612/612 [00:00<?, ?B/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████| 112/112 [00:00<00:00, 112kB/s]
Downloading (…)ce_transformers.json: 100%|████████████████████████████████████████████████████| 116/116 [00:00<?, ?B/s]
Downloading (…)_Pooling/config.json: 100%|████████████████████████████████████████████████████| 190/190 [00:00<?, ?B/s]
Downloading (…)5de9125/modules.json: 100%|████████████████████████████████████████████████████| 349/349 [00:00<?, ?B/s]
Downloading (…)125/data_config.json: 100%|█████████████████████████████████████████| 39.3k/39.3k [00:00<00:00, 350kB/s]
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████| 350/350 [00:00<?, ?B/s]
Downloading (…)nce_bert_config.json: 100%|██████████████████████████████████████████████████| 53.0/53.0 [00:00<?, ?B/s]
Downloading (…)e9125/tokenizer.json: 100%|██████████████████████████████████████████| 466k/466k [00:00<00:00, 1.37MB/s]
Downloading pytorch_model.bin: 100%|██████████████████████████████████████████████| 90.9M/90.9M [00:04<00:00, 22.3MB/s]
Fetching 10 files: 100%|███████████████████████████████████████████████████████████████| 10/10 [00:05<00:00,  1.97it/s]
Downloading eachadea/ggml-vicuna-7b-1.1 from HF
Downloading ggml-vic7b-q5_1.bin: 100%|████████████████████████████████████████████| 5.06G/5.06G [02:39<00:00, 31.7MB/s]
Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████| 1/1 [03:13<00:00, 193.51s/it]
2023-05-16 14:54:20.907 Uncaught app exception
Traceback (most recent call last):
  File "D:\Python311\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "D:\120hz\CASALIOY\casalioy\gui.py", line 4, in <module>
    from load_env import get_embedding_model, model_n_ctx, model_path, model_stop, model_temp, n_gpu_layers, persist_directory, print_HTML, use_mlock
ImportError: cannot import name 'print_HTML' from 'load_env' (D:\120hz\CASALIOY\casalioy\load_env.py)

Thanks.

su77ungr commented 1 year ago

Yep, noticed this myself. I'm on it.

su77ungr commented 1 year ago

reverted it for now.

might be a truncated p = Path("models/"+path)

edit: fixed it. now resolving gui issue in com

edit: also fixed, PR coming

hippalectryon-0 commented 1 year ago

> Ran "streamlit run casalioy/gui.py" and it proceeded to download the models again.

Easy fix: once the models are downloaded, just point the config at their local paths.

Actual fix: simply check whether the local model exists before asking HF, no need to revert :P
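For anyone reading along, here is a minimal sketch of what "check if the local model exists before asking HF" could look like. This is not the actual CASALIOY code: `ensure_model`, `models_dir`, and the `download` callable are hypothetical names used only for illustration, and the real downloader is passed in rather than guessed at.

```python
from pathlib import Path
from typing import Callable


def ensure_model(path: str, models_dir: Path, download: Callable[[str, Path], None]) -> Path:
    """Return the local path of a model, downloading only when it is missing.

    `download` is a hypothetical callable (name, target) that fetches the
    model and writes it to `target`; it stands in for the real HF download.
    """
    target = models_dir / path
    # Skip the network when a file, or a non-empty snapshot folder, is already on disk.
    if target.is_file() or (target.is_dir() and any(target.iterdir())):
        return target
    # First run: create the parent folders and fetch the model once.
    target.parent.mkdir(parents=True, exist_ok=True)
    download(path, target)
    return target
```

With something like this, ingest.py and gui.py would both call the same helper with the same `models_dir`, so the second script finds the files left by the first. In practice `download` would presumably wrap whatever HF call the project already uses; that part is an assumption here.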

> gui error

That one's very simple: print_HTML now lives in utils.py, so it has to be imported from there instead of load_env.

su77ungr commented 1 year ago

#65: guess you can edit this, but this also works.

edit: thanks @hippalectryon-0 for the amazing progress lately. this repo rocks

edit: was missing predefined .env

hippalectryon-0 commented 1 year ago

#67 for some hotfixes :)

@su77ungr also fixed example.env

su77ungr commented 1 year ago

Fixed with #65, #67, and commit 13cce0e514e03fc1e2d03f530ea64bb64b1f0cd9.