langflow-ai / langflow

Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
http://www.langflow.org
MIT License

Add support for llama.cpp #144

Closed: harborwater closed this issue 1 year ago

harborwater commented 1 year ago

Add support for llama.cpp for local AI inference.

ogabrielluiz commented 1 year ago

#134 added that, but we haven't released it yet because I wasn't able to test it.

Do you think you could test it using the dev branch?
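For readers following the thread: Langflow builds on LangChain, and its llama.cpp support goes through LangChain's LlamaCpp wrapper. A minimal sketch of that underlying call, with a placeholder model path, looks roughly like this:

```python
# Minimal sketch of the LangChain LlamaCpp wrapper that Langflow's
# component builds on; the model path is a placeholder for a local
# ggml-quantized file.
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path="/path/to/model.bin", n_ctx=512, temperature=0.7)
print(llm("Name three uses of a local LLM."))
```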

lolxdmainkaisemaanlu commented 1 year ago

Langflow stays stuck on 'thinking' even after 5 minutes with the latest 0.56 build. Also, I don't know why it runs llama.cpp unsuccessfully two times and gets stuck the third time; the log below shows the model being loaded three times.

siddhesh@desktop:~/Desktop$ langflow
[16:39:52] INFO [16:39:52] - INFO - Logger set up with log level: 20(info) logger.py:28
INFO [16:39:52] - INFO - Log file: logs/langflow.log logger.py:30
[2023-04-14 16:39:52 +0530] [12703] [INFO] Starting gunicorn 20.1.0
[2023-04-14 16:39:52 +0530] [12703] [INFO] Listening at: http://127.0.0.1:7860 (12703)
[2023-04-14 16:39:52 +0530] [12703] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2023-04-14 16:39:52 +0530] [12715] [INFO] Booting worker with pid: 12715
[2023-04-14 16:39:52 +0530] [12715] [INFO] Started server process [12715]
[2023-04-14 16:39:52 +0530] [12715] [INFO] Waiting for application startup.
[2023-04-14 16:39:52 +0530] [12715] [INFO] Application startup complete.
llama_model_load: loading model from '/home/siddhesh/Desktop/vicuna.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 2
llama_model_load: type = 2
llama_model_load: ggml map size = 7759.84 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required = 9807.93 MB (+ 3216.00 MB per state)
llama_model_load: loading tensors from '/home/siddhesh/Desktop/vicuna.bin'
llama_model_load: model size = 7759.40 MB / num tensors = 363
llama_init_from_file: kv self size = 800.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
[the llama_model_load block above repeats verbatim two more times]
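One way to narrow down a hang like this is to load the same model file with llama-cpp-python directly, outside Langflow; if this call is also slow, the bottleneck is CPU inference on the model itself rather than Langflow. A minimal sketch, reusing the path from the log above:

```python
# Isolation test: load the same model file outside Langflow to check
# whether slow generation comes from CPU inference itself.
from llama_cpp import Llama

llm = Llama(model_path="/home/siddhesh/Desktop/vicuna.bin")  # path from the log
out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```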

nsvrana commented 1 year ago

Mine behaves the same way, but it isn't actually stuck; it just takes that long to execute for me.
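If generation is merely slow rather than hung, thread count is the usual lever for CPU-only inference. A hedged tuning sketch, assuming the LangChain wrapper's n_threads and verbose parameters (the path is a placeholder):

```python
# Tuning sketch for CPU-only inference with LangChain's LlamaCpp wrapper.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/path/to/model.bin",  # placeholder
    n_threads=8,   # set to your physical core count; the default often underuses the CPU
    n_ctx=512,
    verbose=True,  # prints llama.cpp timing stats after each generation
)
```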

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

TaoAthe commented 1 year ago

Which model do I use for the LlamaCpp LLM? I have tried several. Where is the documentation for using Langflow?

ogabrielluiz commented 1 year ago

#233 Could you try what I mentioned in this issue? It works here. We've released a new version that might help with this.

TaoAthe commented 1 year ago

Sorry, I've been away trying to find a better GPU. I'll try the latest version; thank you for responding.

berradakamal commented 1 year ago

Has anyone figured out how to run Llama with Langflow? I've tried many approaches and I'm still struggling. I have a llama-2-13b model that I converted, built, and quantized with llama.cpp, and it runs well in llama.cpp directly (ggml-model-q4_0.gguf; I also tried ggml-vic7b-q4_0.bin). I created a models directory in the project root and tried the LlamaCpp and CTransformers components, but I never got a response from the LLM. Can someone please help me?
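A likely culprit here, worth stating for later readers: llama.cpp changed its file format from GGML to GGUF in August 2023. llama-cpp-python versions up to 0.1.78 load only the older GGML .bin files, while 0.1.79 and later load only GGUF, so pairing the wrong model file with the installed backend can fail without a useful error. A quick sanity check, assuming llama-cpp-python is the backend behind Langflow's LlamaCpp component:

```python
# Check which llama-cpp-python version is installed, then try loading each
# model file directly; a format/version mismatch is a common cause of
# getting no response from the LLM.
import llama_cpp
from llama_cpp import Llama

print("llama-cpp-python version:", llama_cpp.__version__)

llm = Llama(model_path="models/ggml-model-q4_0.gguf")  # path from the comment above
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```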