Closed rmusser01 closed 1 month ago
Seems llama.cpp-python ain't it, chief.
Let's look at bundling/downloading llama.cpp compiled for the host platform (and for whether CUDA/ROCm is available or not). That way, we can bundle it as a package, allow for updates, and use Llama to download HF models.
That or llamafile: perhaps check for the existence of CUDA/ROCm drivers/hardware, and if found, download the appropriate llama.cpp release; otherwise use llamafile + MS Phi3 128k as a local model.
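A minimal sketch of the backend check described above. This just looks for the vendor tools on PATH; the function name and the exact tools checked are assumptions, not anything from this repo:

```python
import shutil

def detect_gpu_backend() -> str:
    """Return 'cuda', 'rocm', or 'cpu' based on which vendor tools are on PATH.

    Hypothetical helper: presence of nvidia-smi / rocminfo is a cheap proxy
    for installed drivers, not a guarantee the hardware is usable.
    """
    if shutil.which("nvidia-smi"):
        return "cuda"
    if shutil.which("rocminfo") or shutil.which("rocm-smi"):
        return "rocm"
    return "cpu"
```

The result could then pick which llama.cpp release asset to download, falling back to llamafile on "cpu".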
llamafile + https://huggingface.co/cognitivetech/samantha-mistral-instruct-7b_bulleted-notes_GGUF
Seems to be the best (slowest/easiest) method...
Llamafile implementation is in.
Will download 1 of 2 models, and then use llamafile to run them in system RAM if the '--local_llm' argument is passed. Checks whether the files already exist before downloading, and does SHA-256 verification of the downloaded files to ensure integrity (i.e., that they aren't incomplete).
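The existence-check + SHA-256 verification step could look something like this (function names, chunk size, and the urllib download are my assumptions, not the actual implementation):

```python
import hashlib
import urllib.request
from pathlib import Path

def verify_sha256(path: Path, expected: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks (models are large) and compare digests."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected.lower()

def download_if_missing(path: Path, expected_sha256: str, url: str) -> None:
    # Skip the download when the file already exists and its hash checks out;
    # a partial download from a previous run fails the hash and is re-fetched.
    if path.exists() and verify_sha256(path, expected_sha256):
        return
    urllib.request.urlretrieve(url, path)
    if not verify_sha256(path, expected_sha256):
        raise ValueError(f"SHA-256 mismatch for {path}")
```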
Need to test to ensure the API works as expected.
Need to add options to:
Need to add a method of killing llamafile when the script exits, so as not to leave it running.
Tested on Windows and confirmed working with 2 of 3 models. Phi3 for some reason just goes nutso when used as part of the script. Will continue tweaking it, but the other two selected models work great...
Use https://github.com/abetlen/llama-cpp-python
to download + run MS Phi3 128k Context model
when proper CLI args are passed.
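The "proper CLI args" gate could be as simple as an argparse flag; only the '--local_llm' name comes from this thread, everything else below is a placeholder sketch:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Description is a placeholder; only '--local_llm' is taken from the thread.
    parser = argparse.ArgumentParser(description="Run with an optional local LLM backend")
    parser.add_argument(
        "--local_llm",
        action="store_true",
        help="Download the model if missing, then run inference locally",
    )
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.local_llm:
        print("Starting local LLM...")  # placeholder for the download-and-run path
```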
Will allow individuals to need nothing besides the application (and some free space...) to perform inference without struggles.