mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many other model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Please improve error messages and getting started documentation #1416

Open chris-hatton opened 10 months ago

chris-hatton commented 10 months ago

Is your feature request related to a problem? Please describe. Yes; I can't get your sample working. I'm using a cublas build with an Nvidia GPU. I've followed the setup carefully, and I don't see any log errors related to initialising the GPU; instead I always see "failed to load model" / "EOF" / "bad magic" errors. I've tried many models, including luna-ai-llama2 from your sample, and have created the four files specified.

Describe the solution you'd like Please improve the error messages surfaced by Local.AI, and your documentation. Errors look heavily obfuscated at the moment due to the internal architecture of Local.AI: you seem to be separating the front and back end as two separate services, and the back end does a poor job of surfacing error messages to the front end.

Describe alternatives you've considered I have no alternatives, except to give up & go home.

Additional context I'm not a complete dummy; I've had llama.cpp working on Metal and CPU before, but Local.AI's documentation and error messages leave a lot to be desired; I feel like I'm flying completely blind. Sorry to say, your 'Getting started' documentation is not very well written and fails to establish vital facts for beginners, such as:

  • What's the relationship between the naming of the model and the ID shown
  • Are the four files really vital? If so, why doesn't Local.AI stop as soon as they're missing, or make a very clear error log about this
  • What's f16 mode? Do I have to enable it when working with a GPU?
  • What's the difference between using the CUDA 11 or 12 build?

lunamidori5 commented 10 months ago

Oops, I did not mean to link it like that...

lunamidori5 commented 10 months ago

I'm not a complete dummy; I've had llama.cpp working on Metal and CPU before, but Local.AI's documentation and error messages leave a lot to be desired; I feel like I'm flying completely blind. Sorry to say your 'Getting started' documentation is not very well written and fails to establish vital facts for beginners such as:

  • What's the relationship between the naming of the model and the ID shown
  • Are the four files really vital? If so why doesn't Local.AI stop as soon as they're missing, or make a very clear error log about this
  • What's f16 mode? Do I have to enable it when working with a GPU?
  • What's the difference between using CUDA 11 or 12 build?

@chris-hatton Deeply sorry for this, my friend! I have added an update to the How Tos (not the Getting Started) pages to better clear some of this up. If you could review my PR and let me know if there are changes, I'll happily make them.

Q&A:

  • What's the relationship between the naming of the model and the ID shown

There isn't one; the name is just the name you send with your OpenAI request to LocalAI, so it can be whatever you want it to be.
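For example, a rough sketch of how the two relate (file names and paths here are just placeholders, not taken from the docs): the name: in the model's YAML is the ID you use in requests, independent of the weights file name.

```bash
# Hypothetical model config: `name` is arbitrary and only has to match the
# `model` field of your API requests, not the weights file name.
cat > models/my-luna.yaml <<'EOF'
name: my-luna                                    # ID used in API calls
parameters:
  model: luna-ai-llama2-uncensored.Q4_K_M.gguf   # actual weights file in models/
EOF

# Call it with that same name via the OpenAI-compatible endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-luna", "messages": [{"role": "user", "content": "Hello"}]}'
```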

  • Are the four files really vital? If so why doesn't Local.AI stop as soon as they're missing, or make a very clear error log about this

No, but the output of the model will be really janky and poorly formatted without them. You can run the model raw with no configs, but again, it will not do as well as if you set up the model using the five files needed.
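As a rough illustration of what one of those files does (file names here are placeholders): the .tmpl files are Go templates that wrap your prompt in the format the model was tuned on, and the YAML config points at them.

```bash
# Hypothetical chat template wrapping the incoming prompt in a simple
# USER/ASSISTANT layout.
cat > models/my-luna-chat.tmpl <<'EOF'
USER: {{.Input}}

ASSISTANT:
EOF

# Reference it from the model's YAML config (templates are referenced
# without the .tmpl extension).
cat >> models/my-luna.yaml <<'EOF'
template:
  chat: my-luna-chat
EOF
```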

  • What's f16 mode? Do I have to enable it when working with a GPU?

Yes; for GPUs on other OSes like Windows and Linux you will need to set this to true (it gives a speed-up to the model when run on GPUs).
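As a hedged sketch (field names assumed to match the advanced configuration page), both f16 and GPU offloading are set in the same model YAML:

```bash
# Hypothetical GPU settings appended to the model config: f16 enables
# half precision, gpu_layers controls how many layers are offloaded.
cat >> models/my-luna.yaml <<'EOF'
f16: true
gpu_layers: 35
EOF
```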

  • What's the difference between using CUDA 11 or 12 build?

On other hardware, some GPUs only support CUDA 11 and others CUDA 12, whereas M-based Macs shouldn't be using Docker at all (see the Getting Started page).
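In practice the CUDA version is picked by the image tag you pull; the tags below are illustrative, so check the Getting Started page for the exact ones for your release.

```bash
# CUDA 12 build (newer GPUs/drivers) -- tag is illustrative
docker run -p 8080:8080 --gpus all -v $PWD/models:/models \
  quay.io/go-skynet/local-ai:latest-cublas-cuda12 --models-path /models

# CUDA 11 build (GPUs/drivers that only support CUDA 11)
docker run -p 8080:8080 --gpus all -v $PWD/models:/models \
  quay.io/go-skynet/local-ai:latest-cublas-cuda11 --models-path /models
```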

I am not one of the devs, just someone who keeps the How To pages updated, so input is super welcome!

localai-bot commented 10 months ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smilecat: and I can provide, generally speaking, good tips or places to look in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

I apologize for the confusion and frustration you are experiencing. We will work on improving our documentation and error messages to make them more clear and helpful. In the meantime, here is some additional information that may help:

Sources:

mudler commented 9 months ago

I think this wasn't meant to be closed - there is a lot to do in this area

mudler commented 9 months ago

I'm trying to rework the docs around the new features, which should also ease deployments. I'm currently crafting good examples as well, but it will take a bit to test them all.

lunamidori5 commented 9 months ago

I think I linked it wrong with the PR I did, sorry about that @mudler

apstrom commented 9 months ago

I would like to rejoin this call for additional documentation.

I would love to use LocalAI as a central repository for LLM operation on my network. I cannot, however, get models running or link OpenAI calls to LocalAI. The documentation in this regard does not provide enough detail for me to understand the concepts behind commands (which is what I need to then properly use commands in scenarios for which there isn't a tutorial).

More to this point--and I say this with respect, because I know how difficult writing these kinds of materials can be--the materials that do exist do not provide sufficient guidance for a non-technical user (like myself--I am a lawyer) to get the API running and to connect to the API.

An example of this issue: I have three models with associated YAML files in the model directory. These files are based on the how-to examples. I simply want to get LocalAI running and connect it to Ollama-WebUI. This test should only require me to connect LocalAI to Ollama-WebUI via the OpenAI configuration in Ollama's UI (key and IP address). I cannot, however, make this connection or view the LLMs that should be in the model directory. I cannot begin troubleshooting this issue because I do not know if the model files are being identified as valid models by LocalAI; the documentation only allows me to check whether a model can be loaded, not whether LocalAI will list the models in the model directory when called to make such a list.
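For reference, the check I am looking for amounts to something like the following (port assumed from the default install): the OpenAI-compatible model listing, which should show exactly which entries LocalAI picked up from the model directory.

```bash
# List the models LocalAI currently exposes; YAML configs placed in the
# models directory should appear here by name.
curl http://localhost:8080/v1/models
```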

Some apps require an OpenAI key for security. The documentation does not mention this key. Can the key be omitted? Or does a key need to be provided? If a key needs to be provided, what is LocalAI expecting?

Another example: what if the application connecting to the API is not running on the same machine? Does the IP address then need to point to the LocalAI machine's instance on the LAN? If both apps run in Docker (thus bridged to the localhost's IP address, but using internal IPs as bridges), does http://localhost:8080 allow the Docker app making calls to LocalAI to connect?
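Concretely, what I am attempting from the other machine looks roughly like this (the address and key are placeholders, not values from the docs):

```bash
# From another machine or container on the LAN, point at the LocalAI host's
# address rather than localhost; if the client insists on an OpenAI key, a
# dummy value is used here since no key is configured on the LocalAI side.
curl http://192.168.1.50:8080/v1/models \
  -H "Authorization: Bearer sk-placeholder"
```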

lunamidori5 commented 8 months ago

@chris-hatton / @apstrom the How Tos have been updated and I would love your review on the ease of installing new models. Note: with the new model installer it is really easy, and it self-updates to the best known models of each size.

Here is an updated link to the How Tos - https://io.midori-ai.xyz/howtos/

apstrom commented 8 months ago

@lunamidori5 As promised.

The presentation of information is clearer.

What's missing from the documentation is a page that provides all of the possible YAML settings (i.e. a blank template YAML with every possible setting in the document, but commented out). Having this document as a reference will be useful for more advanced applications.

Similarly, a general description of the way in which LocalAI functions on the backend would be helpful. That description will allow users to more easily diagnose errors. In a similar vein, a page that describes the operational differences between embedding models and inference models will be useful to users that are new to AI. This description can also include suggested settings.

Embedding models need to be understood as parts of much larger AI operations. In my case, for example, I need to use embeddings to process massive amounts of legal texts. This use case differs from a chat use case or a single document query case. Do embedding models need any specific config settings to allow this kind of use? A page that helps describe specific use cases and suggested settings will allow users to get their applications running much more quickly.

Finally, the matter of Huggingface models. I am growing to dislike Huggingface's pytorch models because they do not appear to run natively on LocalAI. Some sort of conversion is required to get the models running. If LocalAI can manage this conversion, great: a page that describes the process would be very helpful. If not, then a page that describes the kinds of model filetypes that LocalAI can run would be useful.

Note that I am not asking for a page like the model compatibility page in the old documentation. That page is helpful and should be updated / maintained. I am asking for a page that deals with specific file types or requirements from file types that LocalAI will run out-of-the-box (so-to-speak).

lunamidori5 commented 8 months ago

@apstrom Thank you!

A blank YAML is already on the site with everything you can use for a GGUF-based model (more on that in a moment). Linked here - https://localai.io/advanced/
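To give a flavour of what that page covers, here is a partial, hedged sketch of commonly used fields (names assumed to match the advanced page; it is not a complete reference):

```bash
# Hypothetical annotated model config -- a subset of the documented options.
cat > models/example.yaml <<'EOF'
name: example                   # ID used in API requests
context_size: 4096              # prompt + response token window
threads: 4                      # CPU threads to use
f16: true                       # half precision (useful on GPU)
gpu_layers: 35                  # layers offloaded to the GPU
parameters:
  model: example.Q4_K_M.gguf    # weights file in the models directory
  temperature: 0.7
template:
  chat: example-chat            # example-chat.tmpl
  completion: example-completion
EOF
```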

I do think that page needs some love, but I'm 90% sure Mud is on that!

For Hugging Face models, GGUF models are fully supported. I am really not a fan of Hugging Face's APIs and their lack of good docs (tell me how to save_pretrained() to CPU, please?), and that's why I want my docs to be really good! So I'll get to work on updating them!
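For example (the URL below is a placeholder, not a recommendation): grabbing a GGUF build of a model from Hugging Face and dropping it into the models directory is usually enough for LocalAI to load it.

```bash
# Hypothetical example: download a GGUF file into the models directory,
# then reference it from a YAML config (or load it directly by file name).
wget -P models/ \
  https://huggingface.co/SomeUser/some-model-GGUF/resolve/main/some-model.Q4_K_M.gguf
```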

As for the embedding model, the one used is great; it's able to deal with over 10 GB in under 10 seconds, so I'm not sure if it is the app you're using to send the requests. I know that AnythingLLM can be a bit picky on settings, so check the app you're using. If it is still not working, I'll open you a support chat and see if we can fix that. (Again, I'll update the docs to be a bit more clear! Thank you!)
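For reference, a minimal sketch of how an embeddings model is typically declared and queried (model file and names are placeholders): it gets its own YAML with an embeddings flag and is called through the embeddings endpoint rather than the chat one.

```bash
# Hypothetical embeddings config and request.
cat > models/text-embedding.yaml <<'EOF'
name: text-embedding
embeddings: true
parameters:
  model: bert-embeddings.gguf   # an embeddings-capable model file
EOF

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding", "input": "A paragraph from a legal brief"}'
```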

lunamidori5 commented 8 months ago

As a docs volunteer I only know so much about the code, but I'll look into it as best I can!