nathanlesage / local-chat

LocalChat is a ChatGPT-like chat that runs on your computer
https://nathanlesage.github.io/local-chat/
GNU General Public License v3.0

Prompt model: "None Loaded" error on older gaming PC #1

Open Willxiam opened 5 months ago

Willxiam commented 5 months ago

I was testing this out (thank you for making it available) on an older gaming PC: Windows 10, 16 GB RAM, i5 processor, NVIDIA GeForce GTX 1070 GAMING PCI Express video card.

I receive a "Prompt model: None Loaded" error when I submit a query.

Other details: Release v0.4.0. I manually downloaded the recommended model (tinyllama-1.1b) by following the links, copied it into the model directory, and selected the generic prompt template for the model. Downloading via the model manager did not seem to work.

I have tried reloading the model and the force reload tool in the main menu window.

nathanlesage commented 5 months ago

So in 0.4.0 there was a bug where, with new conversations, you had to manually force reload the model once; in 0.5.0 this happens automatically. So I'd recommend updating and then (just for good measure) force reloading the model (the reload button in the status bar).
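A minimal sketch of what such an automatic reload guard can look like; the names here are hypothetical, not LocalChat's actual code:

```typescript
// Hypothetical sketch of the 0.5.0 behaviour: make sure a model is
// loaded before a prompt is dispatched, instead of requiring a manual
// force reload for every new conversation. All names are illustrative.
interface LoadedModel {
  generate (prompt: string): Promise<string>
}

interface AppState {
  selectedModelPath: string
  loadedModel?: LoadedModel
}

// Stand-in for the real loader behind the "force reload" button.
async function loadModel (path: string): Promise<LoadedModel> {
  return { generate: async (prompt) => `(reply to "${prompt}" from ${path})` }
}

async function ensureModelLoaded (state: AppState): Promise<LoadedModel> {
  if (state.loadedModel === undefined) {
    // Same code path the manual force reload runs.
    state.loadedModel = await loadModel(state.selectedModelPath)
  }
  return state.loadedModel
}

async function onPromptSubmit (state: AppState, prompt: string): Promise<string> {
  const model = await ensureModelLoaded(state) // reload happens automatically
  return await model.generate(prompt)
}
```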

Does this solve the issue?


EDIT: Because I saw you explicitly added "older gaming PC" to the issue title: I just double-checked, and the GTX 1070 supports CUDA and is apparently backwards compatible, so it's likely not an issue with your GPU.

Willxiam commented 5 months ago

I updated to Release v0.7.1 and decided to try the built-in downloader again; I realized I was using the wrong download link. See image below. It continued not to work after updating to the latest version and force reloading the model in both the model manager window and the main window. Along the bottom of the application it says "provider not initialized (unknown)". There is a button to force reload; when I click it, it says "Could not reload model: None loaded".

[Image: the wrong download link to copy]

Willxiam commented 5 months ago

Update: After rebooting, it will now initialize the provider, showing "Model ready" along the bottom.

EDIT: It takes quite some time to generate anything. In both instances where I have used this, I have thought it would be useful to have a stop-generating button.

nathanlesage commented 5 months ago

@Willxiam Exactly. I don't know why HuggingFace forces you to click on that download link to actually download the file, but that's the way it is …

Regarding a stop-generating button: while the model is generating, the "Force reload" button turns into a "Stop generating" button. If you click it, wait until the next token has been generated, and the model will stop generating automatically.
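A minimal sketch of why it behaves that way, assuming a token-by-token generation loop like the one the llama.cpp bindings expose (names are illustrative):

```typescript
// Illustrative sketch: the stop flag is only checked between tokens,
// which is why generation halts after the *next* token, not instantly.
let stopRequested = false

function onStopGeneratingClick (): void {
  stopRequested = true
}

async function generate (
  prompt: string,
  nextToken: (textSoFar: string) => Promise<string | null>
): Promise<string> {
  stopRequested = false
  let output = ''
  for (;;) {
    const token = await nextToken(prompt + output)
    if (token === null) break  // the model finished on its own
    output += token
    if (stopRequested) break   // honoured only once this token is done
  }
  return output
}
```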

Willxiam commented 5 months ago

I have used the force reload, and sometimes the program will hang. I have also found that sometimes entering a new query will generate an error that helps it quit out of the hang. I have experienced this on both of the systems I have installed it on; one is the older machine, where it is harder to gauge what is due to the application and what is due to the system.

nathanlesage commented 5 months ago

Mhmh, could be. I just today released a new version that also allows for CUDA support. I haven't yet enabled that flag, so 0.8.0 should run exclusively on the CPU. This does not make the most of the system, BUT it should at least get you up and running. The Node bindings are still in beta, though, so there will probably be more improvements over time.
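For context, this is roughly what the CPU/GPU split looks like at the binding level. A hedged sketch using node-llama-cpp's v2-style API (option names may differ across releases, and the model path is made up):

```typescript
// Sketch of the CPU-vs-GPU toggle described above, using node-llama-cpp
// (llama.cpp Node bindings). Treat the exact options as an assumption,
// not a reference; check the bindings' docs for the current names.
import { LlamaModel, LlamaContext, LlamaChatSession } from 'node-llama-cpp'

const model = new LlamaModel({
  modelPath: '/path/to/tinyllama-1.1b.Q4_K_M.gguf', // hypothetical path
  // 0 keeps inference entirely on the CPU (the 0.8.0 behaviour);
  // a positive number offloads that many layers to the GPU once a
  // CUDA-enabled build of llama.cpp is in place.
  gpuLayers: 0
})

const context = new LlamaContext({ model })
const session = new LlamaChatSession({ context })

const answer = await session.prompt('What is the capital of France?')
console.log(answer)
```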

Willxiam commented 5 months ago

Thanks, I will give the latest release a go; I cannot get to it until after work, though. After I got LocalChat to recognize the model (not sure why it was not doing so before), it would generate, but it would take quite some time: if I remember correctly, about 15 to 30 minutes for a quick prompt. And sometimes it would seem to get stuck in the attempt to generate. I am assuming that running just on the CPU might be slower, unless there is some issue interfacing with my GPU. It has occurred to me that maybe my GPU needs an updated set of drivers, or I could put a newer GPU in this system to get it working better.

Willxiam commented 5 months ago

OK, so I installed the latest release and noticed that the reading of the model's metadata was much faster. Generation also seems faster and more stable, and the force reload was nearly instantaneous.

I also did some tests. 7 minutes to generate an answer to "What is the capital of France?" Details: it generated "the capital of france is Paris" at about the 450.1 s mark and stopped generating at 534.9 s.

12 minutes for "What is 1+1?": it generated "1+1=" at about 330.5 s, got the correct answer (2) at around 450.5 s, then finished generating at 765 s.

So this is an improvement. I pulled these test prompts from some forum someplace, but maybe there are better ones.
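For a rough sense of scale, a back-of-the-envelope throughput calculation from those timings; the token counts are guesses, not measurements:

```typescript
// Back-of-the-envelope decode throughput from the timings reported above.
// Token counts are rough estimates, not measured values.
const runs = [
  // "the capital of france is Paris": ~7 tokens,
  // first output at ~450.1 s, done at 534.9 s
  { tokens: 7, firstTokenAt: 450.1, doneAt: 534.9 },
  // "1+1= ... 2": first output at ~330.5 s, done at 765 s; call it ~8 tokens
  { tokens: 8, firstTokenAt: 330.5, doneAt: 765 }
]

for (const run of runs) {
  const tokensPerSecond = run.tokens / (run.doneAt - run.firstTokenAt)
  console.log(`~${tokensPerSecond.toFixed(2)} tokens/s after the first token`)
}
// Both runs land around 0.02–0.08 tokens/s, i.e. the long waits are
// dominated by prompt processing plus a very slow CPU decode.
```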

nathanlesage commented 5 months ago

Regarding the slow speed: I haven't used an i5 in quite some time, so I don't know if these numbers are odd or to be expected, but they're definitely not decent. I haven't yet settled on a good way to implement configuration, but once I do, I'll enable the option to switch to the CUDA version of llama.cpp, which should increase inference speed drastically.

Willxiam commented 5 months ago

This is the reason I put "older gaming PC" in the thread title: I do not know either. But I will keep leaving feedback, because I suspect many others will be trying to figure out the same thing. Thanks for all of your help. I will try to test the models on my newer machine tomorrow.