nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License
70.25k stars · 7.68k forks

chat: Could not load model due to invalid format (v2.4.14) #1288

Closed kienerj closed 1 year ago

kienerj commented 1 year ago

System Info

v2.4.14 Windows 10, 32 GB RAM, 6-cores Using GUI and models downloaded with GUI

It worked yesterday; today I was asked to upgrade, so I did, and now I can't load any models, even after removing and re-downloading them.

"Could not load model due to invalid format"

Information

Related Components

Reproduction

Open the desktop application and try to load any model. All of them give the "Could not load model due to invalid format" error.

Expected behavior

That models load

AndriyMulyar commented 1 year ago

Can you detail which model you were trying to load here.

kienerj commented 1 year ago

All of these:

(screenshot of the downloaded models list)

rvnlord commented 1 year ago

I came here to report the same problem on windows, only with the latest version of the GUI (v2.4.14). If you downgrade the version it works. So models don't get corrupted (even if you redownload them using v2.4.14, the new ones will throw the same error but when you downgrade the GUI the models downloaded through v2.4.14 will work in earlier version).

cosmic-snow commented 1 year ago

We had someone report this case on the Discord, too, but the cause is not clear right now. Can you add information about your CPU, please?

Also, if you're adventurous: maybe try the Python bindings and report back whether those work, at least?
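Something like this would be a minimal check (just a sketch; the model name and folder below are examples, so point them at whatever the chat GUI actually downloaded):

```python
# Minimal sketch: try loading a model with the Python bindings instead of
# the chat GUI. Model name and folder are examples; adjust to your setup.
from pathlib import Path

def try_load(model_dir, model_name):
    model_file = Path(model_dir) / model_name
    if not model_file.is_file():
        return f"model file not found: {model_file}"
    from gpt4all import GPT4All  # pip install gpt4all
    model = GPT4All(model_name, model_path=str(model_dir), allow_download=False)
    return model.generate("Hello, how are you?", max_tokens=10)

if __name__ == "__main__":
    # Default model folder of the Windows chat GUI (example path):
    model_dir = Path.home() / "AppData" / "Local" / "nomic.ai" / "GPT4All"
    print(try_load(model_dir, "GPT4All-13B-snoozy.ggmlv3.q4_0.bin"))
```

If the bindings can load the file but the GUI can't, that narrows the problem down to the GUI build rather than the model file.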

kienerj commented 1 year ago

Don't have access to the laptop right now but it's a 6-core "coffeelake" based laptop cpu, probably i7-8850H.

ghost commented 1 year ago

Same issue after I updated to the latest GUI version, 2.4.14, using wizardLM-13B-Uncensored.ggmlv3.q4_0.bin or GPT4All-13B-snoozy.ggmlv3.q4_0.bin. I am on a Ryzen 7 4700U with 32 GB of RAM running Windows 10. log.txt

The "download failed" error is because of my firewall; it makes no difference if I allow it. All models still fail to load.

This is what I get with Python using the same file (after moving it):

    > pip show gpt4all
    Name: gpt4all
    Version: 1.0.8

    >>> from gpt4all import GPT4All
    >>> model = GPT4All("wizardLM-13B-Uncensored.ggmlv3.q4_0.bin")
    Found model file at C:\Users\XXX\.cache\gpt4all\wizardLM-13B-Uncensored.ggmlv3.q4_0.bin
    llama.cpp: loading model from C:\Users\XXX\.cache\gpt4all\wizardLM-13B-Uncensored.ggmlv3.q4_0.bin
    llama_model_load_internal: format = ggjt v3 (latest)
    llama_model_load_internal: n_vocab = 32001
    llama_model_load_internal: n_ctx = 2048
    llama_model_load_internal: n_embd = 5120
    llama_model_load_internal: n_mult = 256
    llama_model_load_internal: n_head = 40
    llama_model_load_internal: n_layer = 40
    llama_model_load_internal: n_rot = 128
    llama_model_load_internal: ftype = 2 (mostly Q4_0)
    llama_model_load_internal: n_ff = 13824
    llama_model_load_internal: n_parts = 1
    llama_model_load_internal: model size = 13B
    llama_model_load_internal: ggml ctx size = 0.09 MB
    llama_model_load_internal: mem required = 9031.71 MB (+ 1608.00 MB per state)
    llama_new_context_with_model: kv self size = 1600.00 MB
    >>> output = model.generate("How are you today?", max_tokens=10)
    >>> print(output)
    I hope you're doing well.

cosmic-snow commented 1 year ago

Thanks for the additional information. I guess it's still kind of elusive.

However, we just encountered a case where a model couldn't load because of a special character in the folder path. Make sure that isn't the case with where you stored your models for the chat GUI, please.
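If you want to check quickly, a small script like this (just a sketch) lists any characters in a path outside plain printable ASCII, which is the kind of thing that has tripped up model loading:

```python
from pathlib import Path

def suspicious_chars(path):
    """Return the characters in `path` that are outside printable ASCII.

    Special (e.g. accented) characters in the model folder path have
    been seen to cause model-loading failures in the chat GUI.
    """
    return sorted({c for c in str(path) if not (0x20 <= ord(c) <= 0x7E)})

# Hypothetical example path with an umlaut in the user name:
print(suspicious_chars(Path(r"C:\Users\Jürgen\AppData\Local\nomic.ai\GPT4All")))
# prints ['ü']
```

An empty list means the path is plain ASCII and this particular cause can be ruled out.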

goog commented 1 year ago

Same here, I can't load Replit and Falcon.

kienerj commented 1 year ago

However, we just encountered a case where a model couldn't load because of a special character in the folder path. Make sure that isn't the case with where you stored your models for the chat GUI, please.

It's in the default folder in my case for Windows: AppData/Local/nomic.ai/GPT4ALL. Same as it was in v2.4.13, and there was no issue.

I am, however, on a "locked" corporate laptop (not very tightly locked, but it does run an older version of Windows 10, namely 1909). So if some dependency got updated that requires a newer Windows 10 version, that could also be the cause? (I have encountered such issues with other apps/libraries, so it's indeed possible.)

diabloyg commented 1 year ago

Interestingly, I encountered the identical issue with 16 GB RAM and a Ryzen 5 5600U CPU.

However, after switching to another laptop with 16 GB RAM and an i7-10875H CPU, the issue just disappeared.

goog commented 1 year ago

What happened with the 2.4.14 update? Was the developer acquired? It's a messy update.

cosmic-snow commented 1 year ago

Interestingly, I encountered the identical issue with 16 GB RAM and a Ryzen 5 5600U CPU.

However, after switching to another laptop with 16 GB RAM and an i7-10875H CPU, the issue just disappeared.

There have been a few reports like this, but it doesn't look like there really is a pattern so far (regarding hardware). It's quite strange. There have also been a few reports that turned out to be a problem which could be solved with a bit of troubleshooting. Things you can check:

  • As mentioned previously, there must be no strange Unicode characters in the model folder path.
  • Are all your models readable? Check permissions.
  • Are you using some special antivirus or other system protection software that might block access?
  • Check system logs for special entries: Win+R, then type eventvwr.msc; 'Windows Logs' > Application.
  • Back up your .ini file in <user-folder>\AppData\Roaming\nomic.ai and let it create a fresh one with a restart. If you had a different model folder, adjust that but leave other settings at their default. See if that changes anything.

If you're feeling adventurous:

  • Check if any of the language bindings work. If you can run the Python bindings, there's also a CLI, at least.
  • If you can get the language bindings to work and know how to compile yourself, try with the MSVC compiler. See if you can still run it with that.
  • Trace the program with a tool such as Sysinternals' Process Monitor. It's freely available, but doing that is not quite so simple.
  • Ideally, someone would compile the chat GUI themselves on a system on which it doesn't work and investigate where exactly it fails. Because it seems to happen to very few people and is quite an elusive issue.

P.S. If nothing helps or you need assistance with one of the above, maybe join the Discord for easier troubleshooting.

kienerj commented 1 year ago

Because it seems to happen to very few people and is quite an elusive issue.

What if, as I mentioned, it's related to the Windows 10 version? I have seen problems with other software that needs a newer Windows 10 version than what I have on the corporate laptop, e.g. some MSVC dependency or even Windows itself.

cosmic-snow commented 1 year ago

What if, as I mentioned, it's related to the Windows 10 version? I have seen problems with other software that needs a newer Windows 10 version than what I have on the corporate laptop, e.g. some MSVC dependency or even Windows itself.

Good point, that's definitely also a possibility. I have not compiled the release version myself, will have to check back if anything changed there.

diabloyg commented 1 year ago

Interestingly, I encountered the identical issue with 16 GB RAM and a Ryzen 5 5600U CPU. However, after switching to another laptop with 16 GB RAM and an i7-10875H CPU, the issue just disappeared.

There have been a few reports like this, but it doesn't look like there really is a pattern so far (regarding hardware). It's quite strange. There have also been a few reports that turned out to be a problem that could be solved with a bit of troubleshooting. Things you can check:

  • As mentioned previously, there must be no strange Unicode characters in the model folder path.
  • Are all your models readable? Check permissions.
  • Are you using some special Antivirus or other system protection software that might block access?
  • Check system logs for special entries. Win+R then type: eventvwr.msc. 'Windows Logs' > Application
  • Back up your .ini file in <user-folder>\AppData\Roaming\nomic.ai and let it create a fresh one with a restart. If you had a different model folder, adjust that but leave other settings at their default. See if that changes anything.

If you're feeling adventurous:

  • Check if any of the language bindings work. If you can run Python bindings, there's also a CLI, at least.
  • If you can get language bindings to work and know how to compile yourself, try with the MSVC compiler. See if you can still run it with that.
  • Trace the program with a tool such as Sysinternals' Process Monitor. It's freely available, but doing that is not quite so simple.
  • Ideally, someone would compile the chat GUI themselves using a system on which it doesn't work and investigate where exactly it fails. Because it seems to happen to very few people and is quite an elusive issue.

P.S. If nothing helps or you need assistance with one of the above, maybe join the Discord for easier troubleshooting.

It is considered that such failure was probably caused by GUI issue. Under the same office environment, the GUI worked in laptop with i5 CPU. Besides, the model works via the Python script approach on my own AMD CPU laptop. I tried to run the GUI with python code, but still failed.

cosmic-snow commented 1 year ago

It is considered that such failure was probably caused by GUI issue. Under the same office environment, the GUI worked in laptop with i5 CPU.

So far, it doesn't look like it's caused by a specific CPU. There have been reports of old and new CPUs, both AMD and Intel.

I tried to run the GUI with python code, but still failed.

I don't understand this sentence.

kienerj commented 1 year ago

What if as I mentioned it's related to the Windows 10 version? I have seen problems with other software that need a newer Win 10 version that what i have on the corporate laptop. Eg some MSVC dependency or even windows itself.

Good point, that's definitely also a possibility. I have not compiled the release version myself, will have to check back if anything changed there.

What's important is to know what actually changed between v2.4.13 and v2.4.14, because whatever changed must be causing this. A new compiler version? A new machine the compiler runs on (maybe upgraded from Win 10 to Win 11)? Etc.

cosmic-snow commented 1 year ago

What's important is to know what actually changed between v2.4.13 and v2.4.14, because whatever changed must be causing this. A new compiler version? A new machine the compiler runs on (maybe upgraded from Win 10 to Win 11)? Etc.

I've asked that. Will report back here once I know.

However, there have been some changes in the logic, too. But so far it's not clear where exactly the problem is. Someone equipped with a debugger would be able to tell.

Edit: By the way, there is a vc_redist installer being delivered with the official installer (these are official Microsoft Windows runtime dependencies which should already be on your system). It's in the bin subdirectory. Installing that is another thing you could try.

cosmic-snow commented 1 year ago

What's important is to know what actually changed between v13 and v14.

Turns out, nothing significant should have changed between these versions (.13 -> .14). Qt got upgraded to 6.5.1 from 6.5.0 but it looks like that happened in an earlier release.

So I guess that leaves the code changes.

goog commented 1 year ago

I find the Falcon model's MD5 is the same as on 18 July. Today I downloaded Falcon successfully, but it fails to load, so it is a client issue. Does it block AMD CPUs on Win 10?

kienerj commented 1 year ago

If you downgrade the version it works.

How can I downgrade? I only see a generic installer on the page and no link to an older version.

kienerj commented 1 year ago

I now checked on a different machine, a Windows Server 2019 instance. Same error. But it runs the same antivirus, which could be the culprit? (CrowdStrike)

goog commented 1 year ago

@kienerj so maybe it's Microsoft's moat quietly at work?

cosmic-snow commented 1 year ago

How can I downgrade? I only see a generic installer on the page and no link to an older version.

Before you attempt that, have you tried any of the other suggestions in my previous comment? https://github.com/nomic-ai/gpt4all/issues/1288#issuecomment-1663482924

As far as I know, there isn't a way to downgrade through the installer, but you could hack something together with older DLLs.

Important: If you do that, you're on your own, though. Because it's untested and there is no way to know what other problems that could cause. Also note that older versions possibly don't support some model architectures.

Additionally, while it looks like the older DLLs are still on the server, there is no guarantee they'll remain there.

OmenLW commented 1 year ago

Having this same issue on Windows Server 2019. Fresh install; couldn't load any models that I tried. Downgraded with the bin and lib from above and it's working. It's definitely code related.

cosmic-snow commented 1 year ago

Having this same issue on Windows Server 2019. Fresh install; couldn't load any models that I tried. Downgraded with the bin and lib from above and it's working. It's definitely code related.

Well, the problem is that it seems to only affect a select few people, but so far none of them have tried to investigate it more closely on their own systems.

So it's not clear where exactly it fails and why. (Several people have looked at the code.)

What's the exact build version of your Windows Server 2019? Recently, I've seen two reports with outdated Windows versions, although it's not clear whether these were related to this issue. (That's in addition to what @kienerj mentioned previously.)

cosmic-snow commented 1 year ago

Alright, let's try something. Can you run the following Python script on a machine where 2.4.14 doesn't work and report back with the results?

https://gist.github.com/cosmic-snow/6c93526ea4c8b5428def2bf63d6be390

diabloyg commented 1 year ago

Hi, when I tried to ask a question using Python code in a conda environment (Python 3.8), everything went fine:

    Python 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:01:59) [MSC v.1929 64 bit (AMD64)]
    Type "copyright", "credits" or "license" for more information.
    IPython -- An enhanced Interactive Python.

    In [1]: import gpt4all.gpt4all
    In [2]: from gpt4all import GPT4All
    In [3]: gpt4all.gpt4all.DEFAULT_MODEL_DIRECTORY = 'C:\gpt4all'
    In [4]: model = GPT4All(model_name='ggml-model-gpt4all-falcon-q4_0.bin', allow_download=False)
    In [5]: output = model.generate("How many calories are in a banana ", max_tokens=100)
    In [6]: print(output)
    Bananas are a good source of carbohydrates and fiber, with only 105 calories per medium-sized banana.

Which means the model itself works. After running the code you provided, it gives:

    AVX:    False
    AVX2:   False
    AVX512: False
    RAM:    16384.0 MiB

I am not sure how this could happen (since a Zen 3 CPU should have such support), but it seems that the CPU instruction set may matter?

goog commented 1 year ago

My output with Python 3.11.4 (tags/v3.11.4:d2340ef, Jun 7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32. CPU: Intel Core i5-8250U @ 1.60GHz; GPU: AMD.

C:\Users\HF>python inspect.py
AVX:    False
AVX2:   False
AVX512: False
RAM:    20480.0 MiB
goog commented 1 year ago

Maybe Python returning False is the cause?

import cpuinfo  # pip install py-cpuinfo

def is_avx_supported():
    info = cpuinfo.get_cpu_info()

    # 'flags' contains a list of the CPU's capabilities
    if 'flags' in info:
        if 'avx' in info['flags']:
            return True

    return False

print(is_avx_supported())

It shows AVX support.

cosmic-snow commented 1 year ago

Very well guys, thank you.

I think we found the culprit. It's the Windows API.

I am not sure how this could happen (since a Zen 3 CPU should have such support), but it seems that the CPU instruction set may matter?

The difference here is that the Python bindings use MinGW and not MSVC, the code is not exactly the same.

Maybe Python returning False is the cause? ... It shows AVX support.

Well, it's not Python's fault; the Windows API is simply not reporting the correct support.

This was really quite an unfortunate way for this problem to get introduced. For now, either use the old DLLs or upgrade your Windows to a more recent version.

Edit: I've also had definitive confirmation today in Discord that updating the system to a current version resolves the issue.

Edit 2: By the way, once you're done with the test script, delete or rename the file. You don't want it to get in the way of the inspect system library (the script is named inspect.py). Just happened to me. 😅

OmenLW commented 1 year ago

Just a heads up, even with a fully up-to-date Windows Server 2019 (1809), I still get the error.

cosmic-snow commented 1 year ago

Just a heads up, even with a fully up-to-date Windows Server 2019 (1809), I still get the error.

I guess that's to be expected: although Windows Server 2019 is apparently still supported, it was never lifted to a more recent base version.

Should be fixed in the next version, or if you want to test it, build from main.

kienerj commented 1 year ago

To add for anyone interested:

Support for PF_SSSE3_INSTRUCTIONS_AVAILABLE through PF_AVX512F_INSTRUCTIONS_AVAILABLE were added in the Windows SDK (19041) and are supported by Windows 10, Version 2004 (May 2020 Update) or later.

I.e. one needs at least Windows 10, version 2004.
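The API check in question can be reproduced from Python via ctypes. This is just a sketch; the PF_* constant values are taken from winnt.h and assumed correct here. On Windows builds older than 10 2004, the calls return 0 even on AVX-capable CPUs, which is exactly what this issue is about:

```python
import ctypes
import sys

# Processor-feature constants from winnt.h (SDK 19041); assumed correct here.
PF_AVX_INSTRUCTIONS_AVAILABLE = 39
PF_AVX2_INSTRUCTIONS_AVAILABLE = 40
PF_AVX512F_INSTRUCTIONS_AVAILABLE = 41

def windows_reports(feature):
    """Ask the Windows API whether a CPU feature is present.

    On Windows builds older than 10 2004 this reports False for the AVX
    family even when the CPU supports it. Returns None on non-Windows
    systems, where this API doesn't exist.
    """
    if sys.platform != "win32":
        return None
    return bool(ctypes.windll.kernel32.IsProcessorFeaturePresent(feature))

for name, flag in [("AVX", PF_AVX_INSTRUCTIONS_AVAILABLE),
                   ("AVX2", PF_AVX2_INSTRUCTIONS_AVAILABLE),
                   ("AVX512F", PF_AVX512F_INSTRUCTIONS_AVAILABLE)]:
    print(f"{name}: {windows_reports(flag)}")
```

Comparing this output against a cpuid-based check (like the py-cpuinfo snippet above) shows the discrepancy directly on affected systems.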

amichelis commented 1 year ago

Are you saying that it will not work under Windows Server 2019?

kienerj commented 1 year ago

Are you saying that it will not work under Windows Server 2019?

I tested it on an instance and indeed it did not work.

But see the merge request above. I think the next version should work again, thanks to the previous way of detecting CPU features being reinstated.

cosmic-snow commented 1 year ago

But see the merge request above. I think the next version should work again, thanks to the previous way of detecting CPU features being reinstated.

Well, previously it worked because it was buggy, which made it not work for AVX, only AVX2. But the mentioned low-level code which is being reinstated is what should've worked. It's not yet tested on actual hardware, but let's hope it works. We did some tests with an emulator, though, so that's looking good.

Switching to the high-level Windows API was one attempt to resolve the problem for people with only AVX support on their CPUs. It turned out that wasn't the culprit for that other bug, and fixing that other bug revealed this problem.

bleedchocolate commented 1 year ago

I just installed on a Windows Server 2016 VM, gave it 32 GB of RAM, and I'm getting the same error: (screenshot)

System info: (screenshot)

cosmic-snow commented 1 year ago

This should be fixed in v2.4.15 and later releases. Please try again.

Although note that this release also includes Vulkan GPU support for Llama models. If you encounter a model loading error, make sure to go to settings and explicitly select 'CPU' as device, then try again.

kienerj commented 1 year ago

Can confirm it works on Windows Server 2019

bleedchocolate commented 1 year ago

Confirming it now works on Windows Server 2016.

Thanks so much! Rick

duncreg commented 9 months ago

Umm... is there any reason I should be experiencing this now under Linux?

cebtenzzre commented 9 months ago

Umm... is there any reason I should be experiencing this now under Linux?

Please open a new issue. Current versions of GPT4All should give a more specific error in the console.

GevarraChe commented 8 months ago

Version 2.6.2, always the same error! Here's my PC: Processor: Intel64 Family 6 Model 140 Stepping 1; RAM: 16 GB; Graphics card: Intel(R) Iris(R) Xe Graphics.

cebtenzzre commented 8 months ago

Version 2.6.2, always error!

Please open a new issue, and mention the name of the model you are trying to load (and the URL if you downloaded it from the internet).