ValueError: matrix contains invalid numeric entries

saftle commented 1 year ago

I believe this was introduced by the latest commit on the main branch, since I had no issues on the GPU branch, unless it is specific to this merge method (euclidean_add_difference). Here is my command and the error output:

Executing command: merge_models.py -a "E:\A1111 Web UI Autoinstaller\stable-diffusion-webui\models\Stable-diffusion\URPMv1.4.1.safetensors" -b "E:\A1111 Web UI Autoinstaller\stable-diffusion-webui\models\Stable-diffusion\newtest\cunnilingus5-5_sd-v1-5_pruned-fp16.safetensors" -m euclidean_add_difference -o "E:\A1111 Web UI Autoinstaller\stable-diffusion-webui\models\Stable-diffusion\output\A0.25-RB050-WC" -f safetensors -ba 0.25 -c "E:\A1111 Web UI Autoinstaller\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned.ckpt" -p 16 -wc -rb -rbi 50 --device cuda --prune
before loading models: 0.000
loading: E:\A1111 Web UI Autoinstaller\stable-diffusion-webui\models\Stable-diffusion\MODELA.safetensors
loading: E:\A1111 Web UI Autoinstaller\stable-diffusion-webui\models\Stable-diffusion\newtest\MODELB.safetensors
loading: E:\A1111 Web UI Autoinstaller\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned.ckpt
models loaded: 6.483
permuting
0 iteration start: 8.641
weights & bases, before simple merge: 8.641
stage 1: 100%|███████████████████████████████████████████████████████████████████| 1131/1131 [00:00<00:00, 3272.85it/s]
after stage 1: 8.644
stage 2: 100%|████████████████████████████████████████████████████████████████| 1131/1131 [00:00<00:00, 1130812.35it/s]
after stage 2: 8.644
simple merge done: 8.641
weight matching #1 done: 8.645
apply perm 1 done: 8.643
weight matching #2 done: 8.648
model a updated: 8.660
1 iteration start: 8.660
weights & bases, before simple merge: 8.660
stage 1: 100%|███████████████████████████████████████████████████████████████████| 1131/1131 [00:00<00:00, 5228.83it/s]
after stage 1: 8.665
stage 2: 100%|█████████████████████████████████████████████████████████████████████████████| 1131/1131 [00:00<?, ?it/s]
after stage 2: 8.665
simple merge done: 8.661
Traceback (most recent call last):
  File "E:\meh\merge_models.py", line 109, in <module>
    main()
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "E:\meh\merge_models.py", line 92, in main
    merged = merge_models(
  File "E:\meh\sd_meh\merge.py", line 141, in merge_models
    merged = rebasin_merge(
  File "E:\meh\sd_meh\merge.py", line 263, in rebasin_merge
    perm_1, y = weight_matching(
  File "E:\meh\sd_meh\rebasin.py", line 2299, in weight_matching
    linear_sum, number, perm, progress = inner_matching(
  File "E:\meh\sd_meh\rebasin.py", line 2240, in inner_matching
    ri, ci = linear_sum_assignment(A.detach().numpy(), maximize=True)
ValueError: matrix contains invalid numeric entries

ljleb commented 1 year ago

There seem to be NaNs in models merged with euclidean add difference, which is why linear_sum_assignment rejects the matrix. Will look into it.
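For anyone who wants to confirm the NaNs on their own merge, here is a minimal sketch of a check that lists which tensors contain non-finite values. It is not code from meh: the helper name `non_finite_keys` and the key names are hypothetical, and NumPy arrays stand in for the torch tensors of a real checkpoint.

```python
import numpy as np


def non_finite_keys(state_dict):
    """Return the names of arrays that contain NaN or +/-inf (hypothetical helper)."""
    return [name for name, w in state_dict.items() if not np.isfinite(w).all()]


# Hypothetical stand-in for a merged checkpoint; key names are made up.
merged = {
    "layer.weight": np.array([[0.1, 0.2], [0.3, np.nan]], dtype=np.float32),
    "layer.bias": np.zeros(2, dtype=np.float32),
}

bad = non_finite_keys(merged)
print(bad)
```

Running a check like this right after the simple merge step, before weight matching, would point at the offending keys instead of failing later inside the assignment solver.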

ljleb commented 1 year ago

It seems to be a precision bug in the calculation for the normalization.
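To illustrate how a normalization can go wrong at low precision, here is a sketch of one possible failure mode. This is an assumption for illustration, not necessarily the actual bug in meh: in float16, squaring small values can underflow to zero, so the norm becomes zero and the division yields inf or NaN. Computing the norm in float32 avoids that here.

```python
import numpy as np

# Small fp16 values: each square (~1e-8) is below the smallest fp16 subnormal.
v = np.array([1e-4, 0.0], dtype=np.float16)

# Norm computed entirely in float16: the squares underflow to 0, so the norm is 0.
norm16 = np.sqrt(np.sum(v * v))
with np.errstate(divide="ignore", invalid="ignore"):
    normalized16 = v / norm16  # x/0 -> inf, 0/0 -> NaN

# Same computation after upcasting to float32: the squares are representable.
v32 = v.astype(np.float32)
norm32 = np.sqrt(np.sum(v32 * v32))
normalized32 = v32 / norm32

print(np.isfinite(normalized16).all(), np.isfinite(normalized32).all())
```

The usual remedy for this class of problem is to do the reduction in a wider dtype and cast back at the end, or to add a small epsilon to the denominator.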

ljleb commented 1 year ago

I think this is resolved now? Closing, but please reopen if I am mistaken.

s1dlx / meh