Sumsar3 opened 8 months ago
Stage 1 is done internally and automatically; it's the VAE encode/decode with their special VAE. I haven't had any issues with results. Have you tried using the LLaVA output as the prompt in Comfy? The prompt affects it a lot. With the non-tiled version, all LLaVA does is provide the prompt; with the tiled version it matters more, since each tile gets its own prompt. I haven't implemented that yet, though I did implement tiled sampling, which allows way higher resolutions with the same memory use.
Besides skipping LLaVA, the inference code is mostly identical, so results with the exact same settings, including the prompt, should be very similar.
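For the tiled sampling part, the idea is just to denoise the latent in overlapping tiles and blend them back together, so peak VRAM scales with the tile size rather than the full image. A minimal sketch of the concept (not the actual node code; `sample_tile` stands in for whatever denoises a single tile):

```python
import torch

def tiled_sample(latent, sample_tile, tile_size=128, overlap=32):
    """Denoise a latent in overlapping tiles and average the overlaps.

    `sample_tile` is a placeholder callable, not the real SUPIR sampler API.
    """
    _, _, h, w = latent.shape
    out = torch.zeros_like(latent)
    weight = torch.zeros_like(latent)
    stride = tile_size - overlap
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            y1, x1 = min(y + tile_size, h), min(x + tile_size, w)
            out[:, :, y:y1, x:x1] += sample_tile(latent[:, :, y:y1, x:x1])
            weight[:, :, y:y1, x:x1] += 1.0  # simple averaging where tiles overlap
    return out / weight.clamp(min=1.0)
```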
I'm not using LLaVA. I'm using the WD14 Tagger plus the default prompts. Is LLaVA compatible with this version? I'm pretty new to these things. In any case, I've tried quite a few prompts and haven't gotten very different results.
Here is my workflow:
I mean that on Replicate there's an option to use LLaVA; are you comparing results to that?
Personally I just used very simple prompts, for example the "old man" image was just "high quality, photograph, detailed, old man", no tagger, no llava, just that and it worked great at 3k res.
In the end all the LLM stuff does is generate a string to use as prompt, there's no other magic to it.
SDXL base also isn't the best to use in my opinion, I got great results using fine tunes such as Proteus 0.4 for example.
Yes, on Replicate I use LLaVA (the default is ON, so it was left untouched). In any case, LLaVA shouldn't be decisive enough to change people's faces, though it could be for other details of the photo. I'll keep testing with the prompts and see if I notice improvements.
The problem still persists. Here are some examples:
The original is a very low resolution photo. Replicate gives a perfect, very realistic upscale. SUPIR-ComfyUI fails a lot and is not realistic at all.
This is a Supir ComfyUI upscale:
(oversharpened, more detail than the photo needs, elements too different from the original photo, strong AI-generated look)
Here's the replicate one:
(more realistic, an almost perfect upscaled photo)
All photos use the same settings: 50 EDM steps, same model (F), 7.5 cfg, 1 control scale, LLaVA captioner (even on ComfyUI)...
Current workflow:
Please HELP
Did you also use the F model on replicate? And exact same settings otherwise too?
Exact same settings. Same model and settings (I don't know which SDXL model Replicate uses); the rest of the settings are identical.
I don't have your input image, so just screenshotted from your screenshot:
Lowering CFG helps with the oversharpening and AI look, but it's still worse than the Replicate one (Replicate uses 7.5 cfg).
Here is the original photo:
Here is my upscaled image with your settings:
If you compare it with the Replicate one, you can see that Replicate is more realistic and more accurate to the real photo. On ComfyUI you can see reinvented things (the wiper blades and door handle are very different from the real photo). In the real photo the car has protective white paper on the hood, which disappears in the ComfyUI photo but is visible in the Replicate one. The wheels are covered with plastic, which you can see in the Replicate upscale but not in ComfyUI. The walls have a wood-like texture in ComfyUI, while Replicate has concrete walls like the real photo...
I don't understand. Same project, same models, same settings, same photo, but totally different results. Replicate is 90% accurate; ComfyUI is an almost completely reimagined photo.
https://github.com/kijai/ComfyUI-SUPIR/assets/40791699/da451c18-cd1d-4b7e-8204-b7d88f00f413
With pretty much the same settings, SDXL base model. I mean, sure, there are differences, but I really wouldn't go as far as to say it's "much worse". It's always going to be a bit different; also, I don't use LLaVA at all for now. Lowering the cfg from here would make it hallucinate less, so it can be dealt with.
Let's go with the example image:
Replicate upscale:
Not as good as the example on the web, but pretty fine. The car has some visual glitches but still looks good. The landscape is perfect, very realistic, like a native high-res photo.
Let's go with ComfyUI with the exact same settings (cfg 7.5):
The car is full of visual glitches. The landscape looks oversharpened. The bottom of the photo is blurry.
Let's go with ComfyUI again but this time lowering cfg to 5.0:
A little better here, but the car still has visual glitches and the landscape still looks weird. The photo has less detail than the Replicate one.
Something isn't working right here. Results are way different between the two SUPIRs.
EDIT: All images are x4 upscales. I did a x2.5 upscale with cfg at 6.0 on ComfyUI and it looks PERFECT, like the example image:
I don't understand anything.
Edit 2: Faces have no solution. They're much better and more accurate on Replicate than here.
OK, I think I found one thing that affects it quite a bit. In the recent update to the main repo (which I implemented mostly as-is) they changed how the cfg works, seemingly to how it was supposed to work from the beginning, maybe, I don't know. Replicate uses the older version. They added this (now commented out) line:
The new defaults in the main repo:
This means the linear cfg scale function now correctly sets the min to the max, which is the chosen cfg, when cfg linear scaling is off (in the node it's off when the value is 0.0).
So to match the old behavior (which is what Replicate also still uses), where cfg was set to 7.5 and linear scale was off, you'd now set cfg_scale_start to 7.5 and cfg to 4.0.
Honestly I'm too confused by all the different names they use for this cfg scaling to say what the proper, intended method is. And I probably need to rename mine once I figure out the logic better. But this explains the extreme burnout in the results at 7.5 cfg.
Actually obviously having the start larger than the cfg does nothing. If you don't want linear scaling just set both to same value.
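For what it's worth, the linear scaling itself is just an interpolation of the guidance scale across the sampling steps, roughly like this (a minimal sketch; the names mirror the node widgets, but the exact formula SUPIR uses may differ):

```python
def linear_cfg_schedule(cfg_scale_start, cfg_scale_end, num_steps):
    """Interpolate the CFG scale linearly from the start value to the end value.

    With start == end this degenerates to a constant scale, i.e. scaling "off".
    """
    if num_steps <= 1:
        return [cfg_scale_end]
    return [
        cfg_scale_start + (cfg_scale_end - cfg_scale_start) * i / (num_steps - 1)
        for i in range(num_steps)
    ]

# e.g. the replicate-style behavior discussed above would ramp 7.5 -> 4.0:
# linear_cfg_schedule(7.5, 4.0, 50)
```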
Setting cfg_scale_start to 4 and cfg to 4.0 improves fidelity A LOT, but faces keep failing. I'll keep testing.
Not to pile on here, but I just tried the Replicate.com website to upscale a photo I've been working on in ComfyUI, and with just the default settings the Replicate upscale is much better than what I've been getting out of Comfy, especially faces. It is just SO much better. We're definitely missing something with this version. I'm using the new Claude AI to give me a good description, so I know the LLM captioning isn't the difference. Do we have any idea what SDXL model they're using?
What cfg values were you using? The Replicate version uses the initial version of the code, where cfg_scale_min is always 4.0 (the default from the config) if use_linear_CFG is disabled. As far as I understood, to match it you would set cfg_scale to 4.0 and cfg_scale_start to 7.5 in the node.
I would rename and move the widgets to make more sense, but that's always a hassle in comfy as it breaks old workflows using the node, requiring it to be re-created.
I've been using another SUPIR project (the Patreon auto-installer) and there's no comparison.
Original ultra low quality pic:
ComfyUI:
Supir Gradio Patreon:
Same settings. Juggernaut XL v9 model, 6 cfg. LLaVA prompts on both.
Original high res photo:
@Sumsar3 Yeah. Just look at that sign difference.
The settings being what exactly?
When starting from something that low quality, even the seed affects it so much that it's hard to compare.
This is awesome. Maybe that pre-upscale of the image does the magic?? This result is far from mine. I'm going to try to copy your workflow.
It could very well be. One difference in the other implementations is that they use a 1024 minimum scale by default, so one side of the image is always at least 1024. Another difference in the main repo is that they apply some sort of gamma correction to the image; that's something to test too.
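Both of those are easy to test outside the node, roughly like this (a sketch with PIL/NumPy; the exact resampling filter and gamma formula the main repo uses are assumptions):

```python
import numpy as np
from PIL import Image

def preprocess(img: Image.Image, min_side: int = 1024, gamma: float = 1.0) -> Image.Image:
    # Scale up so the shorter side is at least `min_side`, like the other implementations.
    scale = max(min_side / min(img.size), 1.0)
    if scale > 1.0:
        img = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
    # Optional gamma correction (gamma == 1.0 is a no-op).
    if gamma != 1.0:
        arr = (np.asarray(img).astype(np.float32) / 255.0) ** (1.0 / gamma)
        img = Image.fromarray((arr * 255.0).clip(0, 255).astype(np.uint8))
    return img
```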
I also finally figured out how to load the SDXL CLIP from the checkpoint directly. It didn't seem to have a huge impact on quality, but there's a slight difference. I've pushed the update; it makes model loading way faster and we no longer need the separate CLIP models at all.
@kijai Hey man, I know we're here complaining, but I just want to thank you for your great work. If you didn't do work like this, people like me would have nothing to complain about.
Have a great weekend.
I'm still doing tests. I'm trying to reproduce images in both projects with the same settings and seeds, and the results are very different.
Here's the ComfyUI upscale:
Workflow with settings (all modules outside the screenshot are bypassed, LLaVA also disabled):
Here's the SUPIR upscale from the gradio web UI (I'm using the Patreon installer, which is based on the original project with some optional additions):
Here are all the settings:
event_id: 1710008589844585500
localtime: Sat Mar 9 19:23:09 2024
prompt:
base_model: models/Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors
a_prompt: Cinematic, High Contrast, highly detailed, taken using a Canon EOS R camera, hyper detailed photo - realistic maximum detail, 32k, Color Grading, ultra HD, extreme meticulous detailing, skin pore detailing, hyper sharpness, perfect without deformations.
n_prompt: painting, oil painting, illustration, drawing, art, sketch, oil painting, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth
num_samples: 1
upscale: 4
edm_steps: 50
s_stage1: -1
s_stage2: 1
s_cfg: 6
seed: 123456789
s_churn: 5
s_noise: 1.003
color_fix_type: Wavelet
diff_dtype: fp16
ae_dtype: bf16
gamma_correction: 1
linear_CFG: True
linear_s_stage2: False
spt_linear_CFG: 4
spt_linear_s_stage2: 0
model_select: v0-F
apply_stage_1: False
face_resolution: 1024
apply_bg: False
face_prompt:
Both upscales are x4, but they end up at different resolutions: ComfyUI 1440x960 and gradio 1536x1024, so the ComfyUI output is always slightly smaller than the gradio one.
I hope this helps to keep improving the project.
Try this upscaler workflow. It's not SUPIR, but in my experience, it does even better. Sadly, it's very slow. But if you really want the best upscale, this one is it.
With 50 steps and cfg 3 it works fine; not enough steps makes it blocky.
I tested the workflow with these settings, and the workflow I linked was still markedly better for low-resolution photos where you want to maintain fidelity... at least with the two old photos my father wanted to see upscaled.
Thanks for the testing and the values. Is the code for this interface available somewhere? I don't really understand why the results with the same settings can be so different. That's not to say I can't get satisfying results with the node; I just can't find anything that would cause such a big difference, unless they have something different in the code.
I've now added support for a different sampler from the main repo, which seems to work better with lightning models. Example with only 15 steps, not the best but pretty good imo:
Here is his public repo: https://github.com/FurkanGozukara/SUPIR
Yeah, that other one is paywalled, but I've subbed to that guy for running a lot of comparison trials for training.
I don't feel comfortable sharing his Patreon script, but the config files are universal: I used Beyond Compare to diff them and then exported the results. He also disables stage 1 by default.
Thanks so much for your work on this Comfy implementation - I hope this is insightful!
SUPIR_v0.yaml comparsupir.pdf
SUPIR_v0_tiled.yaml SupirConfigCompare-tiled.pdf
Util.py has some changes numpy/pillow compare-util.pdf
So it looks like the only difference in the config files is that his uses "softmax-xformers" and "Vanilla-xformers" where yours has "softmax" and "Vanilla".
I've never heard of xformers improving quality (in my experience it's about performance/memory usage), but I'm not familiar with "softmax". I'm going to see if I can run the node with those settings changed in the node's config.
Some dependency version differences:
It's just my default; I have it detecting the presence of xformers and changing the config dynamically. I've also tested it and there's no quality difference, though of course there's a slight result difference between them, as usual.
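The detection boils down to something like this (simplified sketch; the key names here are illustrative, the values are the attention types mentioned above):

```python
import importlib.util

def pick_attention_config():
    """Choose attention implementations depending on whether xformers is installed."""
    if importlib.util.find_spec("xformers") is not None:
        return {"attn_type": "softmax-xformers", "vae_attn_type": "vanilla-xformers"}
    return {"attn_type": "softmax", "vae_attn_type": "vanilla"}

# The chosen values get patched into the loaded yaml config before the model
# is built, instead of being hard-coded in the config file itself.
```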
How about the dependency versions? Could those be causing the difference?
(To be fair, I haven't done my own comparison with his. I did make a workflow comparing StableSR and SUPIR, and was a bit surprised that SUPIR seemed a little inferior in many cases.)
If the code is the github repo linked before, they have not done anything regarding the sampling really, just UI stuff.
The original sampling pipeline, which that UI also uses, is extremely inefficient. There's no need for SUPIR to use as many resources as it does; it's just not optimized at all, unlike what we're used to with Comfy. I've applied some of the memory optimizations and such to cut the loading times and allow inference with much less VRAM. I'm also loading the CLIP models from the checkpoint instead of using the original CLIP models (that's ~12GB less to download).
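For the CLIP part, ComfyUI can pull the text encoders straight out of an SDXL checkpoint, roughly along these lines (a sketch using ComfyUI's comfy.sd.load_checkpoint_guess_config; the node may not do it exactly this way):

```python
import comfy.sd

def load_sdxl_with_clip(ckpt_path, embedding_directory=None):
    # Returns (model, clip, vae, clipvision); the clip object already contains
    # both SDXL text encoders, so no separate CLIP downloads are needed.
    model, clip, vae, _ = comfy.sd.load_checkpoint_guess_config(
        ckpt_path,
        output_vae=True,
        output_clip=True,
        embedding_directory=embedding_directory,
    )
    return model, clip, vae
```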
All this will make it behave a little differently from the original, and it can't be compared 1:1. I'm still able to get extremely good results, no worse than what I got with the Replicate version.
Comparing SUPIR to other methods isn't really a conversation for here, though. CCSR and StableSR definitely have their strengths, as do custom workflows tailored to specific purposes. It's just another tool in the toolbox, and it can be used as part of a workflow too; I've already seen people combine CCSR and SUPIR upscales in stages.
Oh yeah, I wasn't trying to compare. I was just saying I can't confirm that my results have been any different with the Comfy implementation. The one thing I did notice in my time playing with it so far was just related to comparing the results with StableSR.
Yeah, that wasn't aimed at you only; the whole thing is getting a bit sidetracked here. Comparing to other repos and the original code is fine and welcomed, of course. Glad to have your input as well!
What kind of face restoration does that repo use? Because faces are a lot better on that SUPIR version.
Seems to be some standard face restorer; it's used after the SUPIR cleanup VAE process (stage 1). I will at some point separate the stages to allow utilizing the cleaned images, though I don't know how big a difference it makes. For similar results in Comfy you would, for now, use some face restorer model before inputting the image to the SUPIR node.
Edit: the code is also in the original repo. I don't know of a Comfy equivalent node, but there must be one; it uses this code: https://github.com/xinntao/facexlib
It's not really directly related to SUPIR though.
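If someone wants to reproduce that face-restore step in front of the SUPIR node outside of Comfy, facexlib's helper is typically driven roughly like this (a sketch based on how GFPGAN/CodeFormer use it; `restore_face` is a placeholder for whatever face model you run on each crop):

```python
from facexlib.utils.face_restoration_helper import FaceRestoreHelper

def restore_faces(image_bgr, restore_face, upscale=1):
    """Detect, crop, restore, and paste faces back into the image.

    `restore_face` is a placeholder callable (e.g. a GFPGAN/CodeFormer forward
    pass on a 512x512 BGR crop); it is not part of facexlib itself.
    """
    helper = FaceRestoreHelper(upscale, face_size=512, crop_ratio=(1, 1),
                               det_model='retinaface_resnet50', use_parse=True)
    helper.read_image(image_bgr)
    helper.get_face_landmarks_5(only_center_face=False)
    helper.align_warp_face()
    for cropped_face in helper.cropped_faces:
        helper.add_restored_face(restore_face(cropped_face))
    helper.get_inverse_affine(None)
    return helper.paste_faces_to_input_image()
```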
Forgive me for the basic question, but stage 1 is always enabled for the Comfy node, is that right? Or can it be disabled? That's likely the difference these users are noticing. It seems like stage 1 kinda scrapes (almost blurs) the image a bit so the 2nd stage doesn't enhance the wrong things.
Anyway, I'm stoked to have the ability to use SUPIR in my comfy workflows, so thanks a lot for all your time and effort - It's very much appreciated!!
On the other repo stage 1 is skipped. Only stage 2 runs, but before that the image is resized to 1024px. ComfyUI SUPIR runs stage 1, but we can't see its result.
Stage 1 encodes the image with their "denoise_encoder", then decodes it. Apparently it "cleans up" the image. I'm definitely planning to separate it, along with lots of other stuff in the now-huge node.
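Conceptually it's just an encode/decode round trip through the special encoder, something like this (illustrative pseudocode, not the actual node internals; the attribute names are assumptions):

```python
import torch

@torch.no_grad()
def stage1_cleanup(supir_vae, image_bchw):
    # Encode with SUPIR's "denoise" encoder instead of the normal VAE encoder...
    latent = supir_vae.denoise_encoder(image_bchw)
    # ...then decode with the regular VAE decoder. The round trip softens
    # compression artifacts and noise so stage 2 doesn't enhance them.
    return supir_vae.decoder(latent).clamp(0, 1)
```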
These are the stage 1 results for some of the photos posted here:
As I said, this stage is skipped in the custom SUPIR repo.
There's also a local copy of the OG SUPIR Spaces/demo available via Pinokio. It also has stage 1 disabled. I'm using the same SDXL base model as Spaces; I can switch to another model if that's more useful for comparison.
Settings are the same for all, except the final car upscale was switched to x2. No additional "umbrella" prompt on the 2nd set of images:
event_id: 1710128546124846500
localtime: Mon Mar 11 16:42:26 2024
prompt:
a_prompt: Cinematic, High Contrast, highly detailed, taken using a Canon EOS R camera, hyper detailed photo - realistic maximum detail, 32k, Color Grading, ultra HD, extreme meticulous detailing, skin pore detailing, hyper sharpness, perfect without deformations
n_prompt: painting, oil painting, illustration, drawing, art, sketch, oil painting, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth
num_samples: 1
upscale: 4
edm_steps: 50
s_stage1: -1
s_stage2: 1
s_cfg: 7.5
seed: 454592652
s_churn: 5
s_noise: 1.003
color_fix_type: Wavelet
diff_dtype: fp16
ae_dtype: bf16
gamma_correction: 1
linear_CFG: True
linear_s_stage2: False
spt_linear_CFG: 4
spt_linear_s_stage2: 0
model_select: v0-Q
OG
x4
OG
x4
OG
x4
x2
Here's the same 3 images upscaled x4 using juggernautXL_v9Rundiffusionphoto2
(still in Pinokio version)
There seems to be confusion about what "stage 1" refers to. In the gradio demo, when you select stage 2, the code still runs what is called "stage 1", which is the denoising process. That's what I meant about the node always using it; it's no different from the gradio demo in that regard. I tried skipping it, and the results generally get really noisy. It will still be an option with the separated nodes (currently in the dev branch).
I've pushed the big update to main where the nodes are separated. At least to me it makes much more sense now and is easier to tune. I also tuned the sampler and exposed an "eta" setting for the RestoreDPMPP2M sampler, which can have huge effects, such as this (8 steps, Juggernaut lightning model): eta 1.0 / eta 3.0
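As a rough intuition for eta: in ancestral/SDE-style samplers it scales how much fresh noise gets re-injected at every step, along the lines of k-diffusion's ancestral step below (how RestoreDPMPP2M applies it exactly may differ):

```python
import math

def get_ancestral_step(sigma_from, sigma_to, eta=1.0):
    """Split a step into a deterministic part (sigma_down) and re-injected
    noise (sigma_up). eta=0 is fully deterministic; larger eta adds more noise."""
    if not eta:
        return sigma_to, 0.0
    sigma_up = min(sigma_to,
                   eta * math.sqrt(sigma_to ** 2 * (sigma_from ** 2 - sigma_to ** 2) / sigma_from ** 2))
    sigma_down = math.sqrt(sigma_to ** 2 - sigma_up ** 2)
    return sigma_down, sigma_up
```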
Wow, too many nodes now. I need a workflow for SUPIR now. Is there a tutorial for this version? I need it.
I did include an example workflow in the examples folder of the node. I haven't had time to make more yet though, and there's still a lot to learn to get best results.
Tested. I can't get better results with the example workflow than with the previous update. I'll keep testing.
Yeah, the guy is working hard on a bunch of nodes/projects, so it's best to hold out for one of the YouTube Comfy specialists to break it down and figure out the best workflow/parameters (not the dev at this point). Everything should be tweakable now.
There's already a good Reddit thread on these split-out nodes; check/ask in there maybe: https://old.reddit.com/r/comfyui/comments/1bh07ke/supir_v2_nodes_from_kijai_are_available_on/
Annnnd thanks Kijai for all the time/effort! Also going to test out the img2vid node you have posted.
Better results are not expected as the underlying code has not changed, it's just easier and faster to adjust. I still have zero issues getting incredible results myself overall.
The same problem for me. I get a noisy image, while using an online service returns a perfect picture. Settings are the same; cfg doesn't help. Any ideas?
A couple of days ago I was testing SUPIR online running on Replicate. At the end of the trial period, and after the good results, I decided to install it on my PC. I found this version and installed it very easily. It works very well with many images, but with others the quality is much worse than the Replicate version or the example images. It especially fails with people's faces (like this moment of this video) or with details that are hard to make out (like this other moment), while the Replicate version was almost perfect with faces and small details. I have tried to copy the Replicate values into SUPIR-ComfyUI, but the results vary a lot. I see that Replicate has 2 stages while ComfyUI has only 1 stage. As the SDXL model I am using the base SDXL model, but I have also tried Juggernaut XL and RunDiffusion XL, with worse results.
I want to try the gradio version with LLava, but I'm having problems getting it to work.
I have a 4080 with 16GB VRAM and 32GB RAM. I can render images without problems up to 2500x2500.
What could be the problem?