Lex-DRL opened 1 year ago
As an example:
I have an image of a shirtless man standing in front of a dark background. I want to isolate his body to perform an "inpaint" pass after upscale.
However, when I do, FaceDetailer aggressively tries to make the cropped section of the image a vertically oriented rectangle. As a result, it crops away one of the guy's arms and detects it as a separate piece, again a vertical rectangle. This causes quite a noticeable seam along the line of the cut. All of this is happening on an RTX 4090 with plenty of spare VRAM, so the node could easily process this as a single square piece. But it doesn't, and no matter what settings I change, I can't force it to lean towards a square aspect ratio.
The oddest thing is, the same node with the same detailer_pipe connected was successfully selecting the same region before the upscale, but for some reason not after. And without the docs, I don't even understand why.
I detect it with CLIPSegDetectorProvider as the bbox_detector.
I understand that the documentation has fallen behind due to the focus on development. It's true that documentation is lacking; there are plans to improve it in the future.
And the scenario you described seems like a bug. If I can reproduce it on my end, I'll let you know after the bug has been fixed. What is the upscaled resolution of the image below?
Thanks for the response 🙏🏻
The image is upscaled multiple times, starting from 768x448. The resolution at which the issue appeared is 3072x1762.
Here's the entire workspace for you to reproduce (it uses a male-focused model, though): workflow.zip
The graph might look terrifying at first, but it's quite simple: each horizontal row is cleanup at a specific resolution. All the config is on the left.
Thank you. I will check it.
Can you share the upscaled image (3072x1792)? I cannot reproduce it with my modified workflow.
I've changed the graph quite significantly since then. Give me a few minutes, I'll load the workspace I've uploaded here...
Here it is: ComfyUI_temp_pzprm00004.zip
However, it seems I've changed some parameters since the last time I made the screenshot. The seam is still there, but it's moved (it's not across the bicep now, but along the forearm, so it's less visible, though still there):
I'll check my browser history to see if I can find that one, with the clearly visible issue.
Yeah. Found it and re-generated it just now: ComfyUI_temp_chdeo00001.zip
I cannot reproduce this.
NOTE: SimpleDetector is patched. So you need to update Impact Pack if you try this workflow.
Hmmm... I've updated everything yesterday evening. No node updates since then, only a few commits on the main ComfyUI repo.
Let me isolate the issue as much as possible... I'll try to upload an image just before the problematic node.
Sooo... Just in case, I've updated ComfyUI, too.
Here's the isolated part of the graph. Can you reproduce the issue at least with it? crop_issue.zip
- I've saved the same source image twice: one manually, the other one with the "SaveImage" node.
- The archive also contains the full console output since ComfyUI launch.
- I've just changed the checkpoint to Reliberate v2 - the issue is still there.
UPD: I tried it without the LoRA, with the "b"/"l" versions of the SAM model, with different checkpoints, even on the default SD 1.5. I still have the issue even in the most barebones network: workflow-simplest.json.txt
UPD#2: Just tried your workflow-4th.json.txt with the image I uploaded here. And I got:

Error occurred when executing ImpactSimpleDetectorSEGSPipe:
too many values to unpack (expected 9)
File "E:\SD\ComfyUI-portable\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\SD\ComfyUI-portable\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\SD\ComfyUI-portable\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\SD\ComfyUI-portable\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\detectors.py", line 233, in doit
    model, clip, vae, positive, negative, wildcard, bbox_detector, segm_detector_opt, sam_model_opt = detailer_pipe
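A minimal sketch of why this kind of "too many values to unpack" error appears in general: the node unpacks detailer_pipe into a fixed number of variables, so the unpacking fails when the tuple actually carries more elements, for example after the pipe format was extended in a newer version. The field names below are assumptions for illustration, not the real Impact Pack layout.

```python
# Hypothetical illustration only; the real detailer_pipe layout is defined by Impact Pack.
detailer_pipe = (
    "model", "clip", "vae", "positive", "negative",
    "wildcard", "bbox_detector", "segm_detector", "sam_model",
    "extra_field",  # an element a newer pipe version might add
)

# Code written for the 9-element format unpacks into exactly 9 names,
# so a 10-element tuple raises: ValueError: too many values to unpack (expected 9)
model, clip, vae, positive, negative, wildcard, bbox, segm, sam = detailer_pipe
```

In other words, a mismatch like this usually means the node producing the pipe and the node consuming it come from different versions of the pack.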
I'll try to reproduce it after I get home.
I was able to reproduce the error. Everything is normal up until the conversion to SEGS, and it's clear that the problem occurs in FaceDetailer.
Yaay! Good to know this elusive bug is narrowed down. 🤝
The lack of documentation is still there, though, so I guess this issue should stay open until that's resolved (maybe other people will contribute).
As for the cropping bug, if you create a separate issue to track it, please mention me so I can subscribe to it, too.
Oh... my mistake... It's not a FaceDetailer problem; it's a crop_factor problem. The phenomenon occurs when crop_factor is 2.0.
Upon closer inspection, I realized that this wasn't a bug. I misunderstood because this approach isn't commonly used.
The reason for combining BBox Detector and SAM is typically to extract only the silhouette of the face. BBox Detector captures the bounding box area of the face, whereas SAM captures the entire person.
So, by obtaining the intersection of these two, you can achieve a detailed facial silhouette.
The issue is that, when capturing areas beyond the face, it looks odd when parts of the body are cut off abruptly, unlike the usual case where only the neck might be cut off.
To address this, you need to significantly increase the crop_factor of the bbox, which changes the region that gets cropped out of the SAM mask. Alternatively, you can adjust the parameters of CLIPSeg so that the detection covers the entire region of interest.
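To make the bbox/SAM interaction above more concrete, here is a rough NumPy sketch of the idea. The function names and shapes are made up for illustration and are not the actual Impact Pack implementation: the bbox is expanded by crop_factor, and the SAM mask is intersected with that expanded region, so anything the SAM mask covers outside the region gets cut off.

```python
import numpy as np

def expand_bbox(bbox, crop_factor, width, height):
    # Grow the detected bbox around its center by crop_factor (illustrative only).
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    bw, bh = (x2 - x1) * crop_factor, (y2 - y1) * crop_factor
    return (max(0, int(cx - bw / 2)), max(0, int(cy - bh / 2)),
            min(width, int(cx + bw / 2)), min(height, int(cy + bh / 2)))

def intersect_sam_with_bbox(sam_mask, bbox, crop_factor):
    # Keep only the part of the SAM mask that falls inside the expanded bbox.
    # With a small crop_factor, parts of the body outside this region are cut off,
    # which is what produces the abrupt edge / visible seam.
    h, w = sam_mask.shape
    x1, y1, x2, y2 = expand_bbox(bbox, crop_factor, w, h)
    region = np.zeros_like(sam_mask)
    region[y1:y2, x1:x2] = 1
    return sam_mask * region
```

This is why increasing crop_factor hides the seam: the kept region grows until the whole silhouette fits inside it.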
Try increasing the blur of CLIPSegDetectorProvider to 5. By doing so, you can prevent fragmented body-area masks and ensure a more seamless result.
The phenomenon occurs when crop_factor is 2.0
2 specifically? Or anything below 2? Or above? For the hands fix, I use 3, 4 and even 6 - to give the model a bigger chunk of the image for context about body size / proportions.
BBox Detector captures the bounding box area of the face, whereas SAM captures the entire person.
Since I wasn't able to find it in the docs, my current understanding of what the entire node pack does is very vague; I've just followed the examples shown in the tutorials. Currently, I don't even understand why I need a SAM model (or do I?), how bbox_detector differs from segm_detector, or when I should use one and/or the other.
If I get it right, with CLIPSegDetectorProvider I'm not limited to selecting a face / an entire person (or any other thing that requires a pre-trained model for UltralyticsDetectorProvider). I can select basically anything for refinement.
If I get it right again, the whole ...Detailer setup works by:
But if so, why is blur there twice then? On CLIPSegDetectorProvider and on FaceDetailer. I assumed that setting blur to anything above 0 in CLIPSegDetectorProvider would effectively blur the mask twice before passing it to inpaint. So I've set it to 0, to control the overall blur in a single place.
Try increasing the blur of CLIPSegDetectorProvider to 5.
I can, but... since it's unclear how exactly CLIPSegDetectorProvider behaves and what its parameters do, this might solve this specific case, but I still won't understand HOW it solved it. For now, should I just always set blur to 5, or to the default 7, as a magic number?
And for the future, this again raises the issue of missing docs.
Initially, the approach involved working separately on detection functionality through the detector, mask improvement through mask manipulation, and detailing functionality. However, this was later streamlined into a single node called "FaceDetailer," simplifying the frequently used pattern.
While FaceDetailer is convenient to use, it can obscure understanding of its internal workings. It seems that guiding beginners to use FaceDetailer as a preset, hiding the underlying approach because it can look overly complex for newcomers, may have been a mistake on my part.
Looking at this tutorial should make it easier to understand each parameter: https://raw.githubusercontent.com/ltdrdata/ComfyUI-extension-tutorials/Main/ComfyUI-Impact-Pack/images/sam.png
There are various nodes related to SEGS, so I recommend using them to directly inspect the output from the detector. This will help you gain a better understanding.
The reason for applying blur in CLIPSeg is to ensure that adjacent fragmented masks become connected during the blurring process, treating them as a single mask entity. Once they form a cohesive unit, MaskToSEGS recognizes these masks as a singular segment and generates the corresponding SEG. Additionally, the criterion for the crop area is based on the individual masks recognized as cohesive units.
The blurring of the mask itself in CLIPSeg isn't crucial since the mask will be binary eventually. What matters is making sure the mask is recognized as a single cohesive entity.
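As a rough, self-contained illustration of that point (plain NumPy/SciPy, not the actual node code): blurring a fragmented mask and re-thresholding it bridges small gaps, so a connected-components pass afterwards sees one segment instead of several.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

# Two mask fragments (e.g. torso and arm picked up separately by CLIPSeg),
# separated by a small gap.
mask = np.zeros((20, 20))
mask[5:9, 4:9] = 1.0    # fragment 1
mask[5:9, 11:16] = 1.0  # fragment 2

_, n_before = label(mask > 0.5)          # 2 separate components
blurred = gaussian_filter(mask, sigma=2.0)
merged = blurred > 0.2                   # binarize again after blurring
_, n_after = label(merged)               # the gap is bridged -> 1 component

print(n_before, n_after)  # 2 1
```

A single merged component then yields a single SEG (and a single crop region) instead of several small ones.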
And as for the parameters of CLIPSegDetectorProvider, it's recommended to try using CLIPSeg directly. CLIPSegDetectorProvider is essentially a wrapper for CLIPSeg.
Looking at this tutorial should make it easier to understand each parameter: https://raw.githubusercontent.com/ltdrdata/ComfyUI-extension-tutorials/Main/ComfyUI-Impact-Pack/images/sam.png
I've looked at it here (didn't download the workflow itself, though). Seems like I need to start from scratch and learn the theory of what each node does first.
CLIPSegDetectorProvider is essentially a wrapper for CLIPSeg.
Yeah, I got that. Unfortunately, CLIPSeg isn't well documented either. I can't even find out which space this blur is performed in.
Anyway, thanks for your guidance. The only thing clear to me now is that nothing is clear (not even things that I thought I had a grasp of) and that I need to go back to step 1.
Currently, a significant part of the nodes/parameters isn't documented at all. E.g., in the main FaceDetailer node it's unclear what half of the parameters do (everything after feather). Similarly, it's never explained how one Detailer or Detector compares to another (SAMDetector vs Simple Detector - which one is preferred by default, and when should we use the other?). With the current state of the docs, it's basically impossible to learn what exactly a specific node/parameter does and why a node might do something unexpected.