pydn / ComfyUI-to-Python-Extension

A powerful tool that translates ComfyUI workflows into executable Python code.

Free memory after the prompt has already run #83

Open Toldblog opened 1 week ago

Toldblog commented 1 week ago

Since I don't have a GPU on my computer, I need to convert the workflow into Python code and run it on Colab. However, when I run the code multiple times to generate images, I run out of RAM: the program seems to reload some models on every execution, which eventually exhausts Colab's RAM.

While reviewing the source code in the ComfyUI repository, I found a free_memory function located in /ComfyUI-master/comfy/model_management.py. I'm unsure if this could help.
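
For example, I was thinking of calling something like this between runs. This is just a guess on my part: the helpers below do exist in comfy/model_management.py, but I'm not sure their behavior (or signatures) is the same across ComfyUI versions, or whether they actually release the RAM.

import gc
import torch
import comfy.model_management as model_management

def try_to_free_memory():
    # Ask ComfyUI to drop whatever it currently has loaded.
    model_management.unload_all_models()
    model_management.soft_empty_cache()

    # free_memory(memory_required, device) unloads loaded models until the requested
    # amount of memory is available on the device; passing a huge value is my way of
    # asking it to unload as much as possible (not sure this is the intended usage).
    model_management.free_memory(1e30, model_management.get_torch_device())

    # Plain Python / PyTorch cleanup on top of that.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()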

To clarify, I have no problem with models like ControlNet or CLIP being loaded, since they are necessary for the workflow. What confuses me is that some other models are also being loaded during the generation phase.

Here is my code:

  # @title Loading Resources
  vaeloader = NODE_CLASS_MAPPINGS["VAELoader"]()
  checkpointloadersimple = NODE_CLASS_MAPPINGS["CheckpointLoaderSimple"]()
  cliptextencode = NODE_CLASS_MAPPINGS["CLIPTextEncode"]()
  loadimage = NODE_CLASS_MAPPINGS["LoadImage"]()
  controlnetloader = NODE_CLASS_MAPPINGS["ControlNetLoader"]()
  ipadaptermodelloader = NODE_CLASS_MAPPINGS["IPAdapterModelLoader"]()
  clipvisionloader = NODE_CLASS_MAPPINGS["CLIPVisionLoader"]()
  samloader = NODE_CLASS_MAPPINGS["SAMLoader"]()
  ultralyticsdetectorprovider = NODE_CLASS_MAPPINGS["UltralyticsDetectorProvider"]()
  groundingdinomodelloader_segment_anything = NODE_CLASS_MAPPINGS["GroundingDinoModelLoader (segment anything)"]()
  groundingdinosamsegment_segment_anything = NODE_CLASS_MAPPINGS["GroundingDinoSAMSegment (segment anything)"]()
  imagescaletototalpixels = NODE_CLASS_MAPPINGS["ImageScaleToTotalPixels"]()
  getimagesize = NODE_CLASS_MAPPINGS["GetImageSize"]()
  emptylatentimage = NODE_CLASS_MAPPINGS["EmptyLatentImage"]()
  prepimageforclipvision = NODE_CLASS_MAPPINGS["PrepImageForClipVision"]()
  ipadapteradvanced = NODE_CLASS_MAPPINGS["IPAdapterAdvanced"]()
  freeu_v2 = NODE_CLASS_MAPPINGS["FreeU_V2"]()
  dwpreprocessor = NODE_CLASS_MAPPINGS["DWPreprocessor"]()
  controlnetapplyadvanced = NODE_CLASS_MAPPINGS["ControlNetApplyAdvanced"]()
  cr_model_input_switch = NODE_CLASS_MAPPINGS["CR Model Input Switch"]()
  ksampleradvanced = NODE_CLASS_MAPPINGS["KSamplerAdvanced"]()
  nnlatentupscale = NODE_CLASS_MAPPINGS["NNLatentUpscale"]()
  ksampler = NODE_CLASS_MAPPINGS["KSampler"]()
  vaedecode = NODE_CLASS_MAPPINGS["VAEDecode"]()
  bboxdetectorsegs = NODE_CLASS_MAPPINGS["BboxDetectorSEGS"]()
  samdetectorcombined = NODE_CLASS_MAPPINGS["SAMDetectorCombined"]()
  impactsegsandmask = NODE_CLASS_MAPPINGS["ImpactSegsAndMask"]()
  conditioningcombine = NODE_CLASS_MAPPINGS["ConditioningCombine"]()
  detailerforeachdebug = NODE_CLASS_MAPPINGS["DetailerForEachDebug"]()
  saveimage = NODE_CLASS_MAPPINGS["SaveImage"]()
  catvtonwrapper = NODE_CLASS_MAPPINGS["CatVTONWrapper"]()

  vaeloader_8 = vaeloader.load_vae(vae_name="vae-ft-mse-840000-ema-pruned.safetensors")
  checkpointloadersimple_16 = checkpointloadersimple.load_checkpoint(ckpt_name="realdream.safetensors")
  controlnetloader_156 = controlnetloader.load_controlnet(control_net_name="control_v11p_sd15_openpose.pth")
  ipadaptermodelloader_256 = ipadaptermodelloader.load_ipadapter_model(ipadapter_file="ip-adapter-full-face_sd15.safetensors")
  clipvisionloader_257 = clipvisionloader.load_clip(clip_name="model.safetensors")
  ultralyticsdetectorprovider_266 = ultralyticsdetectorprovider.doit(model_name="bbox/face_yolov8m.pt")
  samloader_268 = samloader.load_model(model_name="sam_vit_b_01ec64.pth", device_mode="Prefer GPU")
  cliptextencode_274 = cliptextencode.encode(text="a face", clip=get_value_at_index(checkpointloadersimple_16, 1))
  groundingdinomodelloader_segment_anything_306 = (groundingdinomodelloader_segment_anything.main(model_name="GroundingDINO_SwinT_OGC (694MB)"))
  samloader_307 = samloader.load_model(model_name="sam_vit_b_01ec64.pth", device_mode="Prefer GPU")

  sys.path.append("ComfyUI-master/custom_nodes/comfyui_controlnet_aux/src")
  from custom_controlnet_aux.dwpose import DwposeDetector, AnimalposeDetector
  sys.path.append("ComfyUI-master")
  import comfy.model_management as model_management

  sys.path.append("ComfyUI-master/custom_nodes/comfyui_controlnet_aux")
  from comfyui_controlnet_aux.utils import common_annotator_call, define_preprocessor_inputs, INPUT
  import json

  bbox_detector="yolox_l.onnx"
  pose_estimator="dw-ll_ucoco_384.onnx"
  yolo_repo="yzd-v/DWPose"
  pose_repo="yzd-v/DWPose"
  DWPOSE_MODEL_NAME = "yzd-v/DWPose"

  model = DwposeDetector.from_pretrained(
          pose_repo,
          yolo_repo,
          det_filename=bbox_detector, pose_filename=pose_estimator,
          torchscript_device=model_management.get_torch_device()
      )

  class DWPose_Preprocessor:
      @classmethod
      def INPUT_TYPES(s):
          return define_preprocessor_inputs(
              detect_hand=INPUT.COMBO(["enable", "disable"]),
              detect_body=INPUT.COMBO(["enable", "disable"]),
              detect_face=INPUT.COMBO(["enable", "disable"]),
              resolution=INPUT.RESOLUTION(),
              bbox_detector=INPUT.COMBO(
                  ["yolox_l.torchscript.pt", "yolox_l.onnx", "yolo_nas_l_fp16.onnx", "yolo_nas_m_fp16.onnx", "yolo_nas_s_fp16.onnx"],
                  default="yolox_l.onnx"
              ),
              pose_estimator=INPUT.COMBO(
                  ["dw-ll_ucoco_384_bs5.torchscript.pt", "dw-ll_ucoco_384.onnx", "dw-ll_ucoco.onnx"],
                  default="dw-ll_ucoco_384_bs5.torchscript.pt"
              ),
              scale_stick_for_xinsr_cn=INPUT.COMBO(["disable", "enable"])
          )

      RETURN_TYPES = ("IMAGE", "POSE_KEYPOINT")
      FUNCTION = "estimate_pose"

      CATEGORY = "ControlNet Preprocessors/Faces and Poses Estimators"

      def estimate_pose(self, image, detect_hand="enable", detect_body="enable", detect_face="enable", resolution=512, model=None, scale_stick_for_xinsr_cn="disable", **kwargs):
          detect_hand = detect_hand == "enable"
          detect_body = detect_body == "enable"
          detect_face = detect_face == "enable"
          scale_stick_for_xinsr_cn = scale_stick_for_xinsr_cn == "enable"
          self.openpose_dicts = []
          def func(image, **kwargs):
              pose_img, openpose_dict = model(image, **kwargs)
              self.openpose_dicts.append(openpose_dict)
              return pose_img

          out = common_annotator_call(func, image, include_hand=detect_hand, include_face=detect_face, include_body=detect_body, image_and_json=True, resolution=resolution, xinsr_stick_scaling=scale_stick_for_xinsr_cn)
          del model
          return {
              'ui': { "openpose_json": [json.dumps(self.openpose_dicts, indent=4)] },
              "result": (out, self.openpose_dicts)
          }

  dwpreprocessor = DWPose_Preprocessor()

I don't have any concerns or issues with these loads. Here is the corresponding log output:

[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
INFO:root:Using pytorch attention in VAE
INFO:root:Using pytorch attention in VAE
INFO:root:model weight dtype torch.float16, manual cast: None
INFO:root:model_type EPS
INFO:root:Using pytorch attention in VAE
INFO:root:Using pytorch attention in VAE
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
INFO:root:loaded straight to GPU
INFO:root:Requested to load BaseModel
INFO:root:Loading 1 new model
INFO:root:loaded completely 0.0 1639.406135559082 True
/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py:836: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(file, map_location="cpu")
/usr/local/lib/python3.10/dist-packages/segment_anything/build_sam.py:105: FutureWarning: You are using torch.load with weights_only=False (same torch.load warning as above)
  state_dict = torch.load(f)
INFO:root:Requested to load SD1ClipModel
INFO:root:Loading 1 new model
INFO:root:loaded completely 0.0 235.84423828125 True
Loads SAM model: ComfyUI-master/models/sams/sam_vit_b_01ec64.pth (device:Prefer GPU)
/usr/local/lib/python3.10/dist-packages/torch/functional.py:513: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3609.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
final text_encoder_type: bert-base-uncased
tokenizer_config.json: 100% 48.0/48.0 [00:00<00:00, 2.65kB/s]
config.json: 100% 570/570 [00:00<00:00, 37.0kB/s]
vocab.txt: 100% 232k/232k [00:00<00:00, 5.08MB/s]
tokenizer.json: 100% 466k/466k [00:00<00:00, 3.15MB/s]
model.safetensors: 100% 440M/440M [00:07<00:00, 90.2MB/s]
ComfyUI-master/custom_nodes/comfyui_segment_anything/node.py:127: FutureWarning: You are using torch.load with weights_only=False (same torch.load warning as above)
  checkpoint = torch.load(
Loads SAM model: ComfyUI-master/models/sams/sam_vit_b_01ec64.pth (device:Prefer GPU)
model_path is ComfyUI-master/custom_nodes/comfyui_controlnet_aux/ckpts/yzd-v/DWPose/yolox_l.onnx
model_path is ComfyUI-master/custom_nodes/comfyui_controlnet_aux/ckpts/yzd-v/DWPose/dw-ll_ucoco_384.onnx

DWPose: Using yolox_l.onnx for bbox detection and dw-ll_ucoco_384.onnx for pose estimation
DWPose: Caching OpenCV DNN module yolox_l.onnx on cv2.DNN...
DWPose: Caching OpenCV DNN module dw-ll_ucoco_384.onnx on cv2.DNN...

# @title Inference Code
def inference_code():
    # Enter inference mode (turns off gradient calculations for memory optimization)
    with torch.inference_mode():

        images = None

        # Encode the negative prompt text into CLIP text embeddings
        # This is used to suppress unwanted features in the final image
        cliptextencode_18 = cliptextencode.encode(
            text=negative_prompt,  # The negative text prompt (what should be avoided in the image)
            clip=get_value_at_index(checkpointloadersimple_16, 1),  # Load the CLIP text encoder from the checkpoint
        )

        # Encode the positive prompt text into CLIP text embeddings
        # This encourages the model to include features described by the positive prompt
        cliptextencode_30 = cliptextencode.encode(
            text=prompt,  # The positive text prompt (what should be included in the image)
            clip=get_value_at_index(checkpointloadersimple_16, 1),  # Load the CLIP text encoder from the checkpoint
        )

        # Load the pose image (which may provide body positioning) from the specified path
        loadimage_155 = loadimage.load_image(image=f"{pose}")

        # Load the face image, used to condition the generation on specific facial features
        loadimage_161 = loadimage.load_image(image=f"{face}")

        # Load the clothes image, which may represent the outfit to be worn by the subject
        loadimage_304 = loadimage.load_image(image=f"{clothes}")

        # Loop over the number of iterations (currently set to 1 iteration, meaning no real loop here)
        for q in range(1):

            # Upscale the pose image to a total size of 0.5 megapixels using nearest neighbor scaling
            # This step adjusts the image resolution for further processing
            imagescaletototalpixels_168 = imagescaletototalpixels.upscale(
                upscale_method="nearest-exact",  # Use nearest-neighbor scaling for the image
                megapixels=0.5,  # Set target resolution to 0.5 megapixels
                image=get_value_at_index(loadimage_155, 0),  # Load the first item from the pose image batch
            )

            # Get the size (width, height) of the upscaled pose image for further use
            getimagesize_208 = getimagesize.get_size(
                image=get_value_at_index(imagescaletototalpixels_168, 0)  # Use the upscaled pose image
            )

            # Generate an empty latent image with the same dimensions as the upscaled image
            # The latent image acts as a placeholder for generating the new image in latent space
            emptylatentimage_21 = emptylatentimage.generate(
                width=get_value_at_index(getimagesize_208, 0),  # Use the width of the upscaled image
                height=get_value_at_index(getimagesize_208, 1),  # Use the height of the upscaled image
                batch_size=1,  # Generate a single latent image
            )

            # Prepare the face image for use with CLIP Vision by resizing and sharpening it
            # This step ensures the face image is compatible with the model’s expectations
            prepimageforclipvision_259 = prepimageforclipvision.prep_image(
                interpolation="LANCZOS",  # Use Lanczos interpolation for resizing (high-quality method)
                crop_position="center",  # Center the crop around the face
                sharpening=0,  # No additional sharpening is applied
                image=get_value_at_index(loadimage_161, 0),  # Load the first face image
            )

            # Apply the IP-Adapter model to the prepared face image
            # The IP-Adapter adapts the input image by fusing visual and textual embeddings
            ipadapteradvanced_303 = ipadapteradvanced.apply_ipadapter(
                weight=1,  # Weight for the adaptation process (how strongly to apply this transformation)
                weight_type="linear",  # Use a linear weighting scheme
                combine_embeds="concat",  # Combine the embeddings by concatenation (text and image)
                start_at=0,  # Start adapting from the first layer of the model
                end_at=1,  # End adapting after one layer
                embeds_scaling="V only",  # Only scale the embeddings from the Vision model (CLIP Vision)
                model=get_value_at_index(checkpointloadersimple_16, 0),  # Main generation model (from the checkpoint)
                ipadapter=get_value_at_index(ipadaptermodelloader_256, 0),  # IP-Adapter model
                image=get_value_at_index(prepimageforclipvision_259, 0),  # The prepared face image
                clip_vision=get_value_at_index(clipvisionloader_257, 0),  # The CLIP Vision model
            )

            # Apply the FreeU-V2 model, which enhances the image using patch-based processing
            # This model applies transformations to specific parts of the image (patches) for refinement
            freeu_v2_252 = freeu_v2.patch(
                b1=1.3,  # Parameter controlling some aspect of patch processing
                b2=1.4,  # Another parameter for patch processing
                s1=0.9,  # Scaling factor for patches
                s2=0.2,  # Scaling factor for another stage of the patch process
                model=get_value_at_index(ipadapteradvanced_303, 0),  # Apply FreeU-V2 on the adapted model output
            )

            # Estimate the human pose (body, face, hands) from the upscaled pose image
            # This provides key points for the body and helps condition the image generation process
            dwpreprocessor_238 = dwpreprocessor.estimate_pose(
                detect_hand="enable",  # Enable detection of hand key points
                detect_body="enable",  # Enable detection of body key points
                detect_face="enable",  # Enable detection of face key points
                resolution=512,  # Set resolution for pose detection
                model=model,  # Use the provided model for pose estimation
                scale_stick_for_xinsr_cn="disable",  # Disable scaling for stick figure models
                image=get_value_at_index(imagescaletototalpixels_168, 0),  # The upscaled pose image
            )

            # Apply ControlNet for advanced conditioning of the image
            controlnetapplyadvanced_240 = controlnetapplyadvanced.apply_controlnet(
                strength=1, start_percent=0, end_percent=1,
                positive=get_value_at_index(cliptextencode_30, 0),
                negative=get_value_at_index(cliptextencode_18, 0),
                control_net=get_value_at_index(controlnetloader_156, 0),
                image=get_value_at_index(dwpreprocessor_238, 0)
            )

            # Switch between two models for further image processing
            cr_model_input_switch_291 = cr_model_input_switch.switch(
                Input=2,  # Model 2 is selected
                model1=get_value_at_index(checkpointloadersimple_16, 0),
                model2=get_value_at_index(freeu_v2_252, 0)
            )

            # Perform image sampling with advanced settings, applying noise and steps configuration
            ksampleradvanced_253 = ksampleradvanced.sample(
                add_noise="enable", noise_seed=random.randint(1, 2**64), steps=20, cfg=5.5,
                sampler_name="heunpp2", scheduler="karras", start_at_step=0, end_at_step=10000,
                return_with_leftover_noise="disable", model=get_value_at_index(cr_model_input_switch_291, 0),
                positive=get_value_at_index(controlnetapplyadvanced_240, 0),
                negative=get_value_at_index(controlnetapplyadvanced_240, 1),
                latent_image=get_value_at_index(emptylatentimage_21, 0)
            )

            # Upscale the latent image using a neural network-based latent space upscaling
            nnlatentupscale_263 = nnlatentupscale.upscale(
                version="SD 1.x", upscale=1.5, latent=get_value_at_index(ksampleradvanced_253, 0)
            )

            # Perform image sampling again with different settings, using the "uni_pc" sampler
            ksampler_176 = ksampler.sample(
                seed=random.randint(1, 2**64), steps=16, cfg=5.6,
                sampler_name="uni_pc", scheduler="karras", denoise=0.25,
                model=get_value_at_index(freeu_v2_252, 0),
                positive=get_value_at_index(controlnetapplyadvanced_240, 0),
                negative=get_value_at_index(controlnetapplyadvanced_240, 1),
                latent_image=get_value_at_index(nnlatentupscale_263, 0)
            )

            # Upscale the latent image again for better resolution
            nnlatentupscale_298 = nnlatentupscale.upscale(
                version="SD 1.x", upscale=1.5, latent=get_value_at_index(ksampler_176, 0)
            )

            # Final sampling with DPM++ 2M sampler, which adds additional refinement to the image
            ksampler_297 = ksampler.sample(
                seed=random.randint(1, 2**64), steps=14, cfg=5.5,
                sampler_name="dpmpp_2m", scheduler="karras", denoise=0.35,
                model=get_value_at_index(freeu_v2_252, 0),
                positive=get_value_at_index(controlnetapplyadvanced_240, 0),
                negative=get_value_at_index(controlnetapplyadvanced_240, 1),
                latent_image=get_value_at_index(nnlatentupscale_298, 0)
            )

            # Decode the final latent representation into an actual image using VAE
            vaedecode_301 = vaedecode.decode(
                samples=get_value_at_index(ksampler_297, 0),
                vae=get_value_at_index(vaeloader_8, 0)
            )

            # Detect bounding boxes in the decoded image for further segmentation using a threshold and dilation
            bboxdetectorsegs_267 = bboxdetectorsegs.doit(
                threshold=0.52, dilation=10, crop_factor=1.2, drop_size=10, labels="all",
                bbox_detector=get_value_at_index(ultralyticsdetectorprovider_266, 0),
                image=get_value_at_index(vaedecode_301, 0)
            )

            # Use SAM model to generate detailed segmentations based on detected bounding boxes and masks
            samdetectorcombined_269 = samdetectorcombined.doit(
                detection_hint="mask-points", dilation=0, threshold=0.94, bbox_expansion=0,
                mask_hint_threshold=0.7, mask_hint_use_negative="False",
                sam_model=get_value_at_index(samloader_268, 0),
                segs=get_value_at_index(bboxdetectorsegs_267, 0),
                image=get_value_at_index(vaedecode_301, 0)
            )

            # Combine the detected bounding boxes and mask to create impact segments for the next steps
            impactsegsandmask_278 = impactsegsandmask.doit(
                segs=get_value_at_index(bboxdetectorsegs_267, 0),
                mask=get_value_at_index(samdetectorcombined_269, 0)
            )

            # Combine different conditioning inputs (from text and ControlNet) for further fine-tuning
            conditioningcombine_275 = conditioningcombine.combine(
                conditioning_1=get_value_at_index(cliptextencode_274, 0),
                conditioning_2=get_value_at_index(controlnetapplyadvanced_240, 0)
            )

            # Apply additional refinement to the generated image by using a debug detailer, applying noise, and inpainting
            detailerforeachdebug_270 = detailerforeachdebug.doit(
                guide_size=1024, guide_size_for=False, max_size=1024, seed=random.randint(1, 2**64),
                steps=16, cfg=8.5, sampler_name="ddpm", scheduler="karras", denoise=0.3,
                feather=6, noise_mask=True, force_inpaint=True, wildcard="", cycle=1,
                inpaint_model=False, noise_mask_feather=20,
                image=get_value_at_index(vaedecode_301, 0),
                segs=get_value_at_index(impactsegsandmask_278, 0),
                model=get_value_at_index(checkpointloadersimple_16, 0),
                clip=get_value_at_index(checkpointloadersimple_16, 1),
                vae=get_value_at_index(vaeloader_8, 0),
                positive=get_value_at_index(conditioningcombine_275, 0),
                negative=get_value_at_index(controlnetapplyadvanced_240, 1)
            )

            # Change Clothes Operation
            # Using Grounding DINO and SAM to segment specific areas based on a prompt and apply new clothes
            groundingdinosamsegment_segment_anything_305 = groundingdinosamsegment_segment_anything.main(
                prompt=f"{replace_prompt}", threshold=0.3,
                sam_model=get_value_at_index(samloader_307, 0),
                grounding_dino_model=get_value_at_index(groundingdinomodelloader_segment_anything_306, 0),
                image=get_value_at_index(detailerforeachdebug_270, 0)
            )

            # Use CAT-VTON to change the clothes based on the detected segmentation mask and a reference image
            catvtonwrapper_308 = catvtonwrapper.catvton(
                mask_grow=25, mixed_precision="fp16", seed=random.randint(1, 2**64), steps=40, cfg=2.5,
                image=get_value_at_index(detailerforeachdebug_270, 0),
                mask=get_value_at_index(groundingdinosamsegment_segment_anything_305, 1),
                refer_image=get_value_at_index(loadimage_304, 0)
            )

            # Save the intermediate image after the debug detailing process
            saveimage_277 = saveimage.save_images(
                filename_prefix="Reposer_S2_Facefix",
                images=get_value_at_index(detailerforeachdebug_270, 0)
            )

            # Save the final image after applying the clothes change
            saveimage_309 = saveimage.save_images(
                filename_prefix="ChangeClothes",
                images=get_value_at_index(catvtonwrapper_308, 0)
            )

            return get_value_at_index(catvtonwrapper_308, 0)

obj = inference_code()

I believe that these model loadings are likely causing the RAM issue when I run the inference_code function multiple times.

  warnings.warn(
INFO:root:Requested to load CLIPVisionModelProjection
INFO:root:Loading 1 new model
INFO:root:loaded completely 0.0 1208.09814453125 True
DWPose: Bbox 9025.43ms
DWPose: Pose 706.29ms on 1 people

INFO:root:Requested to load ControlNet
INFO:root:Requested to load BaseModel
INFO:root:Loading 2 new models
INFO:root:loaded completely 0.0 689.0852355957031 True
100% 20/20 [00:31<00:00, 1.18s/it]
100% 16/16 [00:30<00:00, 1.81s/it]
100% 14/14 [01:55<00:00, 8.23s/it]
INFO:root:Requested to load AutoencoderKL
INFO:root:Loading 1 new model
INFO:root:loaded completely 0.0 319.11416244506836 True

0: 640x448 1 face, 64.4ms
Speed: 7.7ms preprocess, 64.4ms inference, 98.4ms postprocess per image at shape (1, 3, 640, 448)
Detailer: segment upscale for ((152.1004, 194.72061)) | crop region (182, 233) x 4.398020300310377 -> (800, 1024)
INFO:root:Requested to load ControlNet
INFO:root:Requested to load BaseModel
INFO:root:Loading 2 new models
INFO:root:loaded completely 0.0 689.0852355957031 True
INFO:root:loaded completely 0.0 1639.406135559082 True
100% 16/16 [00:20<00:00, 1.29s/it]
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:1126: FutureWarning: The device argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:92: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
ComfyUI-master/custom_nodes/comfyui_segment_anything/local_groundingdino/models/GroundingDINO/transformer.py:862: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  with torch.cuda.amp.autocast(enabled=False):
100% |██████████| 40/40 [01:23<00:00, 2.09s/it]
😺dzNodes: LayerStyle -> CatVTON_Wrapper Processed.

Can anyone help me free the memory, or unload the models that get loaded during the generation step, when I run the inference_code function multiple times?
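
To be concrete, this is roughly how I would like to call it on Colab, with the try_to_free_memory() helper from above run between iterations. I have no idea whether this is enough (or even the right approach) to release what the generation step loads:

num_images = 5  # placeholder: however many images I need to generate

for i in range(num_images):
    result = inference_code()

    # Try to release whatever the generation step loaded before the next run.
    del result
    try_to_free_memory()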

RaidenE1 commented 3 days ago

Hi, I've run into the same problem. The job finishes, but the memory is still in use, which eventually causes an OOM (CUDA out of memory) error.

RaidenE1 commented 3 days ago

I've tried the unload_all_models() function, but it only frees part of the memory.
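
For reference, this is roughly what I ran after the job finished, plus the extra cleanup calls I'm experimenting with on top of it (I can't confirm they help; a noticeable amount of memory is still held afterwards):

import gc
import torch
import comfy.model_management as model_management

# What I tried: ask ComfyUI to unload every model it is tracking.
model_management.unload_all_models()

# Additional cleanup I'm experimenting with (unverified whether it frees the rest):
model_management.soft_empty_cache()
gc.collect()
torch.cuda.empty_cache()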