htyjers / StrDiffusion

[CVPR 2024] Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

Apache License 2.0


Hello, I ran your three training steps and then tested, but the output has severe artifacts #40

Closed krantbrity closed 2 months ago

htyjers commented 3 months ago

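Hello. Regarding the QR-code inpainting task you described: because your QR codes contain only two colors, black and white, splitting the image into structure and texture semantic information is not very meaningful. I recommend using only our standalone structure model, which should already restore the images well, because structure focuses on restoring the structural layout of objects. As the figure below shows, the structure network alone already achieves a good structure; adding the texture network on top of it actually introduces interference.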
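As a rough illustration of the "only two colors" point, one way to check whether an image is essentially two-tone before deciding to skip the texture network is sketched below. This is only a sketch under stated assumptions: `is_near_binary`, `tol`, and `min_fraction` are hypothetical names and thresholds introduced here, not part of StrDiffusion, and the input is assumed to be a float array scaled to [0, 1].

```python
import numpy as np

def is_near_binary(img, tol=0.1, min_fraction=0.95):
    """Return True if most pixels lie within `tol` of 0 or 1.

    Hypothetical helper: `tol` and `min_fraction` are illustrative values.
    `img` is assumed to be a float array scaled to [0, 1].
    """
    img = np.asarray(img, dtype=np.float32)
    near_extreme = (img <= tol) | (img >= 1.0 - tol)
    return float(near_extreme.mean()) >= min_fraction

# Synthetic stand-ins: a checkerboard (QR-like) and a smooth gradient (photo-like).
qr_like = (np.indices((64, 64)).sum(axis=0) % 2).astype(np.float32)
photo_like = np.linspace(0.0, 1.0, 64 * 64, dtype=np.float32).reshape(64, 64)

print(is_near_binary(qr_like))     # True
print(is_near_binary(photo_like))  # False
```

An image that passes this check carries almost no color or texture variation, which matches the reasoning above for relying on the structure model alone.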

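Second, I ran a test with the Places pretrained model. If semantics or color are added to the images, then adding texture information to the QR-code inpainting task should help to some extent, as shown in the figure below.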

krantbrity commented 3 months ago

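Thank you very much for the detailed reply, I really appreciate it. However, I tried a different dataset and found that some images still cannot be restored. Could this be related to the dataset size? My training, test, and validation sets contain 2722, 840, and 980 images respectively.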

krantbrity commented 3 months ago

The problem may be in StrDiffusion-main/train/texture/config/inpainting/data/GT_dataset.py:

    # get GT image
    GT_path = self.GT_paths[index]
    if self.opt["data_type"] == "lmdb":
        resolution = [int(s) for s in self.GT_sizes[index].split("_")]
    else:
        resolution = None
    img_GT = util.read_img(
        self.GT_env, GT_path, resolution
    )  # return: Numpy float32, HWC, BGR, [0,1]

    if self.opt["phase"] == "train":
        H, W, C = img_GT.shape

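        # Random crop: pick a top-left corner so a GT_size window fits when the image is large enough
        rnd_h = random.randint(0, max(0, H - GT_size))
        rnd_w = random.randint(0, max(0, W - GT_size))
        img_GT = img_GT[rnd_h : rnd_h + GT_size, rnd_w : rnd_w + GT_size, :]
        # Added line in question: force the crop to GT_size x GT_size
        img_GT = cv2.resize(img_GT, (GT_size, GT_size), interpolation=cv2.INTER_AREA)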

Should I keep the last line, which I added myself: img_GT = cv2.resize(img_GT, (GT_size, GT_size), interpolation=cv2.INTER_AREA)?

htyjers commented 3 months ago

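The dataset is probably too small. Even PSV (Paris StreetView), the smallest dataset commonly used for image inpainting, has 14,900 training images.
The image size can already be set here, so I don't think the line img_GT = cv2.resize(img_GT, (GT_size, GT_size), interpolation=cv2.INTER_AREA) needs to be added.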
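A minimal sketch of when the extra resize actually changes anything, assuming images are NumPy arrays in H x W x C layout as the comment in the snippet states. `random_crop` is a hypothetical helper that reproduces only the cropping lines from the snippet; it is not the repository's code. When the input is at least GT_size in both dimensions, the crop is already GT_size x GT_size, so resizing to the same size leaves the shape unchanged. Only when an input is smaller than GT_size does the crop come out smaller, and that is the one case where the added resize would make a difference.

```python
import random
import numpy as np

def random_crop(img, gt_size):
    """Random crop mirroring the slicing in the snippet above (H, W, C layout)."""
    h, w, _ = img.shape
    rnd_h = random.randint(0, max(0, h - gt_size))
    rnd_w = random.randint(0, max(0, w - gt_size))
    return img[rnd_h:rnd_h + gt_size, rnd_w:rnd_w + gt_size, :]

# Dummy images: one larger than the crop size, one smaller.
large = np.zeros((300, 400, 3), dtype=np.float32)
small = np.zeros((200, 180, 3), dtype=np.float32)

large_crop = random_crop(large, 256)
small_crop = random_crop(small, 256)

print(large_crop.shape)  # (256, 256, 3): already the target size
print(small_crop.shape)  # (200, 180, 3): smaller than the target size
```

So if every training image is at least GT_size on each side, the crop alone already yields fixed-size inputs.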

krantbrity commented 3 months ago

Thank you for your guidance. I will try again with a sufficiently large training set.