ziqihuangg / Collaborative-Diffusion

[CVPR 2023] Collaborative Diffusion
https://ziqihuangg.github.io/projects/collaborative-diffusion.html

How to create my own dataset #37

Open shenwuyue2022 opened 4 months ago

shenwuyue2022 commented 4 months ago

How do I create the text, mask, and sketch annotations for my own images?

ziqihuangg commented 3 months ago

Hi, for our work, we use the multi-modal labels provided by datasets that are built on top of CelebA, for example CelebA-Dialog, CelebAMask-HQ, and Multi-Modal-CelebA-HQ. If you wish to extract new multi-modal labels, you can use some off-the-shelf extractors. For example, there is a face parsing network provided by CelebAMask-HQ.
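
As an illustration of the off-the-shelf idea (not our actual preprocessing code), here is a minimal sketch for the sketch modality using OpenCV's Canny edge detector; the paths and thresholds are placeholders, and the mask modality would instead come from a pretrained face-parsing network like the one mentioned above.

```python
# Illustration only: a rough "sketch" label from an off-the-shelf edge detector.
# The file paths and Canny thresholds below are placeholders.
import cv2

def extract_edge_sketch(image_path, out_path, low=50, high=150):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)   # smooth to suppress noisy edges
    edges = cv2.Canny(gray, low, high)         # binary edge map used as the sketch
    cv2.imwrite(out_path, edges)

extract_edge_sketch("face_00001.jpg", "face_00001_sketch.png")
```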

shenwuyue2022 commented 3 months ago

Hello! Regarding the mask and sketch parts of the CelebA-based data used in this project: the masks are provided as images, but the project converts them into .pt files (the sketches are converted as well). Could you please share the conversion code for this part? Also, about the mask folder: given that the final .pt tensor has shape [19, 1024], does 19 correspond to the 19 segmentation categories, and does 1024 correspond to the downsampled 32×32 resolution?

ziqihuangg commented 3 months ago

> Also, about the mask folder: given that the final .pt tensor has shape [19, 1024], does 19 correspond to the 19 segmentation categories, and does 1024 correspond to the downsampled 32×32 resolution?

Yes, that's right.
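
For concreteness, here is a quick shape check of that layout (one row per class, one column per position of the 32×32 map); the random label map below just stands in for a real downsampled mask:

```python
import torch
import torch.nn.functional as F

label_map = torch.randint(0, 19, (32, 32))               # stand-in for a downsampled mask
one_hot = F.one_hot(label_map.view(-1), num_classes=19)  # [1024, 19]
one_hot = one_hot.T.float()                              # [19, 1024]
print(one_hot.shape)                                     # torch.Size([19, 1024])
```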

shenwuyue2022 commented 3 months ago

Hello, here is my understanding of the process for converting the mask image files into .pt files. Please help me check whether there are any issues.

```python
import os

import torch
from PIL import Image
from torchvision import transforms


def resize_and_convert_to_tensor(file_path, output_dir, num_classes=19):
    # Downsample the label map to 32x32 with nearest-neighbour interpolation so that
    # class indices are not mixed. PILToTensor keeps the raw integer class indices;
    # ToTensor() would rescale them to [0, 1] and break the int(pixel_value) lookup below.
    transform = transforms.Compose([
        transforms.Resize((32, 32), interpolation=transforms.InterpolationMode.NEAREST),
        transforms.PILToTensor(),
    ])

    image = Image.open(file_path).convert('L')
    downsampled_tensor = transform(image)  # [1, 32, 32], dtype uint8

    # Flatten the 32x32 label map into 1024 per-pixel class indices.
    downsampled_map = downsampled_tensor.view(-1).numpy()

    # One-hot encode: one row per class, one column per spatial position.
    one_hot_tensor = torch.zeros((num_classes, 1024), dtype=torch.float32)
    for idx, pixel_value in enumerate(downsampled_map):
        class_index = int(pixel_value)
        if class_index < num_classes:
            one_hot_tensor[class_index, idx] = 1

    base_name = os.path.splitext(os.path.basename(file_path))[0]
    tensor_file_path = os.path.join(output_dir, f"{base_name}.pt")
    torch.save(one_hot_tensor, tensor_file_path)

    return tensor_file_path
```
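
And here is a quick check I run on the output (the paths below are just examples) to confirm that the saved tensor has the expected [19, 1024] shape and that each position is assigned to at most one class:

```python
# Example paths only.
os.makedirs("masks_pt", exist_ok=True)
pt_path = resize_and_convert_to_tensor("CelebAMask-HQ-mask/00001.png", "masks_pt")

mask = torch.load(pt_path)
print(mask.shape)             # expected: torch.Size([19, 1024])
print(mask.sum(dim=0).max())  # each of the 1024 positions belongs to at most one class
```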