raywzy / ICT

High-Fidelity Pluralistic Image Completion with Transformers (ICCV 2021)

Will your kmeans data work well on other domains? #29

Open naoki7090624 opened 2 years ago

naoki7090624 commented 2 years ago

Hi, thanks for sharing this great work! I have a question about kmeans_centers.npy. According to your paper, you use cluster centers computed on ImageNet to reduce the computational cost:

To further reduce the dimension and faithfully re-represent the low-resolution image, 
an extra visual vocabulary with spatial size 512 × 3 is generated using KMeans cluster centers of the whole ImageNet [8] RGB pixel spaces. 
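Read this way, the vocabulary works like a color palette. As a sketch, assuming Euclidean distance in RGB space and writing the 512 centers as the rows $\mathbf{c}_k$ of $C \in \mathbb{R}^{512 \times 3}$, each pixel $\mathbf{x}_i$ is replaced by the index $z_i$ of its nearest center and decoded back by table lookup:

$$
z_i = \arg\min_{k \in \{1, \dots, 512\}} \lVert \mathbf{x}_i - \mathbf{c}_k \rVert_2^2,
\qquad
\hat{\mathbf{x}}_i = \mathbf{c}_{z_i}
$$

KMeans picks the centers that minimize the total quantization error over the clustered pixels, $\min_{C} \sum_i \lVert \mathbf{x}_i - \mathbf{c}_{z_i} \rVert_2^2$, so the palette fits the color distribution of the data it was trained on (here, ImageNet).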

Will your clustering data work well for other domains (like faces, paintings or maps)?

raywzy commented 2 years ago

Great question. In my opinion, the answer is yes. In fact, this kind of image representation (an indexed color palette) was very popular on early computers, which had only a limited number of bits to store color information. The quantized image can also show color banding.
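To illustrate the banding effect, here is a small sketch that quantizes a smooth grey ramp with nearest-palette-color assignment. Assumptions: colors are in [-1, 1], distance is squared Euclidean, and the 4-level palette is made up for illustration (the real file has 512 RGB centers); `quantize_to_palette` is a hypothetical helper, not ICT code.

```python
import numpy as np

def quantize_to_palette(pixels: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Replace each pixel (N, 3) by its nearest palette color (K, 3)."""
    # Squared Euclidean distance between every pixel and every palette entry: (N, K).
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    return palette[d2.argmin(axis=1)]

# A smooth grey ramp, 256 pixels wide, values in [-1, 1].
ramp = np.linspace(-1.0, 1.0, 256)
pixels = np.stack([ramp, ramp, ramp], axis=1)  # (256, 3), R = G = B

# Hypothetical tiny palette of 4 grey levels.
levels = np.linspace(-1.0, 1.0, 4)
palette = np.stack([levels, levels, levels], axis=1)

quantized = quantize_to_palette(pixels, palette)
# The ramp collapses into flat runs of one color each: these are the bands.
band_starts = np.flatnonzero(np.any(np.diff(quantized, axis=0) != 0, axis=1)) + 1
print(len(np.unique(quantized[:, 0])), "bands, new band at columns", band_starts.tolist())
```

With 512 centers the steps are much finer, but any gradient that falls between neighboring centers is still rendered as flat steps.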

That said, I suggest re-clustering these base colors for better performance when you apply them to other domains.

naoki7090624 commented 2 years ago

Thank you for the quick response. I would like to try it on paintings or comics, but I think that would require re-clustering for better performance. Could you share the clustering code that produces kmeans_centers.npy?
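The thread does not include the authors' clustering script, so the following is only a minimal sketch under stated assumptions: plain Lloyd's k-means in NumPy, pixels subsampled per image to keep memory bounded, and centers stored in [-1, 1] so that they match the `np.rint(127.5 * (C + 1.0))` conversion discussed further down the thread. `kmeans_palette` and `build_centers` are hypothetical names.

```python
import numpy as np

def kmeans_palette(pixels: np.ndarray, k: int = 512, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means on (N, 3) colors in [-1, 1]; returns (k, 3) centers."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k randomly chosen pixels (requires N >= k).
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # Squared Euclidean distance of every pixel to every center, shape (N, k).
        d2 = ((pixels ** 2).sum(axis=1, keepdims=True)
              - 2.0 * pixels @ centers.T
              + (centers ** 2).sum(axis=1))
        labels = d2.argmin(axis=1)
        # Move each center to the mean of the pixels assigned to it.
        counts = np.bincount(labels, minlength=k)
        sums = np.zeros_like(centers)
        np.add.at(sums, labels, pixels)
        nonempty = counts > 0  # empty clusters keep their previous center
        centers[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    return centers

def build_centers(images, k=512, samples_per_image=1000, seed=0):
    """Subsample pixels from HxWx3 uint8 images, rescale to [-1, 1], and cluster them."""
    rng = np.random.default_rng(seed)
    sampled = []
    for img in images:
        flat = img.reshape(-1, 3)
        idx = rng.choice(len(flat), size=min(samples_per_image, len(flat)), replace=False)
        sampled.append(flat[idx])
    pixels = np.concatenate(sampled).astype(np.float64) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    return kmeans_palette(pixels, k=k, seed=seed)

# Hypothetical usage with your own list of uint8 RGB arrays `my_images`:
# np.save('kmeans_centers.npy', build_centers(my_images))
```

For large datasets, scikit-learn's `MiniBatchKMeans(n_clusters=512)` (centers in its `cluster_centers_` attribute) is a faster drop-in for the hand-written loop, as long as the pixels are rescaled to [-1, 1] first.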

jackhu-bme commented 2 years ago

Well, in my project on inpainting for medical images, re-clustering is essential for greyscale medical images. Sharing the code may not be essential, but we do need to know what each element in kmeans_centers.npy means. When I print the array loaded with numpy, I see some negative values among the cluster centers. However, from the sentence "using KMeans cluster centers of the whole ImageNet [8] RGB pixel spaces", I could not figure out why, if each group of three elements (one row of the 512 × 3 array) is an RGB color, the values are not in the range [0, 255] or [0, 1]. Do they come from [0, 255] per channel normalized to [-1, 1]? Maybe I need some hints. Thanks a lot for your help!
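For greyscale data, one possible approach (an assumption, not something the ICT authors describe) is to keep the (512, 3) layout so the existing loading code still works, and simply set R = G = B. The sketch below also swaps KMeans for a simpler equal-frequency quantile split of the intensities, and it assumes the [-1, 1] convention the thread later settles on. `greyscale_palette` is a hypothetical helper.

```python
import numpy as np

def greyscale_palette(intensities: np.ndarray, k: int = 512) -> np.ndarray:
    """Build a (k, 3) palette from uint8 greyscale intensities in [0, 255].

    Stand-in for KMeans: each center sits at the middle quantile of one of k
    equal-mass bins, then is repeated over R, G, B to keep the (k, 3) layout.
    """
    x = intensities.astype(np.float64).ravel() / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    qs = (np.arange(k) + 0.5) / k                            # bin midpoints in (0, 1)
    levels = np.quantile(x, qs)                              # (k,) grey levels
    return np.repeat(levels[:, None], 3, axis=1)             # (k, 3) with R = G = B
```

Note that 8-bit greyscale has at most 256 distinct values, so with k = 512 some centers will be duplicates; nearest-center lookup still works, but the duplicates are never the unique best match.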

jackhu-bme commented 2 years ago

I have rechecked the code and found the answer to my question. Actually, the normalization method does not matter much. The code here defines the mapping from each pixel's color to its nearest cluster center. Since there are 512 cluster centers, each pixel's color is compressed into the index of its nearest cluster center. For example, if the image size is 32×32, `a[:]` in the dataset will be a tensor of shape (1024,), where each element encodes the color of one corresponding pixel.

The post-processing code in inference.py under the transformers dir shows the conversion from the [-1, 1] numpy array to [0, 255] colors:

```python
import numpy as np
import torch

C = np.load('kmeans_centers.npy')  ## [0,1]
C = np.rint(127.5 * (C + 1.0))  # map [-1, 1] to [0, 255] and round to integers
C = torch.from_numpy(C)
```

So if you don't want to change this conversion, you can simply reverse this process to produce a correct cluster .npy file. Alternatively, you can change the conversion rule by editing this code together with the rule used to generate the .npy array. I guess this issue can be closed now. Clustering methods are easy to find on the Internet; just be careful about how the .npy array is generated.
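The pixel-to-index encoding described above can be sketched as follows. Assumptions: `centers` is the (512, 3) array from kmeans_centers.npy in [-1, 1], pixels are compared with the rounded [0, 255] palette by squared Euclidean distance, and `encode`/`decode` are hypothetical helpers rather than functions from the ICT repo.

```python
import numpy as np

def encode(img: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Map an HxWx3 uint8 image to H*W palette indices (nearest center)."""
    palette = np.rint(127.5 * (centers + 1.0))  # same conversion as inference.py
    px = img.reshape(-1, 3).astype(np.float64)  # (H*W, 3)
    # Squared Euclidean distance of every pixel to every palette color: (H*W, K).
    d2 = (px ** 2).sum(1, keepdims=True) - 2.0 * px @ palette.T + (palette ** 2).sum(1)
    return d2.argmin(axis=1)                    # (H*W,) indices

def decode(indices: np.ndarray, centers: np.ndarray, h: int, w: int) -> np.ndarray:
    """Turn H*W palette indices back into an HxWx3 uint8 image."""
    palette = np.rint(127.5 * (centers + 1.0)).astype(np.uint8)
    return palette[indices].reshape(h, w, 3)
```

A 32×32 image becomes 1024 integers in [0, 511], i.e. 9 bits per pixel instead of 24, which is where the sequence length and vocabulary savings come from.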

jackhu-bme commented 2 years ago

When I load the kmeans .npy file, its values are not in the range [0, 1] stated in the code comment (`## [0,1]`); please check whether that comment is a mistake. Under the conversion rule, C = -1 maps to 0 and C = 1 maps to 255, so the result lands correctly in [0, 255] only if the .npy array is in [-1, 1].
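A quick way to check the conversion rule and its inverse, useful when writing your own .npy file. This is a sketch; `to_rgb` and `to_npy_range` are hypothetical names, and `to_rgb` simply repeats the expression from inference.py.

```python
import numpy as np

def to_rgb(C: np.ndarray) -> np.ndarray:
    """Forward conversion used at inference time: [-1, 1] -> [0, 255]."""
    return np.rint(127.5 * (C + 1.0))

def to_npy_range(rgb: np.ndarray) -> np.ndarray:
    """Inverse conversion for saving centers: [0, 255] -> [-1, 1]."""
    return rgb.astype(np.float64) / 127.5 - 1.0

# Every 8-bit level survives the round trip.
levels = np.arange(256)
print(np.array_equal(to_rgb(to_npy_range(levels)), levels))

# A file saved in [0, 1] instead would be squeezed into the upper half of [0, 255]:
# 0.0 -> 128 (np.rint rounds 127.5 half to even) and 1.0 -> 255, so dark colors vanish.
print(to_rgb(np.array([0.0, 1.0])))
```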