celeba aligned and wild

ubc-vision / StableKeypoints

Apache License 2.0


celeba aligned and wild #10

Closed: tian-2024 closed this issue 1 month ago

tian-2024 commented 2 months ago

Hi, I don't understand the directory structure of the CelebA dataset.

I saw the script you mentioned in

https://github.com/xingzhehe/GANSeg/tree/main/data/celeba_wild_raw/MAFL

But I don't understand: do you use the same Python file, celeba_wild_preprocess.py, for both the wild and the aligned CelebA data?

I ask because I only saw a "wild" preprocessing script.

ehedlin commented 2 months ago

The aligned images are cropped and centred according to the bounding boxes (the aligning and cropping have already been performed, and the results are available on the CelebA website). The in-the-wild images are the subset of unaligned images whose faces take up more than 30% of the image. celeba_wild_preprocess.py finds the in-the-wild images by filtering with that 30% threshold. The same logic is handled in this line, so preprocessing isn't necessary.
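The 30% filter described above can be sketched roughly as follows. This is a minimal sketch, not the code in celeba_wild_preprocess.py or in StableKeypoints: it assumes the criterion is the face bounding-box area divided by the full image area, and it assumes the `image_id x_1 y_1 width height` layout of CelebA's list_bbox_celeba.txt (a count line, then a header line, then one row per image). The helper names `parse_bbox_lines`, `face_area_ratio`, and `select_wild` are hypothetical, and image sizes have to be read separately, e.g. with Pillow's `Image.open(path).size`.

```python
from typing import Dict, Iterable, List, Tuple

# (x_1, y_1, width, height): assumed column order of CelebA's list_bbox_celeba.txt
BBox = Tuple[int, int, int, int]


def parse_bbox_lines(lines: Iterable[str]) -> Dict[str, BBox]:
    """Parse bbox annotation lines, skipping the count line and the header line."""
    boxes: Dict[str, BBox] = {}
    for i, line in enumerate(lines):
        if i < 2 or not line.strip():
            continue  # line 0: image count, line 1: column names
        name, x, y, w, h = line.split()
        boxes[name] = (int(x), int(y), int(w), int(h))
    return boxes


def face_area_ratio(box: BBox, image_size: Tuple[int, int]) -> float:
    """Fraction of the full image area covered by the face bounding box."""
    _, _, w, h = box
    img_w, img_h = image_size
    return (w * h) / (img_w * img_h)


def select_wild(
    boxes: Dict[str, BBox],
    sizes: Dict[str, Tuple[int, int]],
    threshold: float = 0.3,
) -> List[str]:
    """Return names of images whose face covers more than `threshold` of the image."""
    return sorted(
        name
        for name, box in boxes.items()
        if face_area_ratio(box, sizes[name]) > threshold
    )


# Toy usage with made-up numbers (not real CelebA annotations).
lines = [
    "2",
    "image_id x_1 y_1 width height",
    "a.jpg 10 10 60 60",
    "b.jpg 10 10 20 20",
]
boxes = parse_bbox_lines(lines)
# (width, height) per image; in practice these would come from the image files.
sizes = {"a.jpg": (100, 100), "b.jpg": (100, 100)}
print(select_wild(boxes, sizes))  # ['a.jpg']
```

Here a.jpg's face covers 36% of the image and passes, while b.jpg's covers 4% and is dropped. Because the aligned split keeps every image and the wild split keeps only the images that pass this filter, the wild list can never be longer than the full image list.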

ehedlin commented 2 months ago

It's also worth noting that all aligned images are used, so the aligned set will be larger than the unaligned (in-the-wild) set.

tian-2024 commented 1 month ago


I read the code more carefully, and now I understand some of what you said. Thanks a lot!

So do you mean that, when using the CelebA datasets, I just need to unzip all the data and put the MAFL directory inside CelebA, with no other preprocessing?

Like this, right?

tian-2024 commented 1 month ago

I also have some questions about the CUB dataset.

I downloaded the raw dataset; it's 1.1 GB.

1.1G Sep  3 19:01 CUB_200_2011.tgz

But when I ran the preprocessing Python file to get 'cub.h5', the result is only 555 MB.

555M Sep  3 19:34 cub.h5

So I wonder whether it contains the same data as the raw dataset.

I ask because I never used the 1.1 GB archive to produce 'cub.h5'.

I only used the cachedir files.

They are just 27.6 MB.

I don't understand this part.

So I guess cub.h5 is a cleaned version of the raw data, and we don't need to use the raw data at all, right?

Thank you so much~

tian-2024 commented 1 month ago

I can run it now~