tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

[Feature Request/Bug] Add identity index or original file name to celeb_a attributes #4905

Closed sk1ddy closed 1 year ago

sk1ddy commented 1 year ago

Maybe I am missing something, so correct me if I am wrong, but I noticed that there is no identity attribute in the tfds 'celeb_a' dataset, even though identities are mentioned in the dataset description. Quoting the description: "CelebA has large diversities, large quantities, and rich annotations, including - 10,177 number of identities, - 202,599 number of face images, and - 5 landmark locations, 40 binary attributes annotations per image."

I do not know whether this is intended. I find these tfds datasets very convenient, but I also need the identity label for my use case. Would it be possible to add an identity index to the current set of features? Another option would be to add the original file name to the attributes (the identity can be inferred from it), as has been suggested elsewhere, among other things.

pierrot0 commented 1 year ago

Hey, thanks for the request. At the moment it's unlikely that we will have the bandwidth to work on this. However, if you wish to send a PR, we will look at it.

From a quick look, one would need to add the download of the identities file around https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/datasets/celeb_a/celeb_a_dataset_builder.py#L93, then read that file and stitch the identities into the examples around https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/datasets/celeb_a/celeb_a_dataset_builder.py#L180. The tests and the checksums file would then need to be updated.
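The read-and-stitch step could look roughly like the sketch below. It is only an illustration, not the builder's actual code. It assumes the identities file has one `<file name> <identity index>` pair per line, as in CelebA's `identity_CelebA.txt`. The helper names `parse_identity_lines` and `stitch_identities`, the `identity` and `file_name` keys, and the `-1` fallback for images without an entry are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_identity_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Builds a file-name -> identity-index map from '<file name> <identity>' lines."""
    identities: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # Skip blank or malformed lines.
        file_name, identity = parts
        identities[file_name] = int(identity)
    return identities


def stitch_identities(
    examples: Iterable[Tuple[str, dict]],
    identities: Dict[str, int],
) -> Iterator[Tuple[str, dict]]:
    """Yields (file name, example) pairs with identity and file name added."""
    for file_name, example in examples:
        stitched = dict(example)  # Copy so the input example is not mutated.
        # Assumption: -1 marks an image that has no entry in the identities file.
        stitched["identity"] = identities.get(file_name, -1)
        stitched["file_name"] = file_name
        yield file_name, stitched


if __name__ == "__main__":
    # Illustrative sample lines in the assumed format.
    sample_lines = ["000001.jpg 2880", "000002.jpg 2937"]
    ids = parse_identity_lines(sample_lines)
    examples = [("000001.jpg", {"image": "img_align_celeba/000001.jpg"})]
    for key, example in stitch_identities(examples, ids):
        print(key, example)
```

In the real builder, the lines would come from the downloaded identities file, and the new fields would also have to be declared in the dataset's feature spec.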

OmkarBorhade98 commented 1 year ago

Hi @sk1ddy @pierrot0, I was trying to work on this issue, but I could not load the dataset because the download could not get past the Google Drive warning: Google Drive can't scan this file for viruses. img_align_celeba.zip (1.3G) is too large for Google to scan for viruses.

I have raised an issue for this: #4924

OmkarBorhade98 commented 1 year ago

Hi @pierrot0, I have submitted PR #4928, which should fix this issue.

ccl-core commented 1 year ago

Thank you @OmkarBorhade98 for your contribution! PR #4928 is merged.

andergalvaan commented 1 year ago

Hello everyone,

Is there any reference code in which the celeb_a dataset is used to deploy a face recognition system?

Best regards, Ander