[data request] IBM Diversity in Faces Dataset (DiF)

dynamicwebpaige commented 5 years ago

Name of dataset: IBM Diversity in Faces (DiF)
URL of dataset: https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/
License of dataset: Terms of Use
Short description of dataset and use case(s): "The Diversity in Faces (DiF) is a large and diverse dataset that seeks to advance the study of fairness and accuracy in facial recognition technology. The first of its kind available to the global research community, DiF provides a dataset of annotations of 1 million human facial images."

Folks who would also like to see this dataset in tensorflow/datasets, please thumbs-up so the developers can know which requests to prioritize.

ChanchalKumarMaji commented 5 years ago

@cyfra @dynamicwebpaige @rsepassi @Conchylicultor, please assign this issue to me.

Oktai15 commented 5 years ago

@dynamicwebpaige hi! where can I find this dataset? Link is not related to dataset, actually...

yiminglin-ai commented 5 years ago

> @dynamicwebpaige hi! where can I find this dataset? Link is not related to dataset, actually...

I could not find the dataset using the link either. @dynamicwebpaige

magomar commented 5 years ago

Any news on this dataset? What happened? I'm very interested in it too.

timlueg commented 5 years ago

It's only available upon request.

Correct (?) URL: https://research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/

jpgard commented 4 years ago

Has anyone had success accessing this dataset by request? I have not heard back from a request for the data after over 6 weeks, despite following the protocol listed on the dataset website.

Merlin2013 commented 4 years ago

> Has anyone had success accessing this dataset by request? I have not heard back from a request for the data after over 6 weeks, despite following the protocol listed on the dataset website.

I'm in the same situation. Have you received any information?

jpgard commented 4 years ago

No, the official contact listed in the paper was unresponsive to my requests, as were the authors of the paper. Sad to see that the purported main contribution of their work -- the dataset -- is actually not available, as far as I can tell.

Merlin2013 commented 4 years ago

> No, the official contact listed in the paper was unresponsive to my requests, as were the authors of the paper. Sad to see that the purported main contribution of their work -- the dataset -- is actually not available, as far as I can tell.

The official diversity-in-faces program webpage is gone. It seems that they do not want to share it any more, under the pressure of public opinion raised by the media. I am looking for someone who has the dataset and begging for a copy, but no luck at all for now.

magomar commented 4 years ago

Tried to get this without success. This is very unfortunate. IBM got so much publicity from this announcement and then withdrew access to the dataset without any explanation or clarification. They still keep the announcement online as if this dataset were still available, which is not the case. This is not right.

Aspie96 commented 4 years ago

This would be absolutely great. I agree it's a shame how private IBM is keeping this dataset.

Aspie96 commented 4 years ago

> the official diversity-in-faces program webpage has gone, it seems that they do not want to share it any more under the pressure of public opinion raised by the media.

It'd be nice if the media stopped ruining everything just one day.

yiiizhang commented 4 years ago

> It's only available upon request.
>
> Correct (?) URL: https://research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/

The official diversity-in-faces program webpage is gone.

Aspie96 commented 4 years ago

They don't actually provide it, it seems, which is a shame. Somebody should rebuild this dataset IMO. It wouldn't even be too hard to build a similar one.

jpgard commented 4 years ago

That's correct, the dataset is not even available upon request (or has not been for around a year) -- unless someone can verify that they have received the dataset from IBM during that time. The same issue also exists with some other "diverse" facial datasets, such as the Gender Shades PPB dataset, afaik.

Rebuilding the dataset would be a difficult but worthy task, and the procedure is fairly well described in the paper. It is also strange that they haven't explicitly revoked the dataset or asked researchers to delete it; they just "disappeared" it, which makes the status of an effort like this (to share it with the entire tf community) a bit ambiguous.

Aspie96 commented 4 years ago

> Rebuilding the dataset would be a difficult, but worthy task, and the procedure is fairly well-described in the paper...

Yes, it'd be hard to rebuild it. But with shared effort building a similar one isn't that hard, depending on the purpose.

I didn't check which features the original dataset had, so I can be wrong, but here's what I am thinking.

The source dataset the images come from might still be available. If it is, one can use that same dataset to extract all the faces. The original images need not be saved: they only have to be obtained, and then only the faces must be retained. This can be done as a shared effort, with each client analyzing only a subset of the dataset, like a botnet. Each client can then easily obtain some features (such as landmark points on the face) automatically. In fact, they have to if face alignment is required.

In addition, gender could be detected fairly reliably by using multiple models. I suggest that when the models disagree on gender, it should be assumed that it cannot be detected automatically, so those images, and only those, must be labelled manually. Then an additional model can be trained on those faces only (they are the manually labelled faces of this particular dataset, so they are the most reliable) and all the other faces can be classified once more. If the new annotation disagrees with the old one, those faces also have to be annotated manually. By doing so, one can collect a huge dataset of faces which are aligned, come from the same source as this dataset, and are labelled by gender.
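To make the labelling idea a bit more concrete, here is a minimal Python sketch of the disagreement-based triage described above. Everything in it is an assumption for illustration: the `Classifier` type, the function names `consensus_label`, `triage` and `recheck`, and the idea that each model is just a callable from a face id to a label string. It does not use any real face-analysis library and is not the procedure from the DiF paper.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical interface: a classifier maps a face id to a label string.
Classifier = Callable[[str], str]


def consensus_label(face_id: str, models: Sequence[Classifier]) -> Optional[str]:
    """Return the label all models agree on, or None if they disagree."""
    labels = {model(face_id) for model in models}
    return labels.pop() if len(labels) == 1 else None


def triage(
    face_ids: Iterable[str], models: Sequence[Classifier]
) -> Tuple[Dict[str, str], List[str]]:
    """Split faces into auto-labelled ones and ones needing manual labels."""
    auto: Dict[str, str] = {}
    manual: List[str] = []
    for face_id in face_ids:
        label = consensus_label(face_id, models)
        if label is None:
            manual.append(face_id)  # models disagree: a human has to label it
        else:
            auto[face_id] = label
    return auto, manual


def recheck(
    auto: Dict[str, str], refined: Classifier
) -> Tuple[Dict[str, str], List[str]]:
    """Second pass: re-classify auto-labelled faces with a model trained on
    the manually labelled faces; any changed label goes to manual review."""
    confirmed: Dict[str, str] = {}
    manual: List[str] = []
    for face_id, label in auto.items():
        if refined(face_id) == label:
            confirmed[face_id] = label
        else:
            manual.append(face_id)
    return confirmed, manual
```

In use, `triage` would run first with several independent models, the faces it returns for manual review would be labelled by people and used to train the `refined` model, and `recheck` would then decide which of the automatic labels can be kept.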

However, this effort is useful if and only if access is no longer controlled by a central authority, since central control has proven not to be beneficial. For this reason, those building the dataset should waive all of their rights to the dataset (if they have any at all) and make it freely available for everybody to download. Storage should not be a huge issue for that.

ItamarRocha commented 4 years ago

That's really unfortunate. I was looking for a copy, but I see it wasn't even released. What a shame.

atg-abhishek commented 3 years ago

Anyone know of other instances where datasets have "disappeared" without a trace?

Electronicshelf commented 3 years ago

IBM wake up, where is the dataset?

Aspie96 commented 3 years ago

We shouldn't trust datasets that have a gatekeeper, EVER.

If there is a gatekeeper, it is not science.

magomar commented 3 years ago

Some info can be found here: https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921

I'm very disappointed with how IBM handled this: they got all the credit for publishing it, and then hid the fact that the dataset was no longer available, which probably happened just a few weeks after it was announced. The dataset is still being publicized on IBM sites. Unbelievable!

atg-abhishek commented 3 years ago

oh @magomar do you have links for some of the pages that still advertise it?

magomar commented 3 years ago

The IBM Research Blog is a good example: https://www.ibm.com/blogs/research/2019/01/diversity-in-faces/

magomar commented 3 years ago

> Anyone know of other instances where datasets have "disappeared" without a trace?

You may find more info on other datasets here: https://megapixels.cc/

Actually, the people behind that site claim to have caused some biometric datasets to be terminated or deactivated.

Aspie96 commented 3 years ago

It is really bad this happens.

jessieyaros commented 3 years ago

For anyone still curious, based on the public audits of facial recognition tech, IBM officially stopped all facial recognition research in June 2020, which I'm guessing included providing public access to this dataset. See: https://www.theverge.com/2020/6/8/21284683/ibm-no-longer-general-purpose-facial-recognition-analysis-software

atg-abhishek commented 3 years ago

Indeed @jessieyaros and I think it makes sense to pull away datasets that have issues leading to ethical challenges. I think the problem is that there are a lot of papers that have been published based on these datasets where there isn't a way to propagate this notification back to let readers know that the findings from those papers should be taken with a (huge) grain of salt because of disputes and challenges to the underlying datasets.

tensorflow / datasets

[data request] IBM Diversity in Faces Dataset (DiF) #299