the problem of download training data

feibin95 commented 5 years ago

First, download the training data from the website (http://www.msceleb.org/download/lowshot)

The dataset has been removed from the official website. Could you please provide a download link? Thank you very much!

wuyuebupt commented 5 years ago

@feifei9099 I am afraid I cannot provide a download link for such a large dataset. You may find alternative download links at: https://github.com/deepinsight/insightface/wiki/Dataset-Zoo https://ibug.doc.ic.ac.uk/resources/lightweight-face-recognition-challenge-workshop/

I also would like to share some news about the dataset: https://www.vice.com/en_us/article/a3x4mp/microsoft-deleted-a-facial-recognition-database-but-its-not-dead https://megapixels.cc/datasets/msceleb/

Hope this helps.

feibin95 commented 5 years ago

Thank you very much for the news you provided. I found a download link here: https://academictorrents.com/details/9e67eb7cc23c9417f39778a8e06cca5e26196a97/tech

But I still have a question about studying the low-shot part of MS-Celeb-1M, as described in your paper “Low-shot Face Recognition with Hybrid Classifiers, ICCV Workshop, 2017”. Do I need to download the whole dataset, and does the whole dataset contain the low-shot part? In your paper you describe the low-shot part as follows: “Base Set consists of 20,000 people, with an average of 58 training samples per person. Novel Set has the rest 1,000 people, of which each comes with 1, 2 or 5 training images.” Thank you very much!

wuyuebupt commented 5 years ago

@feifei9099

The data split is provided by the challenge organizer. The first paper that states the setting should be "One-shot Face Recognition by Promoting Underrepresented Classes" (https://arxiv.org/pdf/1707.05574.pdf).

The training data are part of the full dataset. I am not sure whether you can extract them directly from the full dataset, because the low-shot data is a cleaned version.

I found a list of the training ids. 0-19999 is the base set, 20000-20999 is the novel set. The link is at: https://drive.google.com/file/d/14n5f6ZfmxP20j3iDGRiDSKGVybs2Sy8y/view?usp=sharing
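
As a minimal sketch of the ID ranges described above, the split for a given training id could be determined as follows. This assumes the ids are plain integers; the names `split_of` and `partition` are hypothetical and not part of the repository, and the format of the linked list file is not covered here.

```python
# ID ranges as stated in the comment above.
BASE_IDS = range(0, 20000)       # 0-19999: base set
NOVEL_IDS = range(20000, 21000)  # 20000-20999: novel set


def split_of(label: int) -> str:
    """Return 'base' or 'novel' for an integer training id (hypothetical helper)."""
    if label in BASE_IDS:
        return "base"
    if label in NOVEL_IDS:
        return "novel"
    raise ValueError(f"label {label} is outside the 0-20999 range")


def partition(labels):
    """Group an iterable of integer ids into base and novel lists."""
    groups = {"base": [], "novel": []}
    for label in labels:
        groups[split_of(label)].append(label)
    return groups
```

For example, `partition([5, 20001, 19999])` puts 5 and 19999 in the base list and 20001 in the novel list.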

I cannot find the original training data in TSV format. I do have a copy of the image data, which is about 24 GB. Even if I found the original training file, it would still be too large for me to upload.

I did find two parts of the data:

Training data for the novel set: https://drive.google.com/file/d/18g4Cn7uSWxLM1IHxVMHbC-eI60juuXDn/view?usp=sharing
Validation set containing 25,000 images: https://drive.google.com/file/d/1R0yky3CT6Uuvu6z2KQxggxsRV9XrrLRs/view?usp=sharing

A list of all training images for the base set: https://drive.google.com/file/d/1by9zWY2xcocYdne8_sGKILnCARXLJsB7/view?usp=sharing

If the number of training images you extract matches this list, the data should be the same.
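
One way to check whether the counts match is to compare the number of images per identity, sketched below. This is only an illustration: the format of the list file is not described in this thread, so the code works on in-memory `(image_name, label)` pairs and leaves parsing to the reader. The function names are hypothetical.

```python
from collections import Counter


def images_per_label(pairs):
    """Count images per identity from (image_name, label) pairs."""
    return Counter(label for _name, label in pairs)


def count_mismatches(mine, reference):
    """Return {label: (my_count, reference_count)} for labels whose counts differ."""
    mine_counts = images_per_label(mine)
    ref_counts = images_per_label(reference)
    labels = set(mine_counts) | set(ref_counts)
    return {
        label: (mine_counts.get(label, 0), ref_counts.get(label, 0))
        for label in sorted(labels)
        if mine_counts.get(label, 0) != ref_counts.get(label, 0)
    }
```

An empty result means every identity has the same number of images in both lists. Matching counts are a necessary check, not proof that the images themselves are identical.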

Hope these help.

feibin95 commented 5 years ago

Thank you very much for your help. It means a lot to me.

wtongping commented 4 years ago

@wuyuebupt
Thank you for providing these links. I have been able to sort out the training data, but after searching online for a long time I still cannot find the test data. Could you please share the list (image names) of the test data for the base and novel sets?

wuyuebupt commented 4 years ago

@wtongping

I did not find the test data. Even if I had found it, the labels would not be available, because the challenge evaluation was run by the organizers, if I remember correctly.

wtongping commented 4 years ago

@wuyuebupt OK, thank you!

wuyuebupt / hybridClassifier

the problem of download training data #2