zalandoresearch / fashion-mnist

A MNIST-like fashion product database. Benchmark :point_down:
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/
MIT License

Benchmark (FullConnection + IncrementalLearning) with Test Acc. 98.77%! #136

Open moonblue333 opened 5 years ago

moonblue333 commented 5 years ago

Please refer to: https://github.com/yuenuting

  - incremental-learning-world-record-mnist-fashion-v1 (as title, and the portal of these projects.)
  - incremental-learning-world-record-mnist-fashion-v2 (95.41 ~ 97.0)
  - incremental-learning-world-record-mnist-fashion-v3 (96.17 ~ 97.5)
  - incremental-learning-world-record-mnist-fashion-v4 (98.*)
  - incremental-learning-world-record-mnist-fashion-v (open or not, under discussion ...)

We deleted the discussion of ideas here, especially some sharp criticisms of today's AI (such as BP, CNN, and its overall direction). For the main ideas, please refer to our project README and check the code directly.

Ideas contributor: yuenuting & moonblue333
Algorithm developer: moonblue333

kashif commented 5 years ago

@moonblue333 98.77 test set accuracy on fashion-mnist with a network of a thousand or so parameters does not seem correct. kindly double check a few things:

hanxiao commented 5 years ago

One common mistake is that people think they are working with Fashion-MNIST when they are actually working with MNIST. This mistake can be found repeatedly in #110 #47 #36 #119 #129 etc.

> We have submitted source (not including core algorithm, but you can see that everything we tested is correct.)

Unfortunately, your code doesn't contain any validation to ensure the data is Fashion-MNIST. By default, input_data.read_data_sets will download or fall back quietly to standard MNIST so people can easily miss it. Unless you do:

data = input_data.read_data_sets('data/fashion', source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')

Or you download the data by yourself and put it there in advance.
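For anyone who takes the second route, here is a minimal sketch of reading the downloaded files directly, so that nothing can fall back to standard MNIST behind your back. It assumes the four gzip-compressed IDX files sit in a local directory such as `data/fashion` under their usual names; the helper name `load_idx_pair` is hypothetical and is not part of this repo or of TensorFlow.

```python
import gzip
import os

import numpy as np


def load_idx_pair(data_dir, kind="train"):
    """Load images and labels from local IDX .gz files; fail loudly if missing.

    `kind` is "train" or "t10k", matching the standard file name prefixes.
    This is an illustrative helper, not an API of any library.
    """
    labels_path = os.path.join(data_dir, f"{kind}-labels-idx1-ubyte.gz")
    images_path = os.path.join(data_dir, f"{kind}-images-idx3-ubyte.gz")

    # Refuse to continue (and never download anything) if a file is absent.
    for path in (labels_path, images_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"expected dataset file not found: {path}")

    # IDX label files carry an 8-byte header, image files a 16-byte header.
    with gzip.open(labels_path, "rb") as f:
        labels = np.frombuffer(f.read(), dtype=np.uint8, offset=8)
    with gzip.open(images_path, "rb") as f:
        images = np.frombuffer(f.read(), dtype=np.uint8, offset=16)

    # Each image is 28 x 28 pixels, flattened here to 784 values.
    return images.reshape(len(labels), 784), labels
```

Calling `load_idx_pair('data/fashion', 'train')` then either returns the arrays from the files you placed there or raises `FileNotFoundError`. It does not prove the files are Fashion-MNIST; for that, see the checks listed further down.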

I'm sorry to say this, but the result is most likely due to a bug in your code. Please check your code again and again. I'm closing this issue as it is invalid. Please feel free to reopen it if you pass the following tests and show some evidence that you are working with Fashion-MNIST.

Here are some ways to check whether you are working with the correct dataset.

  1. Check the MD5 of your dataset and compare it with the MD5 in our README.md. Details can be found in this thread;
  2. Plot your dataset, as suggested here;
  3. Refer to what I said in #110 #47 #36 #119 #129 about the same mistake.
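The MD5 check in step 1 can be sketched as below. The function names are hypothetical, and the expected digests are deliberately not included: copy them from README.md into the `expected_md5` dictionary yourself rather than trusting values from anywhere else.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(expected_md5, data_dir):
    """Return the names of files whose MD5 differs from the expected value.

    `expected_md5` maps file name -> hex digest (to be copied from README.md).
    """
    mismatched = []
    for name, expected in expected_md5.items():
        actual = md5_of_file(os.path.join(data_dir, name))
        if actual != expected.lower():
            mismatched.append(name)
    return mismatched
```

An empty list from `verify_files` means every listed file matches; any name in the result points to a file that is not the one published with this repo.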

Our benchmark table (contributed by different researchers) also suggests that a simple CNN cannot reach 99% test accuracy. An FC network should do even worse, since it lacks spatial representation power.

Good luck!

hanxiao commented 5 years ago

I'm reopening this issue as @moonblue333 and @yuenuting argue that they are using the right dataset. I also mark this issue as "help wanted" and welcome the community to validate their idea.

There is no reason to be offended. Tolerating others' questions is a part of science and should not be taken personally by any means. As the core maintainer of this repo, the least I can do is to question people's submissions and fence the wrong ones off from the benchmark table.

hanxiao commented 5 years ago

Again, @moonblue333 please do not think that questioning your result is meant to hurt you deliberately. There is no such drama or discrimination as you pictured.

I see you have open-sourced the code, good.

You don't need to explain it line by line to me. Please understand that your result outperforms all others in the benchmark table by a large margin. With such a breakthrough (you mentioned yourself in the repo that it's a world record), you have to wait for more people (not just me) to reproduce the result and validate your idea.

Therefore, I have already marked this issue as "help-wanted" and "contribution-welcome"; we just need to wait for the community to reproduce the result. After all, this is an open-source project that relies on the community. Don't worry, people will find it soon.

I've changed the issue title to raise more attention from the community.

To all, the repo that the issuer mentioned is https://github.com/yuenuting/incremental-learning-world-record-mnist-fashion

kashif commented 5 years ago

@hanxiao the code is not open sourced as far as i can see... the only thing that is open is the model runner... Please close this issue.

@moonblue333 @yuenuting I do not have time due to my own TODOs to run the code that does not exist as far as I can see. Kindly open a new pull request when you are ready to share your findings to the public since there is no point in putting some number in some table as untested as you mention... all the other benchmarks are linked to verifiable code, which is what we will need from you as well. Also please keep the discussion here to the issue at hand rather than essays justifying the numbers you claim as you are not helping yourself.

hanxiao commented 5 years ago

hi @moonblue333 @yuenuting , it looks like you are using ROC and AUC as the evaluation metrics.

https://github.com/yuenuting/incremental-learning-world-record-mnist-fashion/blob/4a9a1b7992b9209f5e4a2f91cf246c8e7c58746a/tpj.py#L627-L634

Note that our benchmark table is based on the mean accuracy. To make a fair comparison with other submissions, please change your get_performance to the following and report it again:

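def get_performance(self, p, y):
    # p: predicted class scores, y: one-hot labels (one row per sample)
    p = np.argmax(p, axis=1)  # predicted class index per sample
    y = np.argmax(y, axis=1)  # true class index per sample
    return np.mean(p == y)    # fraction of correct predictions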
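To illustrate why the metric matters, here is a small standalone sketch with made-up numbers. It compares the mean accuracy above with one possible AUC variant (a micro-averaged, one-vs-rest AUC computed by pair counting). This is an assumption for illustration only and is not necessarily what the linked `tpj.py` computes; the function names and the toy arrays are hypothetical.

```python
import numpy as np


def mean_accuracy(p, y):
    """Fraction of samples whose highest-scoring class equals the true class."""
    return float(np.mean(np.argmax(p, axis=1) == np.argmax(y, axis=1)))


def micro_auc(p, y):
    """Micro-averaged one-vs-rest AUC: every (sample, class) cell is treated
    as one binary decision, and the AUC is the probability that a positive
    cell scores higher than a negative cell (ties count as one half)."""
    scores = np.asarray(p, dtype=float).ravel()
    labels = np.asarray(y).ravel().astype(bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Toy data: three samples, three classes; the second sample is misclassified.
y_demo = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
p_demo = np.array([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]])

print("mean accuracy:", mean_accuracy(p_demo, y_demo))
print("micro AUC:    ", micro_auc(p_demo, y_demo))
```

On this toy input the mean accuracy is 2/3 while the micro AUC is 17/18, so an AUC-style figure can look much better than the accuracy used in the benchmark table.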

moonblue333 commented 5 years ago

OK. We have tidied up the mean-accuracy [MEAN_ACCURACY] version, and the final accuracy (10 classes) is 98.*%.
We are tidying and will submit it again ASAP.

def get_performance(self, p, y):
    print("$$$$$$$$$ get_performance()")
    print('p=',p.shape)
    print('y=',y.shape)
    print('len(p)=',len(p))
    ok = 0
    no = 0
    for i in range(len(p)):
        if np.argmax(p[i]) == np.argmax(y[i]):
            ok = ok + 1
        else:
            no = no + 1
            print('p0=',p[i])
            print('y0=',y[i])
            print('p0-c=',np.argmax(p[i]))
            print('y0-c=',np.argmax(y[i]))
            print('correct=',(np.argmax(p[i]) == np.argmax(y[i])))
            print()
    print('ok=',ok)
    print('no=',no)
    print('acc=',(ok/(ok+no)))
    p = np.argmax(p, axis=1)
    y = np.argmax(y, axis=1)  
    return np.mean(p == y)

yuenuting commented 5 years ago

Submitted one version (95.41 ~ 97.0); others will be submitted ASAP.

yuenuting commented 5 years ago

Submitted another version (96.17 ~ 97.5); others will be submitted ASAP.

  1. more tricks
  2. data aug (classes detail in project)
  3. hyper-parameters

@kashif @hanxiao

yuenuting commented 5 years ago

New, better versions are being tidied up and are on the way; we will submit them ASAP.

2019-01-06: tidying completed, final_voted. (incremental + ensemble + net2net + ... & small fully connected networks + a few big CNN networks)
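For readers unfamiliar with ensemble voting, here is a generic sketch of hard majority voting over the class predictions of several models. It is only an illustration of the general technique; how the "final_voted" step in the linked project actually combines its networks is not described in this thread, and the function name is hypothetical.

```python
import numpy as np


def majority_vote(predictions, num_classes):
    """Combine per-model class predictions by hard majority vote.

    `predictions` has shape (n_models, n_samples) and holds integer class
    labels. Ties are resolved in favour of the lowest class index.
    """
    preds = np.asarray(predictions)
    n_samples = preds.shape[1]
    counts = np.zeros((n_samples, num_classes), dtype=int)
    for model_preds in preds:
        # Each model adds one vote per sample to the class it predicted.
        counts[np.arange(n_samples), model_preds] += 1
    return counts.argmax(axis=1)


# Three hypothetical models voting on three samples.
votes = [[0, 1, 2],
         [0, 1, 1],
         [1, 1, 2]]
print(majority_vote(votes, num_classes=3))
```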

2019-01-07: we submitted version 4. (This may be the final version. If we submit a more powerful version, we will submit the whole AGI project, because copying and tidying code out of a bigger project is time-consuming.)

yuenuting commented 5 years ago

@kashif @hanxiao we submitted v4, with detailed descriptions. (Some mini sub-networks use FC-NN; that is OK.)