maxpumperla / deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"
https://www.manning.com/books/deep-learning-and-the-game-of-go

predictions and evaluations in average_digits.py are off #45

Closed fbeshears closed 4 years ago

fbeshears commented 4 years ago

Max and Kevin,

I've added print statements to check out the predictions and evaluations in:

dlgo/nn/average_digits.py

The print results I'm getting don't match up with what the comments in the code say one should get.

The prediction for 8 is close, but the prediction for 4 is off: it should be close to zero, but it's 1.0.

The evaluations are all off.

The print results for the dot product calculations for x_3 and x_18 are also way off.

Best, Fred Beshears

# tag::avg_imports[]
import numpy as np
from dlgo.nn.load_mnist import load_data
from dlgo.nn.layers import sigmoid_double

# end::avg_imports[]

# tag::average_digit[]
def average_digit(data, digit):  # <1>
    filtered_data = [x[0] for x in data if np.argmax(x[1]) == digit]
    filtered_array = np.asarray(filtered_data)
    return np.average(filtered_array, axis=0)

train, test = load_data()
avg_eight = average_digit(train, 8)  # <2>

# <1> We compute the average over all samples in our data representing a given digit.
# <2> We use the average eight as parameters for a simple model to detect eights.
# end::average_digit[]

# tag::display_digit[]
from matplotlib import pyplot as plt

img = (np.reshape(avg_eight, (28, 28)))
plt.imshow(img)
plt.show()
# end::display_digit[]

# tag::eval_eight[]
x_3 = train[2][0]    # <1>
x_18 = train[17][0]  # <2>

W = np.transpose(avg_eight)

print("dot product for 4: %f" % np.dot(W, x_3))   # <3>
# dot product for 4: 1306671.457870

print("dot product for 8: %f" % np.dot(W, x_18))  # <4>
# dot product for 8: 3545954.035379

# <1> Training sample at index 2 is a "4".
# <2> Training sample at index 17 is an "8".
# <3> This evaluates to about 20.1.
# <4> This term is much bigger, about 54.2.
# end::eval_eight[]

# tag::predict_simple[]
def predict(x, W, b):  # <1>
    return sigmoid_double(np.dot(W, x) + b)

b = -45  # <2>

print("prediction for 4: %f" % predict(x_3, W, b))   # <3>
# prediction for 4: 1.000000

print("prediction for 8: %f" % predict(x_18, W, b))  # <4> 0.96000000
# prediction for 8: 1.000000

# <1> A simple prediction is defined by applying sigmoid to the output of np.dot(W, x) + b.
# <2> Based on the examples computed so far, we set the bias term to -45.
# <3> The prediction for the example with a "4" is close to zero.
# <4> The prediction for an "8" is 0.96 here. We seem to be onto something with our heuristic.
# end::predict_simple[]

# tag::evaluate_simple[]
def evaluate(data, digit, threshold, W, b):  # <1>
    total_samples = 1.0 * len(data)
    correct_predictions = 0
    for x in data:
        if predict(x[0], W, b) > threshold and np.argmax(x[1]) == digit:  # <2>
            correct_predictions += 1
        if predict(x[0], W, b) <= threshold and np.argmax(x[1]) != digit:  # <3>
            correct_predictions += 1
    return correct_predictions / total_samples

# <1> As evaluation metric we choose accuracy, the ratio of correct predictions among all predictions.
# <2> Predicting an instance of an eight as "8" is a correct prediction.
# <3> If the prediction is below our threshold and the sample is not an "8", we also predicted correctly.
# end::evaluate_simple[]

# tag::evaluate_example[]
accuracy = evaluate(data=train, digit=8, threshold=0.5, W=W, b=b)  # <1>
print("Accuracy on training data: %f" % accuracy)
# Accuracy on training data: 0.097517

accuracy = evaluate(data=test, digit=8, threshold=0.5, W=W, b=b)   # <2>
print("Accuracy on test data: %f" % accuracy)
# Accuracy on test data: 0.097400

eight_test = [x for x in test if np.argmax(x[1]) == 8]
accuracy = evaluate(data=eight_test, digit=8, threshold=0.5, W=W, b=b)  # <3>
print("Accuracy on set of eights: %f" % accuracy)
# Accuracy on set of eights: 1.000000

# <1> Accuracy on training data of our simple model is 78% (0.7814)
# <2> Accuracy on test data is slightly lower, at 77% (0.7749)
# <3> Evaluating only on the set of eights in the test set only results in 67% accuracy (0.6663)
# end::evaluate_example[]
maxpumperla commented 4 years ago

@fbeshears thanks for your feedback. Since the book was published, we accepted a new PR that loads MNIST differently. Quite possibly this new data set comes in a different ordering. I'm just guessing for now and would have to check this myself, but I'm 99.9% sure the numbers were correct for the version we published with the book.

What do you get using the code from the chapter_5 branch?
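One quick way to check the ordering hypothesis, as a sketch: assuming the chapter_5 branch exposes the same dlgo.nn.load_mnist.load_data with one-hot labels, the indices that average_digits.py relies on can be verified directly.

import numpy as np
from dlgo.nn.load_mnist import load_data

train, test = load_data()
# With the ordering assumed in the book, index 2 should be a "4" and index 17 an "8",
# so this should print "4 8".
print(np.argmax(train[2][1]), np.argmax(train[17][1]))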

maxpumperla commented 4 years ago

And, of course, apologies for the inconvenience. This is obviously not intended.

maxpumperla commented 4 years ago

Yes, Python 2 vs. Python 3 incompatibilities were part of the reason we updated the data loading to something else. As I said, I need to check this first. I'm not sure what causes this, but obviously you can't reproduce the results I got.
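For reference, the usual symptom of that incompatibility is that a pickle written by Python 2 can't be read directly by Python 3. A minimal sketch of the standard workaround, assuming the original mnist.pkl.gz layout (three pickled tuples):

import gzip
import pickle

# Python 3 can read a Python-2 pickle if told how to decode the byte strings;
# without encoding='latin1' this raises an error for the MNIST pickle.
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_data, validation_data, test_data = pickle.load(f, encoding='latin1')

print(train_data[0].shape)  # expect (50000, 784)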

maxpumperla commented 4 years ago

OK, so just to confirm: if I check out 7abf582, which was the last commit before merging the mnist.npz fix, then with Python 2.7 I do get the numbers as presented in the book. With Python 3.x I also get the same error as you do above (of course).

Now the question is: how do we keep the consistent data loading (I really do think it's better) while not accidentally messing with average_digits.py? Thanks for noticing, by the way. We only made sure the network still trains, not that the utility scripts produce the same output. We probably should have used doctests for that (something like the sketch below), but you know, there's only so much time in a day.

If you have a suggestion, I'd be happy to look into it. Meanwhile, I'll try to figure out what's stopping this from working.
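For what it's worth, a minimal sketch of such a regression check (a pytest-style assertion rather than a literal doctest; the file name is hypothetical, and it assumes the book's load_data and the index expectations from the published chapter):

# test_average_digits.py -- hypothetical regression check, not part of the repo
import numpy as np
from dlgo.nn.load_mnist import load_data

def test_mnist_ordering_matches_book():
    train, test = load_data()
    # The published chapter assumes training index 2 is a "4" and index 17 is an "8".
    assert np.argmax(train[2][1]) == 4
    assert np.argmax(train[17][1]) == 8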

fbeshears commented 4 years ago

Thanks for looking at this.

It's not a problem for me, btw. I'm retired, so I have plenty of time, and I enjoy tracking down glitches.

No suggestions for you. But, here's what I'm up to this morning.

I've been able to replicate your results by going back to the following: Python 2.7.15, NumPy 1.16.5, six 1.12.0.

Before, I was using: Python 3.7.4, NumPy 1.17.2, six 1.12.0.

With Python 2.7 etc. I'm able to load the MNIST data as before with pickle.

So now I might try storing it with NumPy while still on Python 2.7.15.

Then it should be in a format that can be loaded with np.load() under 3.7.4.

Then I'll be in a better position to know whether the data set has changed for some reason.
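In other words, something like this round trip (a rough sketch with placeholder arrays and a hypothetical file name; the full conversion script appears later in the thread):

import numpy as np

# Placeholder arrays standing in for the MNIST data loaded via pickle under Python 2.7.
x_train = np.zeros((50000, 784), dtype='float32')
y_train = np.zeros(50000, dtype='int64')

# Run under Python 2.7: np.savez stores plain numeric arrays without pickling them.
np.savez('mnist_from_py2.npz', x_train=x_train, y_train=y_train)

# Run under Python 3.7: read it back with np.load, no Python-2 pickle involved.
with np.load('mnist_from_py2.npz') as f:
    x_train, y_train = f['x_train'], f['y_train']
print(x_train.shape, y_train.shape)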

fbeshears commented 4 years ago

Using Python 2.7, I was able to load the MNIST data as before with pickle.

I was also able to save that data to an .npz file with NumPy under Python 2.7.

That .npz file can then be re-opened with Python 3.7.4.

This new data file gives the same results when used with average_digits.

So, as far as I'm concerned, this closes the issue.

BTW: I think the book's great! So far, I've had the time to make it through Ch6.

maxpumperla commented 4 years ago

@fbeshears perfect, thanks for the feedback

fbeshears commented 4 years ago

Here's the code one needs to convert the MNIST data so it can be loaded with Python 3.x:


#make_mnist_npz_with_py2_7.py

# This code must be run with python2.7

import set_path  #<1>
from dlgo.load_mnist import DLGO_DATA_DIR #<2>

#<1> sets the python search path so the dlgo code directory can be found
#<2> load_mnist has the directory path where the dlgo data can be found
#--------------------------------

import numpy as np
import six.moves.cPickle as pickle
import gzip

def print_shapes(train_data, test_data):
    x_train, y_train = train_data
    x_test, y_test = test_data
    print(x_train.shape)
    print(x_test.shape)

def display_file_data_shape(fname):
    print("displaying shape of data in %s" % fname)

    train_data, test_data = load_data_impl(fname)
    print_shapes(train_data, test_data)

def make_quadratic(x_data):
    features = [np.reshape(x, (28, 28)) for x in x_data] 
    return np.array(features)

def load_data_impl(path):
    with np.load(path) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']

        x_train = x_train.astype('float32')
        x_test = x_test.astype('float32')

    return (x_train, y_train), (x_test, y_test)

def reshape_28_28(train_data, test_data):

    x_train, y_train = train_data
    x_test, y_test = test_data

    x_train = make_quadratic(x_train)
    x_test = make_quadratic(x_test)

    return (x_train, y_train), (x_test, y_test)

def np_save_to_file(train_data, test_data, fname):

    x_train, y_train = train_data
    x_test, y_test = test_data

    print("======================")
    print("saving %s with np.savez()" % fname)

    np.savez(fname, 
        x_train=x_train, y_train=y_train, 
        x_test=x_test, y_test=y_test)

def gz_to_npz(in_file, out_file):
    # must run with python2.7

    print("opening and loading %s with gzip and pickle" % in_file)
    with gzip.open(in_file, 'rb') as f:
        train_data, validation_data, test_data = pickle.load(f)

    print("shape of train_data and test_data from %s" % in_file)
    print_shapes(train_data, test_data)

    train_data, test_data = reshape_28_28(train_data, test_data)

    np_save_to_file(train_data, test_data, out_file)
    display_file_data_shape(out_file)

def main():
    in_file = DLGO_DATA_DIR + '\\mnist.pkl.gz'
    out_file = 'mnist_28_28.npz'

    gz_to_npz(in_file, out_file)

if __name__ == '__main__':
    main()
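
A quick way to sanity-check the converted file under Python 3 (a hypothetical snippet; it assumes mnist_28_28.npz sits in the working directory and that the original pickle ordering is preserved):

# check_mnist_npz.py -- run with Python 3.x
import numpy as np

with np.load('mnist_28_28.npz') as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

# Expect (50000, 28, 28) and (10000, 28, 28) for the reshaped images.
print(x_train.shape, x_test.shape)
# If the ordering is unchanged, training labels at indices 2 and 17 should be 4 and 8.
print(y_train[2], y_train[17])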