tolstikhin / adagan

AdaGAN: greedy iterative procedure to train mixtures of GANs
BSD 3-Clause "New" or "Revised" License

MNIST dataset load error #1

Open Mrxiaoyuer opened 7 years ago

Mrxiaoyuer commented 7 years ago

tensorflow version: 1.2.1

When I tried to run python adagan_mnist.py, it showed this error:

2017-07-07 19:46:49,618 - Loading MNIST
Traceback (most recent call last):
  File "adagan_mnist.py", line 152, in <module>
    main()
  File "adagan_mnist.py", line 113, in main
    data = DataHandler(opts)
  File "/home/xiaoyu/adagan/datahandler.py", line 62, in __init__
    self._load_data(opts)
  File "/home/xiaoyu/adagan/datahandler.py", line 69, in _load_data
    self._load_mnist(opts)
  File "/home/xiaoyu/adagan/datahandler.py", line 212, in _load_mnist
    tr_X = loaded[16:].reshape((60000, 28, 28, 1)).astype(np.float)
ValueError: cannot reshape array of size 9912406 into shape (60000,28,28,1)

The dataset was downloaded from the MNIST website. Does anyone have any idea what's wrong?

tolstikhin commented 7 years ago

It is indeed very strange, as 9912406 is not divisible by 28, which is the resolution of MNIST digits. I just double-checked: I downloaded the dataset and ran the script, and it works fine for me. Let me know if you find any further clues!
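For reference, the numbers in the error message can be checked with a few lines of Python. This is only a sketch: the 16-byte header length is taken from the `loaded[16:]` slice in the traceback, and the variable names are made up for illustration.

```python
# Sizes taken from the error message and the traceback above.
reported = 9912406            # elements numpy found after the 16-byte header
header = 16                   # bytes skipped by loaded[16:]
expected = 60000 * 28 * 28    # pixels in the uncompressed training images

print(expected)               # 47040000
print(reported % 28)          # 14, so not a whole number of 28-pixel rows
print(reported + header)      # 9912422, total size of the file that was read
```

A file of 9912422 bytes is much smaller than the roughly 47 MB of raw pixels the script expects, which would be consistent with the file still being compressed.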

Mrxiaoyuer commented 7 years ago

It seems something went wrong with the MNIST data files when I downloaded them. I opened one with bless (a hex editor), and the file starts with 1F8B instead of 2051, which should be the first 32 bits. I re-downloaded the files and removed the .gz extension several times, but it was still wrong. Maybe it's related to my browser (Firefox) and OS (Ubuntu 16.04 LTS)?

Well, it finally works. The .gz data files have to be unzipped rather than just having the .gz extension removed. The file names also have to be changed back afterwards: the unzipped files come out with names like 't10k-images.idx3-ubyte', where 'images.idx3' should be 'images-idx3'. After this, loading the data works. Thank you very much~
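The unzip step can also be done from Python, which keeps the hyphenated file name because the output name is derived from the .gz name. A minimal sketch using only the standard library; `gunzip_keep_name` is a hypothetical helper, not part of the adagan repository.

```python
import gzip
import shutil
from pathlib import Path

def gunzip_keep_name(gz_path):
    """Decompress foo.gz to foo in the same directory and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError("expected a .gz file, got %s" % gz_path.name)
    out_path = gz_path.with_suffix("")  # strips only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)    # streams decompressed bytes to disk
    return out_path
```

For example, `gunzip_keep_name("t10k-images-idx3-ubyte.gz")` would write `t10k-images-idx3-ubyte` next to the archive.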

aethereus commented 6 years ago

I got exactly the same error in my experimental code with exactly the same number 9912406, so I guess this is not a coincidence.

I found that I always get this error when I read the gzip files directly with the gzip module. The problematic code is:

import gzip
from struct import unpack

import numpy as np

def load_images(path):
    with gzip.open(path) as f:
        magic, images_count, rows, cols = unpack(">iiii", f.read(16))
        return np.fromfile(f, dtype='uint8').reshape(images_count, rows * cols)

However, when I manually gunzip the files, and use the following code:

from struct import unpack

import numpy as np

def load_images(path):
    with open(path, 'rb') as f:
        magic, images_count, rows, cols = unpack(">iiii", f.read(16))
        return np.fromfile(f, dtype='uint8').reshape(images_count, rows * cols)

then all is fine.

I can't figure out why the gzip version doesn't work though.
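One likely explanation (an assumption, not something confirmed in this thread) is that `np.fromfile` reads from the underlying OS-level file rather than through the gzip object's `read()` method, so it sees the compressed bytes. A sketch that avoids this by reading the decompressed bytes with `f.read()` and converting them with `np.frombuffer`:

```python
import gzip
from struct import unpack

import numpy as np

def load_images(path):
    with gzip.open(path, "rb") as f:
        # Header: magic number, image count, rows, cols (big-endian int32).
        magic, images_count, rows, cols = unpack(">iiii", f.read(16))
        pixels = f.read()  # remaining bytes, decompressed by the gzip module
    # frombuffer interprets the bytes without copying; the result is read-only.
    return np.frombuffer(pixels, dtype=np.uint8).reshape(images_count, rows * cols)
```

Because the array returned by `np.frombuffer` is read-only here, call `.copy()` on it if it needs to be modified in place.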

beeksh commented 3 years ago

CODE

import struct
import numpy as np
import matplotlib.pyplot as plt
import os
from struct import unpack

def load_data():
    with open("train-labels-idx1-ubyte.gz", "rb") as labels:
        magic, n = struct.unpack('>II', labels.read(8))
        train_labels = np.fromfile(labels, dtype=np.uint8)
    with open('train-images-idx3-ubyte.gz', "rb") as imgs:
        magic, num, nrows, ncols = struct.unpack('>IIII', imgs.read(16))
        train_images = np.fromfile(imgs, dtype=np.uint8).reshape(num, 784)
    with open("t10k-labels-idx1-ubyte.gz", "rb") as labels:
        # big-endian, 2 unsigned ints, reading 8 bytes
        magic, n = struct.unpack('>II', labels.read(8))
        test_labels = np.fromfile(labels, dtype=np.uint8)
    with open('t10k-images-idx3-ubyte.gz', 'rb') as imgs:
        magic, num, nrows, ncols = struct.unpack('>IIII', imgs.read(16))
        test_images = np.fromfile(imgs, dtype=np.uint8).reshape(num, 784)
    return train_images, train_labels, test_images, test_labels

def visualize_data(img_array, label_array):
    fig, ax = plt.subplots(nrows=8, ncols=8, sharex=True, sharey=True)
    ax = ax.flatten()
    for i in range(64):
        img = img_array[label_array == 7][i].reshape(28, 28)
        ax[i].imshow(img, cmap='Greys', interpolation='nearest')
    plt.show()

train_x, train_y, test_x, test_y = load_data()

visualize_data(train_x, train_y)

ERROR

  line 31, in <module>
    train_x , train_y , test_x , test_y = load_data()
  line 14, in load_data
cannot reshape array of size 9912406 into shape (2055376946,784)

My MNIST files are still in .gz format, just as downloaded; I got the latest ones from the website. Please help me out.
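An image count of 2055376946 suggests the header fields were unpacked from something other than an IDX header. A quick check of the first bytes, run before any reshape, can tell the cases apart. This is a sketch under stated assumptions: `describe_header` is a hypothetical helper, 1F 8B is the gzip signature, 2051 is the image magic number quoted earlier in this thread, and 2049 is the label magic number from the IDX format description.

```python
from struct import unpack

GZIP_SIGNATURE = b"\x1f\x8b"
IDX_IMAGES_MAGIC = 2051   # 0x00000803
IDX_LABELS_MAGIC = 2049   # 0x00000801

def describe_header(path):
    """Return a short description of what kind of file `path` appears to be."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head[:2] == GZIP_SIGNATURE:
        return "gzip-compressed: decompress it or open it with gzip.open"
    if len(head) < 4:
        return "too short to be an IDX file"
    (magic,) = unpack(">I", head)  # big-endian unsigned 32-bit magic number
    if magic == IDX_IMAGES_MAGIC:
        return "uncompressed IDX image file"
    if magic == IDX_LABELS_MAGIC:
        return "uncompressed IDX label file"
    return "unknown format (magic number %d)" % magic
```

Calling `describe_header('train-images-idx3-ubyte.gz')` on a file that is still compressed would report the gzip case instead of failing later inside `reshape`.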