Open Mrxiaoyuer opened 7 years ago
It is indeed very strange, as 9912406 is not divisible by 28, which is the resolution of MNIST digits. I double checked right now, downloaded the dataset, ran the script --- and it works fine for me. Let me know if you find any further clues!
It seems something is wrong with the MNIST data files I downloaded. I opened one in Bless (a hex editor), and the file starts with 1F8B instead of 2051 (0x00000803), which should be the first 32 bits. I re-downloaded the files and removed the .gz extension several times, but it was still wrong. Maybe it's related to my browser (Firefox) and OS (Ubuntu 16.04 LTS)?
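The two signatures mentioned above can also be checked without a hex editor. This is a minimal sketch, assuming the usual conventions: a gzip stream starts with the bytes 1F 8B, and an uncompressed IDX file starts with the big-endian 32-bit integer 2051 for images (2049 for labels). The function name `sniff_mnist_file` is hypothetical.

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def sniff_mnist_file(path):
    """Classify a file by looking at its first four bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    if len(head) == 4:
        (magic,) = struct.unpack(">I", head)  # big-endian unsigned 32-bit int
        if magic == 2051:
            return "idx-images"
        if magic == 2049:
            return "idx-labels"
    return "unknown"
```

If this reports "gzip" for a file whose name no longer ends in .gz, the extension was removed without decompressing the contents.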
Well, it finally works. The .gz data files need to be decompressed rather than just having the .gz extension removed. The file names also need to be changed back: after unzipping, the names come out like 't10k-images.idx3-ubyte', where 'images.idx3' should be 'images-idx3'. After this, loading the data works. Thank you very much~
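The fix described above (decompress, and keep the dashed file name) can be sketched in Python. This assumes the downloaded files still carry their original names ending in .gz; `gunzip_keep_name` is a hypothetical helper and not part of adagan.

```python
import gzip
import shutil


def gunzip_keep_name(gz_path):
    """Decompress 'name.gz' to 'name', leaving the rest of the name untouched."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file, got %r" % gz_path)
    out_path = gz_path[:-3]  # strip only the '.gz' suffix
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the decompressed bytes to disk
    return out_path
```

For example, `gunzip_keep_name("t10k-images-idx3-ubyte.gz")` would write `t10k-images-idx3-ubyte` next to the archive, with the dash before `idx3` preserved.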
I got exactly the same error in my experimental code with exactly the same number 9912406, so I guess this is not a coincidence.
I found that when I read the gzip files directly using the gzip module, I always get this error. The problematic code is:

import gzip
from struct import unpack

import numpy as np

def load_images(path):
    with gzip.open(path) as f:
        # Header: magic number, image count, rows, cols (big-endian 32-bit ints)
        magic, images_count, rows, cols = unpack(">iiii", f.read(16))
        # The error is raised on this line
        return np.fromfile(f, dtype='uint8').reshape(images_count, rows * cols)
However, when I manually gunzip the files and use the following code:

from struct import unpack

import numpy as np

def load_images(path):
    with open(path, 'rb') as f:
        # Header: magic number, image count, rows, cols (big-endian 32-bit ints)
        magic, images_count, rows, cols = unpack(">iiii", f.read(16))
        return np.fromfile(f, dtype='uint8').reshape(images_count, rows * cols)
then all is fine.
I can't figure out why the gzip version doesn't work though.
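A likely explanation, offered as an assumption rather than something confirmed in this thread: np.fromfile reads through the operating-system file handle underneath the Python object, so when it is handed a gzip file object it sees the compressed bytes on disk rather than the decompressed stream. That would fit the size matching the original report exactly. A minimal sketch of a gzip-based loader that avoids np.fromfile by reading the decompressed bytes first and wrapping them with np.frombuffer:

```python
import gzip
from struct import unpack

import numpy as np


def load_images(path):
    with gzip.open(path, "rb") as f:
        # Header: magic number, image count, rows, cols (big-endian 32-bit ints)
        magic, images_count, rows, cols = unpack(">iiii", f.read(16))
        if magic != 2051:
            raise ValueError("unexpected magic number %d" % magic)
        data = f.read()  # the remaining bytes, already decompressed by gzip
    # frombuffer wraps the bytes without copying; the result is read-only
    return np.frombuffer(data, dtype=np.uint8).reshape(images_count, rows * cols)
```

Because the array returned by np.frombuffer is backed by an immutable bytes object, call `.copy()` on it if the pixels need to be modified in place.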
CODE

import struct
import numpy as np
import matplotlib.pyplot as plt
import os
from struct import unpack

def load_data():
    with open("train-labels-idx1-ubyte.gz", "rb") as labels:
        magic, n = struct.unpack('>II', labels.read(8))
        train_labels = np.fromfile(labels, dtype=np.uint8)
    with open('train-images-idx3-ubyte.gz', "rb") as imgs:
        magic, num, nrows, ncols = struct.unpack('>IIII', imgs.read(16))
        train_images = np.fromfile(imgs, dtype=np.uint8).reshape(num, 784)
    with open("t10k-labels-idx1-ubyte.gz", "rb") as labels:
        # big endian, 2 unsigned ints, reading 8 bytes
        magic, n = struct.unpack('>II', labels.read(8))
        test_labels = np.fromfile(labels, dtype=np.uint8)
    with open('t10k-images-idx3-ubyte.gz', 'rb') as imgs:
        magic, num, nrows, ncols = struct.unpack('>IIII', imgs.read(16))
        test_images = np.fromfile(imgs, dtype=np.uint8).reshape(num, 784)
    return train_images, train_labels, test_images, test_labels

def visualize_data(img_array, label_array):
    fig, ax = plt.subplots(nrows=8, ncols=8, sharex=True, sharey=True)
    ax = ax.flatten()
    for i in range(64):
        img = img_array[label_array == 7][i].reshape(28, 28)
        ax[i].imshow(img, cmap='Greys', interpolation='nearest')
    plt.show()

train_x, train_y, test_x, test_y = load_data()
My MNIST files are still in .gz format; I downloaded the latest ones from the website. Please help me out.
tensorflow version: 1.2.1

When I tried to run python adagan_mnist.py, it showed this error:

2017-07-07 19:46:49,618 - Loading MNIST
Traceback (most recent call last):
  File "adagan_mnist.py", line 152, in <module>
    main()
  File "adagan_mnist.py", line 113, in main
    data = DataHandler(opts)
  File "/home/xiaoyu/adagan/datahandler.py", line 62, in __init__
    self._load_data(opts)
  File "/home/xiaoyu/adagan/datahandler.py", line 69, in _load_data
    self._load_mnist(opts)
  File "/home/xiaoyu/adagan/datahandler.py", line 212, in _load_mnist
    tr_X = loaded[16:].reshape((60000, 28, 28, 1)).astype(np.float)
ValueError: cannot reshape array of size 9912406 into shape (60000,28,28,1)
The dataset is downloaded from the MNIST website. So does anyone have any idea about what's wrong?
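The numbers in the error message already hint at the cause. A decompressed training image file should hold 60000 * 28 * 28 = 47040000 pixel bytes after the 16-byte header, while the array here came from a file of only 9912406 + 16 = 9912422 bytes, which is consistent with the file still being gzip-compressed. A small sketch of that size check, with hypothetical helper names:

```python
import os


def expected_idx_size(count, rows, cols, header_bytes=16):
    """Size in bytes of an uncompressed IDX image file with one byte per pixel."""
    return header_bytes + count * rows * cols


def looks_decompressed(path, count=60000, rows=28, cols=28):
    """True if the file on disk has exactly the expected uncompressed size."""
    return os.path.getsize(path) == expected_idx_size(count, rows, cols)
```

Running `looks_decompressed` on the training image file before the reshape would flag a still-compressed file early, instead of failing inside `_load_mnist`.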