Closed by novel-yet-trivial 6 years ago
Thanks for the note, but I think the current spelling of the file is correct. For instance, if you go to the original resource for this dataset, http://yann.lecun.com/exdb/mnist/, you see that the files are spelled as follows:
train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
Then, when I unzip the files, the file names are still the same but with the ".gz" suffix removed, i.e.,
train-images-idx3-ubyte
train-labels-idx1-ubyte
t10k-images-idx3-ubyte
t10k-labels-idx1-ubyte
Did you maybe manually relabel the files by accident?
Hmmm. When I decompress with the gzip command-line utility as per your instructions, the file names are correct. However, if I decompress with the Gnome Archive Manager (GUI), the file names have a period in them.
I suggest you skip the decompression step and access the compressed files directly from Python, thereby circumventing the filename problem:
import gzip
#...
def load_mnist(path, kind='train'):
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte.gz' % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte.gz' % kind)
    with gzip.open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II',
                                 lbpath.read(8))
That's good to know; wouldn't have thought that certain tools might be renaming the files. Regarding the code example above, it doesn't work (I think I experimented a lot with loading it directly via gzip back then, but I couldn't get it to work). Will do some more experiments and upload a fixed version -- I like that idea, thanks!
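For anyone who has already unpacked the archives with a tool that renames them, here is a minimal sketch of a lookup that tries both spellings. The helper name `find_mnist_file` and its `part` argument are hypothetical, and the exact period spelling (`train-images.idx3-ubyte`) is an assumption about what the renaming tool produces; adjust it to match the names you actually see on disk.

```python
import os


def find_mnist_file(path, kind, part):
    """Return the path of an unpacked MNIST file, whichever spelling exists.

    `kind` is 'train' or 't10k'; `part` is 'images-idx3' or 'labels-idx1'.
    Hypothetical helper: the period variant is an assumed renaming pattern.
    """
    hyphen_name = '%s-%s-ubyte' % (kind, part)
    # Assumed variant: the hyphen before "idx" replaced by a period.
    period_name = hyphen_name.replace('-idx', '.idx')

    for name in (hyphen_name, period_name):
        candidate = os.path.join(path, name)
        if os.path.isfile(candidate):
            return candidate

    raise FileNotFoundError(
        'neither %s nor %s found in %s' % (hyphen_name, period_name, path))
```

The hyphenated name is tried first because that is the spelling used on the original download page, so the fallback only matters when a tool has renamed the files.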
Turns out the following does the trick:
import os
import struct
import numpy as np
import gzip
def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte.gz' % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte.gz' % kind)
    with gzip.open(labels_path, 'rb') as lbpath:
        # Skip the 8-byte header (magic number, item count).
        lbpath.read(8)
        buffer = lbpath.read()
        labels = np.frombuffer(buffer, dtype=np.uint8)
    with gzip.open(images_path, 'rb') as imgpath:
        # Skip the 16-byte header (magic number, image count, rows, columns).
        imgpath.read(16)
        buffer = imgpath.read()
        images = np.frombuffer(buffer,
                               dtype=np.uint8).reshape(
            len(labels), 784)
    return images, labels
I will add it to the code notebook shortly.
Should be fixed now in the Ch12 notebook!
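The loader above discards the header bytes unread. As a sketch only (not the notebook's code), the headers could instead be parsed with `struct.unpack` and used as a sanity check. The function name `load_mnist_checked` is hypothetical; the magic numbers 2049 (labels) and 2051 (images) and the big-endian 32-bit header layout come from the IDX format description on the MNIST page.

```python
import gzip
import os
import struct

import numpy as np


def load_mnist_checked(path, kind='train'):
    """Load gzipped MNIST files from `path`, validating the IDX headers.

    Hypothetical variant of `load_mnist`, shown for illustration only.
    """
    labels_path = os.path.join(path, '%s-labels-idx1-ubyte.gz' % kind)
    images_path = os.path.join(path, '%s-images-idx3-ubyte.gz' % kind)

    with gzip.open(labels_path, 'rb') as lbpath:
        # Label header: magic number and item count (big-endian uint32).
        magic, n_labels = struct.unpack('>II', lbpath.read(8))
        if magic != 2049:
            raise ValueError('unexpected label magic number: %d' % magic)
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8)

    with gzip.open(images_path, 'rb') as imgpath:
        # Image header: magic number, image count, rows, columns.
        magic, n_images, rows, cols = struct.unpack('>IIII',
                                                    imgpath.read(16))
        if magic != 2051:
            raise ValueError('unexpected image magic number: %d' % magic)
        images = np.frombuffer(imgpath.read(), dtype=np.uint8)

    if len(labels) != n_labels or n_images != n_labels:
        raise ValueError('label and image counts do not match the headers')

    # One flattened row per image; reshape fails if pixel data is truncated.
    return images.reshape(n_images, rows * cols), labels
```

Reading the row and column counts from the header avoids hard-coding 784, at the cost of a few extra lines; for the standard MNIST files both approaches should give the same arrays.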
In this file, the code loads the names as
However, the linked .gz file has the names with a period, not a hyphen. It should be
https://www.reddit.com/r/learnpython/comments/6qc9t1/path_to_existing_file_in_root_folder_not_found_on/