pengfei123xiao opened 5 years ago
Remove the .gz at the end of the file names (dataset description: http://yann.lecun.com/exdb/mnist/)
I rewrote the function:

```python
def to_int(self, byte):
    '''Convert an unsigned byte to an integer.'''
    # print(type(byte))
    return byte
```
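The rewritten `to_int` works on Python 3 because indexing a `bytes` object already returns an `int`. On Python 2, `content[i]` is a one-character `str` and still needs converting. Below is a minimal standalone sketch that accepts either form by falling back to `struct.unpack` with the unsigned-char format `'B'`. It is a plain function rather than the loader's method, and its name only mirrors the one above.

```python
import struct


def to_int(byte):
    """Convert one unsigned byte to an int.

    On Python 3, indexing bytes (content[i]) already gives an int.
    Slicing (content[i:i+1]) or Python 2 indexing gives a length-1
    bytes/str, which struct.unpack decodes as an unsigned char ('B').
    """
    if isinstance(byte, int):
        return byte
    return struct.unpack("B", byte)[0]


content = bytes([0, 255, 7])
print(to_int(content[1]), to_int(content[1:2]))  # 255 255
```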
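But the program then failed somewhere else (answering my own question: the files just need to be decompressed directly and renamed properly, with one `-` in the name changed to `.`):

```
self.to_int(content[start + i * 28 + j]))
IndexError: index out of range
```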
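Renaming a `.gz` file does not decompress it. A minimal sketch of doing the decompression from Python with the standard `gzip` and `shutil` modules. The four input names are the archives listed on the dataset page; the output names are an assumption, so change them to whatever your loader actually opens (for example, with the `-` to `.` change described above).

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dst):
    """Stream-decompress the gzip file src into dst."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


if __name__ == "__main__":
    # Assumed output names: the archive name minus ".gz".
    for name in ["train-images-idx3-ubyte", "train-labels-idx1-ubyte",
                 "t10k-images-idx3-ubyte", "t10k-labels-idx1-ubyte"]:
        gz = Path(name + ".gz")
        if gz.exists():
            decompress_gz(gz, name)
```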
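I debugged it, and it looked as if the `content` returned by `f.read()` was incomplete:

```python
def get_file_content(self):
    '''Read the whole file content.'''
    with open(self.path, 'rb') as f:
        content = f.read()
    print(len(content))  # prints 9912422 here
    return content
```

`len(content)` is 9912422, but the decompressed image file should be 16 + 60000 × 28 × 28 = 47,040,016 bytes (the 16-byte header occurs once, not once per image). 9912422 bytes is exactly the size the dataset page lists for the compressed `train-images-idx3-ubyte.gz`, so `f.read()` was reading everything; the file had simply been renamed without being decompressed.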
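A cheap way to catch this early is to compare the file size with what the IDX header layout implies: 4 bytes of magic number, 4 bytes per dimension, then the data. The sketch below assumes one byte per element (type 0x08, as in MNIST). `expected_idx_size` and `check_idx_size` are hypothetical helper names.

```python
import os


def expected_idx_size(dims, item_size=1):
    """Bytes in an IDX file: 4-byte magic + 4 bytes per dimension + data."""
    n_items = 1
    for d in dims:
        n_items *= d
    return 4 + 4 * len(dims) + n_items * item_size


def check_idx_size(path, dims):
    """Raise if the file at path is not the size an IDX file with dims needs."""
    actual = os.path.getsize(path)
    expected = expected_idx_size(dims)
    if actual != expected:
        raise ValueError(
            f"{path}: {actual} bytes, expected {expected}; "
            "is the file still gzip-compressed?"
        )


print(expected_idx_size([60000, 28, 28]))  # 47040016
print(expected_idx_size([60000]))          # 60008
```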
Also, in `start = index * 28 * 28 + 16`, why is 16 added? Is it related to the offset? From the dataset page:

THE IDX FILE FORMAT

the IDX file format is a simple format for vectors and multidimensional matrices of various numerical types. The basic format is
- magic number
- size in dimension 0
- size in dimension 1
- size in dimension 2
- .....
- size in dimension N
- data
The magic number is an integer (MSB first). The first 2 bytes are always 0.
The third byte codes the type of the data:

- 0x08: unsigned byte
- 0x09: signed byte
- 0x0B: short (2 bytes)
- 0x0C: int (4 bytes)
- 0x0D: float (4 bytes)
- 0x0E: double (8 bytes)
The 4-th byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices....
The sizes in each dimension are 4-byte integers (MSB first, high endian, like in most non-Intel processors).
The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
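Read against that format, the 16 is the header length of the image file: a 4-byte magic number plus three 4-byte sizes (images, rows, columns). A label file has one dimension, so its data starts at byte 8. The sketch below parses the header with `struct` using big-endian (`>`) formats, since the sizes are MSB first. `read_idx_header` and `pixel_offset` are hypothetical names, and `pixel_offset` simply restates the indexing used in the code above (`start + i * 28 + j`).

```python
import struct


def read_idx_header(content):
    """Parse an IDX header; return (type_code, dims, data_offset)."""
    zero, type_code, ndim = struct.unpack_from(">HBB", content, 0)
    if zero != 0:
        raise ValueError("not an IDX file (first two bytes must be 0)")
    dims = struct.unpack_from(">" + "I" * ndim, content, 4)
    return type_code, dims, 4 + 4 * ndim


def pixel_offset(index, i, j, rows=28, cols=28, header=16):
    """Byte offset of pixel (i, j) of image `index` in an idx3 file."""
    return header + index * rows * cols + i * cols + j


# Demo on a synthetic two-image file (not real MNIST data).
fake = struct.pack(">HBBIII", 0, 0x08, 3, 2, 28, 28) + bytes(2 * 28 * 28)
print(read_idx_header(fake))  # (8, (2, 28, 28), 16)
```

Note that a file that is still gzip-compressed starts with the bytes `1f 8b`, so the "first two bytes must be 0" check rejects it instead of silently reading garbage.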
Hello, when I run `transpose(get_training_data_set())` in `mnist.py`, the `Loader` class reports an error. My data was downloaded from within TensorFlow. Any advice would be appreciated, thanks.
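The report doesn't include the error message, so the cause can't be pinned down from here. One plausible cause, given the first half of this thread, is that the files TensorFlow downloaded are still `.gz` archives. In that case the `Loader` reads compressed bytes, exactly as above. A minimal sketch of a hypothetical replacement for the body of `get_file_content` that checks for the gzip magic bytes `1f 8b` and decompresses on the fly is shown below. If the data instead came from `tf.keras.datasets.mnist.load_data()`, that downloads a single `mnist.npz` NumPy archive rather than IDX files, and an IDX loader cannot read it at all.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def read_idx_bytes(path):
    """Return the raw IDX bytes at path, decompressing if it is gzip data."""
    with open(path, "rb") as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        with gzip.open(path, "rb") as f:
            return f.read()
    with open(path, "rb") as f:
        return f.read()
```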