hanbt / learn_dl

Deep learning algorithms source code for beginners
Apache License 2.0

mnist.py: Loader class raises an error when loading #52

Open pengfei123xiao opened 5 years ago

pengfei123xiao commented 5 years ago

Hello, when I run transpose(get_training_data_set()) from mnist.py, the Loader class raises an error.

     24         Convert an unsigned byte character to an integer
     25         '''
---> 26         return struct.unpack('B', byte)[0]
     27 
     28 

TypeError: a bytes-like object is required, not 'int'

My data was downloaded through TensorFlow:

from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets('', one_hot=True)

Any guidance would be appreciated. Thanks.

Jiahui-Wu commented 5 years ago

Remove the .gz suffix from the data files (see the dataset description at http://yann.lecun.com/exdb/mnist/).
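If the downloaded files are still gzip-compressed, one way to decompress them is Python's standard gzip module. This is a minimal sketch; the helper name `gunzip` is an assumption for illustration and is not part of mnist.py:

```python
import gzip
import shutil


def gunzip(src_path, dst_path):
    """Decompress the gzip file at src_path and write the raw bytes to dst_path."""
    with gzip.open(src_path, 'rb') as f_in, open(dst_path, 'wb') as f_out:
        # Stream the data across so the whole file is never held in memory.
        shutil.copyfileobj(f_in, f_out)
```

The destination name should match whatever path the Loader is given, for example `gunzip('train-images-idx3-ubyte.gz', 'train-images-idx3-ubyte')`.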

I rewrote the function:

    def to_int(self, byte):
        '''
        Convert an unsigned byte character to an integer
        '''
        # return struct.unpack('B', byte)[0]
        # print(type(byte))
        return byte
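For context, the TypeError comes from a difference between Python 2 and Python 3. In Python 3, indexing a bytes object yields an int rather than a one-byte string, and struct.unpack only accepts bytes-like input. A small sketch of that behavior, where the sample `content` bytes are made up for illustration:

```python
import struct

content = b'\x00\x7f\xff'  # made-up sample bytes, standing in for f.read()

# Python 3: indexing bytes gives an int directly, so no unpacking is needed.
assert content[2] == 255

# Slicing keeps a bytes object, so struct.unpack still works on a slice.
assert struct.unpack('B', content[2:3])[0] == 255

# Passing the int to struct.unpack reproduces the error from the traceback.
message = None
try:
    struct.unpack('B', content[2])
except TypeError as exc:
    message = str(exc)
```

Either returning the indexed value as-is or unpacking a one-byte slice would avoid the error.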

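But then the program failed somewhere else (answering my own question: it turns out the files just need to be decompressed and renamed correctly, with one of the - characters changed to .):

    self.to_int(content[start + i * 28 + j]))
    IndexError: index out of range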

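I debugged it, and it looks like the content returned by f.read() is incomplete:

    def get_file_content(self):
        '''
        Read the file contents
        '''
        f = open(self.path, 'rb')
        content = f.read()
        print(len(content))  # prints 9912422, but the decompressed file should be 60000 * 28 * 28 + 16 = 47040016 bytes
        f.close()
        return content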
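A length of 9912422 is consistent with the file still being gzip-compressed rather than with a truncated read. A quick check can be sketched as follows, assuming the training image file holds 60000 images of 28 x 28 bytes after a 16-byte header; both helper names are hypothetical:

```python
GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def looks_gzipped(content):
    """Return True if the bytes start with the gzip magic number."""
    return content[:2] == GZIP_MAGIC


def expected_image_file_size(count=60000, rows=28, cols=28, header=16):
    """Size in bytes of an uncompressed IDX image file with the given header size."""
    return header + count * rows * cols
```

If `looks_gzipped(content)` is true, or `len(content)` differs from `expected_image_file_size()`, the file probably needs to be decompressed before the Loader reads it.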

Also, in start = index * 28 * 28 + 16, why is 16 added? Is it related to the offset? The dataset page describes the file format as follows:

THE IDX FILE FORMAT

The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types. The basic format is

    magic number
    size in dimension 0
    size in dimension 1
    size in dimension 2
    .....
    size in dimension N
    data

The magic number is an integer (MSB first). The first 2 bytes are always 0.

The third byte codes the type of the data:

    0x08: unsigned byte
    0x09: signed byte
    0x0B: short (2 bytes)
    0x0C: int (4 bytes)
    0x0D: float (4 bytes)
    0x0E: double (8 bytes)

The 4-th byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices....

The sizes in each dimension are 4-byte integers (MSB first, high endian, like in most non-Intel processors).

The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
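Under this format, the header of an MNIST image file is a 4-byte magic number followed by three 4-byte dimension sizes (image count, rows, columns), which comes to 16 bytes. That would account for the + 16 in start = index * 28 * 28 + 16. A minimal sketch of parsing the header; the function name `parse_idx_header` is hypothetical and not part of mnist.py:

```python
import struct


def parse_idx_header(content):
    """Return (type_code, dims, data_offset) for IDX-formatted bytes."""
    # Bytes 0-1 are zero, byte 2 is the data type, byte 3 is the dimension count.
    zero, type_code, ndim = struct.unpack('>HBB', content[:4])
    if zero != 0:
        raise ValueError('not an IDX file: first two bytes must be 0')
    # Each dimension size is a 4-byte big-endian (MSB first) unsigned integer.
    dims = struct.unpack('>' + 'I' * ndim, content[4:4 + 4 * ndim])
    # The data starts right after the magic number and the dimension sizes.
    return type_code, dims, 4 + 4 * ndim
```

For a three-dimensional image file the returned offset is 16. For a label file, which has a single dimension, it is 8.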