mila-iqia / fuel

A data pipeline framework for machine learning
MIT License

Unicode error/crash #390

Open: mkurtys opened this issue 7 years ago

mkurtys commented 7 years ago

Python 3.6.0 :: Anaconda 4.3.1 (32-bit)
Fuel 0.2.0

fuel_test.py is a simple script that performs basic operations on the MNIST dataset. Running it ends with an error, and sometimes with an error followed by a crash. The dataset was downloaded and converted using Fuel 0.2.0.

C:\Users\mic\Documents\neural\nn_assignments>d:\anaconda3\python.exe fuel_test.py

Traceback (most recent call last):
  File "fuel_test.py", line 13, in <module>
    mnist_train = MNIST(("train",), subset=slice(None,50000))
  File "d:\anaconda3\lib\site-packages\fuel\datasets\mnist.py", line 33, in __init__
    which_sets=which_sets, **kwargs)
  File "d:\anaconda3\lib\site-packages\fuel\datasets\hdf5.py", line 188, in __init__
    self._parse_dataset_info()
  File "d:\anaconda3\lib\site-packages\fuel\datasets\hdf5.py", line 212, in _parse_dataset_info
    available_splits = self.get_all_splits(handle)
  File "d:\anaconda3\lib\site-packages\fuel\datasets\hdf5.py", line 316, in get_all_splits
    set(row['split'].decode('utf8') for row in h5file.attrs['split']))
  File "d:\anaconda3\lib\site-packages\fuel\datasets\hdf5.py", line 316, in <genexpr>
    set(row['split'].decode('utf8') for row in h5file.attrs['split']))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 2: unexpected end of data

dmitriy-serdyuk commented 7 years ago

This looks like a Windows-specific Unicode problem. Have you tried debugging it to see what is in row['split']?
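To follow up on the debugging suggestion, the failing generator expression can be rewritten so it records every undecodable split name instead of stopping at the first one. This is a minimal sketch, not Fuel code: `find_undecodable_splits` is a hypothetical helper, and the rows are plain dicts standing in for the records of `h5file.attrs['split']`. With the real file you would presumably pass that attribute, read through h5py, instead.

```python
def find_undecodable_splits(rows, encoding="utf8"):
    """Decode each row's 'split' field the way the failing line does,
    but collect (index, raw bytes, reason) for failures instead of raising.

    `rows` is assumed to be any iterable of records indexable by 'split'
    (plain dicts here, standing in for the HDF5 split attribute)."""
    bad = []
    for index, row in enumerate(rows):
        raw = row["split"]
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as err:
            bad.append((index, raw, err.reason))
    return bad


# Hypothetical sample: one well-formed split name, one taken from the pdb output.
rows = [{"split": b"train"}, {"split": b"\x00`\xea"}]
print(find_undecodable_splits(rows))
```

Printing the raw bytes rather than the exception alone shows at a glance whether the value is a mangled name or unrelated binary data.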

mkurtys commented 7 years ago

(Pdb) row['split']
b'\x00`\xea'

(Pdb) row
(b'\x00`\xea', b'\x00\x00\x00\x00p\x11\x01', 214581513910484992, 11777, None, False, b'')
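The bytes above are enough to reproduce the exact error from the traceback. In UTF-8, 0xea (0b11101010) announces a three-byte character, but the field ends right after it, which is why the decoder reports "unexpected end of data" at position 2. The rest of the row also looks implausible. The sketch below checks both; `decode_error` and `implausible_fields` are hypothetical names, and the field layout (split, source, start, stop, ...) as well as the expectation that MNIST split and source names are plain ASCII words are assumptions, not something confirmed in this thread.

```python
def decode_error(raw):
    """Return the UnicodeDecodeError message for raw, or None if it decodes."""
    try:
        raw.decode("utf8")
    except UnicodeDecodeError as err:
        return str(err)
    return None


def implausible_fields(row):
    """Flag obviously broken values.

    ASSUMPTION: the row is laid out as (split, source, start, stop, ...),
    and for MNIST the split/source names are plain ASCII words."""
    split, source, start, stop = row[:4]
    problems = []
    if not split.isalpha():
        problems.append("split name is not a plain ASCII word")
    if not source.isalpha():
        problems.append("source name is not a plain ASCII word")
    if start > stop:
        problems.append("start index is greater than stop index")
    return problems


# The row exactly as printed by pdb above.
row = (b'\x00`\xea', b'\x00\x00\x00\x00p\x11\x01', 214581513910484992, 11777, None, False, b'')
print(decode_error(row[0]))
print(implausible_fields(row))
```

If both checks flag the row, that is consistent with the stored split metadata itself being damaged rather than with a Windows decoding issue.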

dmitriy-serdyuk commented 7 years ago

It looks like your data file is corrupted. Try downloading and converting MNIST again.
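Before and after reconverting, a cheap first check is whether the converted file still carries the HDF5 format signature, which the HDF5 specification places at offset 0 or at 512, 1024, 2048, ... when the file has a user block. This is a minimal sketch: `has_hdf5_signature` is a hypothetical helper, and the path you pass (the converted MNIST file under your Fuel data path) is up to you. A passing check does not prove the attributes inside the file are intact; it only rules out grosser damage such as a truncated or overwritten header.

```python
import os

# HDF5 format signature: \x89 'H' 'D' 'F' \r \n \x1a \n
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file has the HDF5 signature at offset 0,
    or at 512, 1024, 2048, ... (where it sits after a user block)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Candidate offsets are 0, then 512 doubling each time.
            offset = 512 if offset == 0 else offset * 2
    return False
```

A file that fails this check is certainly not a usable HDF5 file and should be deleted before reconverting.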