shahsohil / DCC

This repository contains the source code and data for reproducing the results of the Deep Continuous Clustering paper
MIT License

Problem to open MNIST data #2

Closed ttgump closed 6 years ago

ttgump commented 6 years ago

Hi, I am trying to reproduce the results. When I try to open the provided MNIST data, the following error is raised:

>>> data = sio.loadmat('testdata.mat', mat_dtype=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xsede/users/xs-ttgump/.local/lib/python3.6/site-packages/scipy/io/matlab/mio.py", line 141, in loadmat
    MR, file_opened = mat_reader_factory(file_name, appendmat, **kwargs)
  File "/home/xsede/users/xs-ttgump/.local/lib/python3.6/site-packages/scipy/io/matlab/mio.py", line 65, in mat_reader_factory
    mjv, mnv = get_matfile_version(byte_stream)
  File "/home/xsede/users/xs-ttgump/.local/lib/python3.6/site-packages/scipy/io/matlab/miobase.py", line 241, in get_matfile_version
    raise ValueError('Unknown mat file type, version %s, %s' % ret)
ValueError: Unknown mat file type, version 54, 50

It seems that the .mat files provided by the authors are not in a valid format. Thanks.

shahsohil commented 6 years ago

Hi,

Did you try with Python 2.7?

ttgump commented 6 years ago

Yes. This is the output on Python 2.7:

Python 2.7.9 (default, Apr 11 2016, 16:35:11)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy.io as sio
>>> data = sio.loadmat('testdata.mat', mat_dtype=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xsede/users/xs-ttgump/.local/lib/python2.7/site-packages/scipy/io/matlab/mio.py", line 141, in loadmat
    MR, file_opened = mat_reader_factory(file_name, appendmat, **kwargs)
  File "/home/xsede/users/xs-ttgump/.local/lib/python2.7/site-packages/scipy/io/matlab/mio.py", line 65, in mat_reader_factory
    mjv, mnv = get_matfile_version(byte_stream)
  File "/home/xsede/users/xs-ttgump/.local/lib/python2.7/site-packages/scipy/io/matlab/miobase.py", line 241, in get_matfile_version
    raise ValueError('Unknown mat file type, version %s, %s' % ret)
ValueError: Unknown mat file type, version 54, 50

ttgump commented 6 years ago

I also tried to open the data file in R using the R.matlab package, but got a similar error:

> library(R.matlab)
R.matlab v3.6.1 (2016-10-19) successfully loaded. See ?R.matlab for help.

Attaching package: ‘R.matlab’

The following objects are masked from ‘package:base’:

    getOption, isOpen

> traindata = readMat("traindata.mat")
Error in mat5ReadTag(this) : 
  Unknown data type. Not in range [1,19]: 12852
In addition: Warning messages:
1: In readMat5or73Header(this, firstFourBytes = firstFourBytes) :
  Unknown endian: 63. Will assume Bigendian.
2: In readMat5or73Header(this, firstFourBytes = firstFourBytes) :
  Unknown MAT version tag: 13111. Will assume version 5.

shahsohil commented 6 years ago

I am not able to reproduce the error. I am using scipy 1.0.0. Also, please verify whether you are able to load the file in MATLAB. It works fine for me. I get:

>>> import scipy.io as sio
>>> data = sio.loadmat('testdata.mat', mat_dtype=True)
>>> data
{'Y': array([[7, 2, 1, ..., 4, 5, 6]], dtype=uint8), 'X': array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]]), '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file Platform: posix, Created on: Wed Aug 23 13:37:24 2017', '__globals__': []}

ttgump commented 6 years ago

Using Python 2.7 solved this problem. Thanks! After training DCC, I got the TensorBoard log file, but its contents look like mojibake:

total_loss��=M��!*       ���� �|Ӣ��A�*

reconstruction_loss�O;���)       QKD   �|Ӣ��A�*

dcc_lossw6�=���^       `/�#  ˅�֢��A�*

sigma1?�'B��       `/�# ���֢��A�*

sigma2�}
?(��       `/�#   ڐ�֢��A�*

lambda��|B�3!       {�� �6  ۢ��A�*


total_loss|�=���H*       ����  �H  ۢ��A�*

reconstruction_loss)�N;���       QKD    OZ  ۢ��A�*

dcc_lossst�=B�}�       `/�#  �y�ߢ��A�*

sigma1?�'B��p�       `/�#    ��ߢ��A�*

sigma2�}
?h],y       `/�#   ��ߢ��A�*

lambda��|B85h�!       {��    +L䢯�A�*

shahsohil commented 6 years ago

Great. The log is a binary event file, so it is not meant to be read as text. You need to run TensorBoard on the log directory using:

tensorboard --logdir /path/to/log/directory

Please see the instructions on the TensorBoard website.

LemonPi commented 5 years ago

I'm also getting this issue with Python 2.7 and scipy 1.2.1. Loading the file in MATLAB 2018a also gives the error "Not a binary MAT-file. Try load -ASCII to read as text." Is it perhaps only compatible with older MATLAB/scipy versions? Maybe it should be updated...

LemonPi commented 5 years ago

Ah, it seems that when I cloned the repo, the datasets were not cloned properly (due to size restrictions?), so I ended up with .mat files that are only ~100 bytes in size! It might be better to remove the actual data from the repo and explicitly tell people to get the datasets from your drive: https://drive.google.com/drive/folders/13D-k2p8sov4BFpFDSFu-pT0yHDPbJVFk
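
For anyone hitting the same "Unknown mat file type" error, a quick check of the first bytes of the file can show whether it is a real MAT-file or a small text stub. The sketch below is only an illustration and rests on an assumption that this thread does not confirm: that the ~100-byte files are Git LFS pointer files, which begin with the line `version https://git-lfs.github.com/spec/v1`. The helper names `classify_mat_file`, `write_demo_file`, and `demo` are hypothetical and are not part of the DCC code.

```python
import os
import tempfile

# First bytes of a Git LFS pointer file (assumed cause of the tiny stubs)
# and of a MATLAB 5.0 MAT-file header, as seen in the loadmat output above.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"
MAT5_PREFIX = b"MATLAB 5.0 MAT-file"


def classify_mat_file(path):
    """Return 'lfs-pointer', 'mat5', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(LFS_PREFIX):
        return "lfs-pointer"
    if head.startswith(MAT5_PREFIX):
        return "mat5"
    return "unknown"


def write_demo_file(directory, name, payload):
    """Write a small synthetic file so the check can be tried without the real data."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(payload)
    return path


def demo():
    """Classify one synthetic pointer stub and one synthetic MAT 5.0 header."""
    with tempfile.TemporaryDirectory() as d:
        stub = write_demo_file(
            d, "stub.mat", LFS_PREFIX + b"\noid sha256:0000\nsize 12345\n"
        )
        real = write_demo_file(
            d, "real.mat", MAT5_PREFIX + b" Platform: posix" + b"\x00" * 100
        )
        return {"stub.mat": classify_mat_file(stub), "real.mat": classify_mat_file(real)}


if __name__ == "__main__":
    print(demo())
```

If a dataset file is classified as `lfs-pointer` or `unknown`, the data was probably not fetched during the clone, and downloading it from the Drive folder linked above is the safer route.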