shahsohil / DCC

This repository contains the source code and data for reproducing the results of the Deep Continuous Clustering paper
MIT License
208 stars 53 forks

Read/write issues for h5 files #9

Closed scottfleming closed 5 years ago

scottfleming commented 5 years ago

Lines 32-34 of copyGraph.py are such that, if you use the command given in the README, your featurefile and outputfile will have the same name. You are then reading (via data0) from the same file that you are writing to (via data2), so h5py throws an error: "OSError: Unable to create file (unable to truncate a file which is already open)"
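
One way around this kind of error is to finish reading and close the file before reopening it for writing. The snippet below is only a minimal sketch of that pattern, not the code in copyGraph.py: the helper `rewrite_h5_in_place`, the `transform` callback, and the dataset names `data` and `labels` are all hypothetical, and it assumes the file holds only top-level datasets that fit in memory.

```python
import os
import tempfile

import h5py
import numpy as np


def rewrite_h5_in_place(path, transform):
    """Hypothetical helper: read all top-level datasets, close, then rewrite the same file."""
    # Read phase: leaving the `with` block closes the handle before any write starts.
    with h5py.File(path, "r") as src:
        arrays = {
            name: obj[()]  # obj[()] copies the whole dataset into a NumPy array
            for name, obj in src.items()
            if isinstance(obj, h5py.Dataset)  # assumption: groups are ignored
        }

    # Write phase: mode "w" truncates the file, which is safe once no handle is open.
    with h5py.File(path, "w") as dst:
        for name, values in arrays.items():
            dst.create_dataset(name, data=transform(name, values))


# Small demo on a throwaway file (file name and contents are made up).
demo_path = os.path.join(tempfile.mkdtemp(), "pretrained.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("data", data=np.arange(6, dtype=np.float32).reshape(3, 2))
    f.create_dataset("labels", data=np.array([0, 1, 0]))


def double_data(name, values):
    # Illustrative transform: scale the features, leave the labels untouched.
    return values * 2 if name == "data" else values


rewrite_h5_in_place(demo_path, double_data)

with h5py.File(demo_path, "r") as f:
    result_data = f["data"][()]
    result_labels = f["labels"][()]
```

The same effect can be had without touching the script by passing a different output file name, at the cost of keeping two files around.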

On a related note, was there any particularly compelling reason for storing the dataset from reuters as an h5 in make_data.py vs. just using scipy.io.savemat like you did with the other datasets?

Put another way, why not just put everything in a dictionary (e.g. data['X'] is a numpy array with the data features and data['Y'] is a numpy array with the labels) and then pickle it?
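
To make the suggestion concrete, here is a minimal sketch of the dictionary-plus-pickle layout. The keys `X` and `Y` follow the description above; the helper names `save_dataset` and `load_dataset`, the file name, and the toy arrays are assumptions for illustration only and are not taken from make_data.py.

```python
import os
import pickle
import tempfile

import numpy as np


def save_dataset(path, X, Y):
    """Hypothetical helper: store features and labels in one pickled dictionary."""
    with open(path, "wb") as f:
        pickle.dump({"X": X, "Y": Y}, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_dataset(path):
    """Hypothetical helper: load the dictionary written by save_dataset."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy data standing in for the real features and labels.
X = np.random.RandomState(0).rand(4, 3).astype(np.float32)
Y = np.array([0, 1, 1, 0])

pkl_path = os.path.join(tempfile.mkdtemp(), "dataset.pkl")
save_dataset(pkl_path, X, Y)

# The file is fully closed after each helper returns, so reading and
# rewriting the same path in sequence does not conflict.
loaded = load_dataset(pkl_path)
```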

shahsohil commented 5 years ago

Yes, you are right. This creates a problem for the .h5 format. For .pkl, the command will work as given. Thanks for pointing it out.

I suggest that you use a different file. However, make sure that the input to DCC is still given as pretrained.h5 or .mat.