tomgoldstein / loss-landscape

Code for visualizing the loss landscape of neural nets
MIT License
2.72k stars 388 forks

h5py version #12

Open wenwei202 opened 5 years ago

wenwei202 commented 5 years ago

As discussed in #4, h5py 2.7.0 is required. After downgrading with `pip install h5py==2.7.0`, I still get:

$ mpirun -n 4 python test_h5py.py
hdf5_version=1.10.1
hdf5_version=1.10.1
rank 0 read and write
Traceback (most recent call last):
  File "test_h5py.py", line 10, in <module>
    f  = h5py.File('surf_file.h5', 'r+')
  File "/home/weiw/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 271, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/weiw/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 103, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (Unable to lock file, errno = 11, error message = 'resource temporarily unavailable')
hdf5_version=1.10.1
rank 0 read and write

Also, it seems the reported versions are different here:

$ python
>>> import h5py
>>> print (h5py.version.hdf5_version)
1.10.1
>>> print h5py.__version__
2.7.0
>>>

OS: "Ubuntu 16.04.5 LTS"

Python:

Python 2.7.15 |Anaconda, Inc.| (default, Oct 23 2018, 18:31:10)
[GCC 7.3.0] on linux2

ljk628 commented 5 years ago

Hi Wei,

Thanks for the report! As discussed in https://github.com/h5py/h5py/issues/712 and https://github.com/tomgoldstein/loss-landscape/issues/4, the code works well with hdf5 v 1.8.16 but may fail with hdf5 v 1.10. Could you also check dpkg -s libhdf5-dev? If it still shows v 1.10, then it does not work.

The default hdf5 (libhdf5-dev) that comes with Ubuntu 16.04 (code name 'xenial') is 1.8.16; it was upgraded to 1.10.0 in Ubuntu 18.04 (code name 'bionic'), see https://packages.ubuntu.com/bionic/libhdf5-dev Could you check your OS code name with lsb_release -a? The library may have been upgraded to a higher version.
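As a rough illustration of the version rule above (1.8.16 is reported to work, 1.10 may fail), the check can be written as a small helper. This is only a sketch: the function names are hypothetical, and treating the whole 1.8 series as "working" is an assumption extrapolated from the single version mentioned in this thread.

```python
def hdf5_series(version):
    """Return the (major, minor) pair of an HDF5 version string such as '1.10.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def matches_working_series(version, working=(1, 8)):
    """True if `version` belongs to the series reported to work in this thread.

    Assumption: every 1.8.x release behaves like 1.8.16.
    """
    return hdf5_series(version) == working


if __name__ == "__main__":
    # The two versions that appear in this thread.
    for v in ("1.8.16", "1.10.1"):
        print(v, "ok" if matches_working_series(v) else "may fail")
```

In practice the string to test would be `h5py.version.hdf5_version`, which is the value printed in the session above.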

If downgrading to 1.8.16 is not easy, a temporary (ugly) solution is:

It is not elegant; we will try to work out a better solution.

wenwei202 commented 5 years ago

Thanks! I have installed the specific hdf5. How do I build h5py against that hdf5, then? The tutorial is just not good. I cannot find setup.py anywhere.

ljk628 commented 5 years ago

Hi Wei,

The source code for h5py can be found here: https://github.com/h5py/h5py , which is where you can find setup.py. You may need to build from source. Please let us know if there are any problems once this setup is in place. Sorry for the painful process of downgrading.

wenwei202 commented 5 years ago

Trying to get there, but python setup.py install fails with /home/weiw/github/h5py/h5py/defs.c:609:17: fatal error: mpi.h: No such file or directory. MPI seems to be installed. Let me investigate.

wenwei202 commented 5 years ago

It took me two days to get it to work. Here is what I recall:

Note: Anaconda may not work (a conda-installed hdf5 may conflict with your custom installation). Anaconda also ships its own MPI binaries, which may cause conflicts. (My initial trials were in Anaconda, unfortunately :( )

  1. install openmpi and mpi4py==2.0.0

    # download openmpi-3.1.2
    cd openmpi-3.1.2
    ./configure --prefix=/usr/local
    make all
    sudo make install
    sudo pip install mpi4py==2.0.0

  2. install hdf5 with `--enable-parallel`

    # download and unzip hdf5_1.8.16+docs.orig.tar.gz
    cd hdf5-1.8.16+docs/
    ./configure --prefix=/usr/local/ --enable-parallel
    make
    sudo make install

  3. install h5py with your hdf5

    git clone https://github.com/h5py/h5py.git
    cd h5py
    git checkout 2.7.0
    python setup.py configure --hdf5=/usr/local/
    python setup.py configure --mpi
    sudo python setup.py install
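After a source build like the one above, it may help to confirm which h5py build Python actually picks up, since a conda or pip copy can shadow the custom one. The sketch below is not part of the original instructions; `describe_h5py_build` is a hypothetical helper name, and it relies only on `h5py.__version__`, `h5py.version.hdf5_version` and `h5py.get_config().mpi`.

```python
def describe_h5py_build():
    """Return the h5py version, linked HDF5 version and MPI flag.

    Returns None when h5py cannot be imported at all.
    """
    try:
        import h5py
    except ImportError:
        return None
    return {
        "h5py": h5py.__version__,
        "hdf5": h5py.version.hdf5_version,
        "mpi": bool(h5py.get_config().mpi),  # True only for a parallel build
    }


if __name__ == "__main__":
    info = describe_h5py_build()
    if info is None:
        print("h5py is not importable from this interpreter")
    else:
        for key, value in sorted(info.items()):
            print("%s: %s" % (key, value))
```

If the build went as intended, the expectation would be h5py 2.7.0, HDF5 1.8.16 and `mpi: True`; anything else suggests a different installation is being imported.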

ljk628 commented 5 years ago

Thanks for the clear instructions, Wei! We will update the README with a pointer to your scripts.

deepmo24 commented 5 years ago

I have the same problem as you. I'd like to provide my solution here.

Since the error is caused by a mode conflict (r+ and r), we can solve it by changing the code rather than depending on the version of h5py or hdf5. My idea is to create another h5 file (named surf_file_new) to store the losses and accuracies, so we only need to open surf_file in 'r' mode and surf_file_new in 'w' mode.

To minimize code changes, I copy the content of surf_file to surf_file_new, and after the function crunch() I set surf_file = surf_file_new, so the subsequent code does not need to change.
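The copy step of this workaround might look roughly like the sketch below. It is an illustration only, not the code deepmo24 used: `clone_surface_file` is a hypothetical name, and how the copy fits into the repository's MPI flow (for example, which rank performs it) is left open.

```python
try:
    import h5py  # third-party; required to actually run the copy
except ImportError:
    h5py = None


def clone_surface_file(src_path, dst_path):
    """Copy all top-level objects and file attributes from src_path to dst_path.

    The source is opened read-only ('r') and the destination is created
    fresh ('w'), so no file is ever opened in 'r+' mode.
    """
    if h5py is None:
        raise RuntimeError("h5py is not installed")
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        for name in src:
            src.copy(name, dst)  # duplicates datasets and whole subgroups
        for key, value in src.attrs.items():
            dst.attrs[key] = value
    return dst_path
```

A caller would then write losses and accuracies into the new file and point `surf_file` at it afterwards, as described above.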

Jamesswiz commented 5 years ago

Downgrading h5py worked for me

pip3 install --user 'h5py==2.7.0' --force-reinstall --no-cache-dir

jwhitlow45 commented 2 years ago

Downgrading h5py worked for me

pip3 install --user 'h5py==2.7.0' --force-reinstall --no-cache-dir

This fixed an issue with 1.10.6 conflicting with a model that was built with 1.10.4.

CHENBIN99 commented 1 year ago

Is it possible to support the latest h5py and hdf5 by modifying the code?