heroxbd opened this issue 7 years ago
I have the same problem. The real name of the file is BCF_training_1/BCF_training_516_gcr.hdf,
but it is a symlink, and when I try to open it I get:
IOError: Unable to open file (Unable to open file: name = 'bcf_training_1/bcf_training_516_gcr.hdf', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0)
I think the issue here is slightly malformed HDF5 files, and the symlink is a red herring: the actual issue is the relative paths and their dependence on the current working directory.
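Here is why the symlink itself is probably not the problem. The OS resolves a relative symlink against the directory that contains the link, whatever the current working directory is. A bare relative name, by contrast, is resolved against the cwd. Below is a minimal standard-library sketch of that difference. It is POSIX-only, since creating symlinks on Windows may need extra privileges. The helper name and the temporary directory layout are hypothetical and only loosely mirror this thread.

```python
import os
import tempfile
from pathlib import Path


def check_from_various_cwds():
    """Hypothetical helper: {cwd label: (symlink readable?, bare relative name exists?)}."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / 'annex').mkdir()
        (root / 'data').mkdir()
        (root / 'annex' / 'base.h5').write_text('payload')
        link = root / 'data' / 'sym_base.h5'
        # Relative symlink target: interpreted relative to data/, the
        # directory holding the link, not relative to the process cwd.
        link.symlink_to(Path('..') / 'annex' / 'base.h5')
        old_cwd = os.getcwd()
        try:
            for label, cwd in [('root', root), ('data', root / 'data')]:
                os.chdir(cwd)
                # Open the link by its absolute path; the OS follows it.
                link_ok = link.read_text() == 'payload'
                # A bare relative name is looked up from the cwd only.
                bare_ok = Path('annex/base.h5').exists()
                results[label] = (link_ok, bare_ok)
        finally:
            os.chdir(old_cwd)  # restore cwd before the temp dir is removed
    return results


print(check_from_various_cwds())
# {'root': (True, True), 'data': (True, False)}
```

The symlink resolves from both working directories. The cwd-relative name only resolves from the one directory where it happens to be valid, which is the failure mode the external links run into.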
The H5Lcreate_external reference, https://support.hdfgroup.org/HDF5/doc/RM/RM_H5L.html#Link-CreateExternal, gives the lookup rules for external links:
If target_file_name is a relative pathname, the following steps are performed:
- The library will get the prefix(es) set in the environment variable HDF5_EXT_PREFIX and will try to prepend each prefix to target_file_name to form a new target_file_name.
- If the new target_file_name does not exist or if HDF5_EXT_PREFIX is not set, the library will get the prefix set via H5Pset_elink_prefix and prepend it to target_file_name to form a new target_file_name.
- If the new target_file_name does not exist or no prefix is being set by H5Pset_elink_prefix, then the path of the file associated with link_loc_id is obtained. This path can be the absolute path or the current working directory plus the relative path of that file when it is created/opened. The library will prepend this path to target_file_name to form a new target_file_name.
- If the new target_file_name does not exist, then the library will look for target_file_name and will return failure/success accordingly.
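The lookup order quoted from the HDF5 docs can be modelled in plain Python to see which candidate paths get tried. This is a simplified sketch, not HDF5's implementation. It only covers relative target names. It assumes HDF5_EXT_PREFIX uses `os.pathsep` (':' on POSIX) as its separator. It also reads the cwd at call time, whereas HDF5 records the parent file's path when that file is opened. `resolve_external_link` is a hypothetical name.

```python
import os
from pathlib import Path


def resolve_external_link(target_file_name, parent_file_name,
                          elink_prefix=None, env=None):
    """Hypothetical, simplified model of HDF5's relative external-link lookup.

    Returns the first candidate path that exists, or None.
    """
    env = os.environ if env is None else env
    target = Path(target_file_name)
    if target.is_absolute():
        raise ValueError('only relative target names are modelled here')

    candidates = []
    # Step 1: each prefix in HDF5_EXT_PREFIX (separator assumed: os.pathsep).
    for prefix in env.get('HDF5_EXT_PREFIX', '').split(os.pathsep):
        if prefix:
            candidates.append(Path(prefix) / target)
    # Step 2: the prefix set via H5Pset_elink_prefix, if any.
    if elink_prefix:
        candidates.append(Path(elink_prefix) / target)
    # Step 3: the directory of the file holding the link. A relative parent
    # path is completed with the cwd (taken now, not at open time as in HDF5).
    parent_dir = Path(parent_file_name).parent
    if not parent_dir.is_absolute():
        parent_dir = Path.cwd() / parent_dir
    candidates.append(parent_dir / target)
    # Step 4: the bare name, which the OS resolves against the current cwd.
    candidates.append(target)

    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None
```

Step 4 is the only place the cwd enters for an absolute parent path. A link stored relative to some other directory (such as `annex/very/deep/base.h5` relative to /tmp/symlink in the script in this comment) therefore resolves only when that directory happens to be the cwd.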
If the working directory is 'right', it will work (I am not sure why the OP's example does not work, given that the cwd should have been set correctly), but I would not depend on that. Storing relative paths with '..', computed relative to the directory of the file that holds the link, seems to work more reliably.
```python
import os
from pathlib import Path

import h5py
import numpy as np


def compute_relative_with_dots(a, b):
    """Return path `a` relative to directory `b`, stepping up with '..' as needed."""
    try:
        return a.relative_to(b)
    except ValueError:
        pass
    # Walk up b's ancestors, adding one '..' per level, until one contains a.
    dot_path = Path('..')
    for p in b.parents:
        try:
            rp = a.relative_to(p)
            return dot_path / rp
        except ValueError:
            dot_path /= Path('..')


test_path = Path('/tmp/symlink')
data_path = test_path / 'data' / 'very' / 'deep'
annex_path = test_path / 'annex' / 'very' / 'deep'
access_path = test_path / 'access'

link_file = data_path / 'sym_base.h5'             # symlink to target_file
target_file = annex_path / 'base.h5'              # the real data file
test_file = access_path / 'has_external_link.h5'  # file holding the external links

test_path.mkdir(exist_ok=True, parents=True)
data_path.mkdir(exist_ok=True, parents=True)
annex_path.mkdir(exist_ok=True, parents=True)
access_path.mkdir(exist_ok=True, parents=True)

# Target file with the dataset that every link points at.
with h5py.File(target_file, 'w') as f:
    f['target'] = np.ones(5)

# (Re)create a relative symlink: sym_base.h5 -> ../../../annex/very/deep/base.h5
try:
    link_file.unlink()
except FileNotFoundError:
    pass
link_file.symlink_to(compute_relative_with_dots(target_file, data_path))

with h5py.File(test_file, 'w') as f:
    f['in_file'] = np.ones(5) * 3
    # Relative to test_path (/tmp/symlink): only resolve when that is the cwd.
    f['ext_link'] = h5py.ExternalLink(target_file.relative_to(test_path), 'target')
    f['sym_ext_link'] = h5py.ExternalLink(link_file.relative_to(test_path), 'target')
    # Relative to access_path, the directory that holds test_file.
    f['ext_link_works'] = h5py.ExternalLink(compute_relative_with_dots(target_file, access_path), 'target')
    f['sym_link_works'] = h5py.ExternalLink(compute_relative_with_dots(link_file, access_path), 'target')


def test_with_cwd(cwd_path):
    """Open test_file with cwd_path as the working directory and read every link."""
    print('-' * 25)
    print('with {} as cwd'.format(cwd_path))
    os.chdir(cwd_path)
    with h5py.File(test_file, 'r') as f:
        for k in ['in_file', 'ext_link', 'sym_ext_link', 'ext_link_works', 'sym_link_works']:
            try:
                print(f[k][:])
            except Exception as e:
                print(k)
                print(e)
    print('-' * 25)
    print()


test_with_cwd(Path('~').expanduser())
test_with_cwd(test_path)
```
which gives:

```
-------------------------
with /home/tcaswell as cwd
[ 3. 3. 3. 3. 3.]
ext_link
"Unable to open object (Unable to open file: name = '/tmp/symlink/access/annex/very/deep/base.h5', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0)"
sym_ext_link
"Unable to open object (Unable to open file: name = '/tmp/symlink/access/data/very/deep/sym_base.h5', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0)"
[ 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1.]
-------------------------

-------------------------
with /tmp/symlink as cwd
[ 3. 3. 3. 3. 3.]
[ 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1.]
-------------------------
```
The HDF5 file contains (commands run from /tmp):

```
$ h5ls symlink/access/has_external_link.h5
ext_link                 External Link {annex/very/deep/base.h5//target}
ext_link_works           External Link {../annex/very/deep/base.h5//target}
in_file                  Dataset {5}
sym_ext_link             External Link {data/very/deep/sym_base.h5//target}
sym_link_works           External Link {../data/very/deep/sym_base.h5//target}
```
and the file structure is:

```
$ ls -lR symlink/
symlink/:
total 0
drwxr-xr-x 2 tcaswell tcaswell 60 Jun 14 11:32 access
drwxr-xr-x 3 tcaswell tcaswell 60 Jun 14 11:32 annex
drwxr-xr-x 3 tcaswell tcaswell 60 Jun 14 11:32 data

symlink/access:
total 4
-rw-r--r-- 1 tcaswell tcaswell 2184 Jun 14 11:32 has_external_link.h5

symlink/annex:
total 0
drwxr-xr-x 3 tcaswell tcaswell 60 Jun 14 11:32 very

symlink/annex/very:
total 0
drwxr-xr-x 2 tcaswell tcaswell 60 Jun 14 11:32 deep

symlink/annex/very/deep:
total 4
-rw-r--r-- 1 tcaswell tcaswell 2184 Jun 14 11:32 base.h5

symlink/data:
total 0
drwxr-xr-x 3 tcaswell tcaswell 60 Jun 14 11:32 very

symlink/data/very:
total 0
drwxr-xr-x 2 tcaswell tcaswell 60 Jun 14 11:32 deep

symlink/data/very/deep:
total 0
lrwxrwxrwx 1 tcaswell tcaswell 32 Jun 14 11:32 sym_base.h5 -> ../../../annex/very/deep/base.h5
```
This is with master(ish) h5py, HDF5 1.10, and Python 3.6.
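The '..' paths that `compute_relative_with_dots` builds can also come straight from the standard library. `os.path.relpath` (used here as `posixpath.relpath`, so the sketch behaves the same on any OS) computes them lexically from the path strings and never follows symlinks. On Python 3.12+, `PurePath.relative_to(other, walk_up=True)` should give the same result. The sketch below reproduces the relative targets seen in the h5ls and ls listings. `relative_link_target` is a hypothetical helper name.

```python
import posixpath
from pathlib import PurePosixPath


def relative_link_target(target, link_dir):
    """Hypothetical helper: express `target` relative to `link_dir` with '..' as needed.

    posixpath.relpath works on the strings only (for absolute inputs it does
    not consult the filesystem) and does not resolve symlinks.
    """
    return posixpath.relpath(str(target), str(link_dir))


root = PurePosixPath('/tmp/symlink')
target_file = root / 'annex' / 'very' / 'deep' / 'base.h5'
link_file = root / 'data' / 'very' / 'deep' / 'sym_base.h5'
access_path = root / 'access'

print(relative_link_target(target_file, link_file.parent))  # ../../../annex/very/deep/base.h5
print(relative_link_target(target_file, access_path))       # ../annex/very/deep/base.h5
print(relative_link_target(link_file, access_path))         # ../data/very/deep/sym_base.h5
```

The second and third results are the strings stored in `ext_link_works` and `sym_link_works`. Both are relative to the directory of the file holding the links, so they resolve at the third lookup step regardless of the cwd.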
In my setup, shar/sel0/mpro1/Plate_Pb210/011835.h5 has a "tt" field that is an ExternalLink to the "tt" field of tt/sel0/mpro1/Plate_Pb210/011835.h5. If the target is a symlink, h5py cannot resolve the ExternalLink correctly, but the R package rhdf5, which links to HDF5 directly, can. h5py also works if the target is a regular file.
Versions:
- h5py-2.6.0 and 2.7.0 (both tested)
- hdf5-1.8.18
- rhdf5-2.14.0, linked to system hdf5-1.8.18
- python-2.7.12
- R-3.3.2