Closed scottstanie closed 1 year ago
Are you able to provide the input files?
Sorry, didn't comment the sample code well. The first block is creating the dummy 3x3 input file:
data = np.random.rand(3, 3)
x = y = np.arange(3)
with h5py.File("test_rio.h5", "w") as hf:
hf['x'] = x
hf['y'] = y
hf['x'].make_scale()
hf['y'].make_scale()
hf['data'] = data
hf['data'].dims[0].attach_scale(hf['y'])
hf['data'].dims[1].attach_scale(hf['x'])
The first block is creating the dummy 3x3 input file
Thanks for pointing that out - I should have looked closer when reading the issue.
@scottstanie if you upgrade to GDAL 3.6.4 or 3.7.0 and rasterio 1.3.7 do you still see a segfault?
Confirmed with GDAL 3.6.3 and rasterio 1.3.7
(gdb) run issue2855.py
Starting program: /venv/bin/python issue2855.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f65b21ff640 (LWP 39)]
[New Thread 0x7f65b19fe640 (LWP 40)]
[New Thread 0x7f65af1fd640 (LWP 41)]
[New Thread 0x7f65aa9fc640 (LWP 42)]
[New Thread 0x7f65a81fb640 (LWP 43)]
[New Thread 0x7f65a79fa640 (LWP 44)]
[New Thread 0x7f65a31f9640 (LWP 45)]
[New Thread 0x7f6590fff640 (LWP 46)]
/app/rasterio/__init__.py:304: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
{'driver': 'HDF5Image', 'dtype': 'float64', 'nodata': None, 'width': 3, 'height': 3, 'count': 1, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
0.0, 1.0, 0.0), 'blockysize': 1, 'tiled': False}
Okay for single dataset
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007f659184353f in __Pyx_GetItemInt_Fast (o=0x7f658bd54240, i=0, wraparound=<optimized out>, boundscheck=0, is_list=0) at rasterio/_base.c:29957
29957 PyObject *r = PyList_GET_ITEM(o, n);
(gdb) bt
#0 0x00007f659184353f in __Pyx_GetItemInt_Fast (o=0x7f658bd54240, i=0, wraparound=<optimized out>,
boundscheck=0, is_list=0) at rasterio/_base.c:29957
#1 0x00007f659187193a in __pyx_pf_8rasterio_5_base_11DatasetBase_7profile___get__ (
__pyx_v_self=0x7f658bd2c240) at rasterio/_base.c:16267
#2 __pyx_pw_8rasterio_5_base_11DatasetBase_7profile_1__get__ (__pyx_v_self=0x7f658bd2c240)
at rasterio/_base.c:16108
#3 __pyx_getprop_8rasterio_5_base_11DatasetBase_profile (o=0x7f658bd2c240, x=<optimized out>)
at rasterio/_base.c:25544
#4 0x0000558954cd64c8 in _PyObject_GenericGetAttrWithDict ()
#5 0x0000558954cd4a1b in PyObject_GetAttr ()
#6 0x0000558954cc6485 in _PyEval_EvalFrameDefault ()
#7 0x0000558954cbd176 in ?? ()
#8 0x0000558954db2c56 in PyEval_EvalCode ()
#9 0x0000558954ddfb18 in ?? ()
#10 0x0000558954dd896b in ?? ()
#11 0x0000558954ddf865 in ?? ()
#12 0x0000558954dded48 in _PyRun_SimpleFileObject ()
#13 0x0000558954ddea43 in _PyRun_AnyFileObject ()
#14 0x0000558954dcfc3e in Py_RunMain ()
#15 0x0000558954da5bcd in Py_BytesMain ()
#16 0x00007f65b5b47d90 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x00007f65b5b47e40 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x0000558954da5ac5 in _start ()
If I turn Cython's bounds checking back on, I see no crash but an exception.
(gdb) run issue2855.py
Starting program: /venv/bin/python issue2855.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fe3a59ff640 (LWP 50)]
[New Thread 0x7fe3a31fe640 (LWP 51)]
[New Thread 0x7fe3a29fd640 (LWP 52)]
[New Thread 0x7fe39e1fc640 (LWP 53)]
[New Thread 0x7fe39b9fb640 (LWP 54)]
[New Thread 0x7fe3991fa640 (LWP 55)]
[New Thread 0x7fe3989f9640 (LWP 56)]
[New Thread 0x7fe3847ff640 (LWP 57)]
/app/rasterio/__init__.py:304: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
{'driver': 'HDF5Image', 'dtype': 'float64', 'nodata': None, 'width': 3, 'height': 3, 'count': 1, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,
0.0, 1.0, 0.0), 'blockysize': 1, 'tiled': False}
Okay for single dataset
Traceback (most recent call last):
File "/app/issue2855.py", line 29, in <module>
print(src.profile)
File "rasterio/_base.pyx", line 1035, in rasterio._base.DatasetBase.profile.__get__
m.update(blockysize=self.block_shapes[0][0], tiled=False)
IndexError: list index out of range
[Thread 0x7fe3991fa640 (LWP 55) exited]
[Thread 0x7fe39b9fb640 (LWP 54) exited]
[Thread 0x7fe3a29fd640 (LWP 52) exited]
[Thread 0x7fe3989f9640 (LWP 56) exited]
[Thread 0x7fe39e1fc640 (LWP 53) exited]
[Thread 0x7fe3a31fe640 (LWP 51) exited]
[Thread 0x7fe3a59ff640 (LWP 50) exited]
[Thread 0x7fe3a92f91c0 (LWP 47) exited]
[Thread 0x7fe3847ff640 (LWP 57) exited]
[New process 47]
[Inferior 1 (process 47) exited with code 01]
In the 2 datasets case, src.block_shapes
is []
. GDAL is wrong at the top level, too. Size is 512, 512
isn't correct.
This usage is fine:
with rio.open("test_rio.h5") as src:
for sub in src.subdatasets:
with rio.open(sub) as dst:
print(dst.profile)
I'm not sure what src.profile
or src.read()
should even be in this case. Maybe it's better if rasterio.open()
doesn't open if given a multidataset HDF5 file and we provide another utility to get subdataset names?
@scottstanie if you upgrade to GDAL 3.6.4 or 3.7.0 and rasterio 1.3.7 do you still see a segfault?
Yep looks like same segfault
$ mamba create -n testrio rasterio=1.3.7
$ conda activate testrio
(testrio) $ rio --show-versions
rasterio info:
rasterio: 1.3.7
GDAL: 3.7.0
PROJ: 9.2.1
GEOS: 3.11.2
....
(testrio) $ rio info test_rio.h5
/Users/staniewi/miniconda3/envs/testrio/lib/python3.11/site-packages/rasterio/__init__.py:304: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
Segmentation fault: 11
And the wrong block shape is interesting, I had wondered in the past why it shows a default (512, 512)
block size even when none of the subdatasets are that size.
We'll have a fix in 1.3.8 @scottstanie.
Expected behavior and actual behavior.
Using
rasterio
to read from/inspect HDF5 files with more than one subdataset causes a segfault. This does not happen when there's only one dataset.Steps to reproduce the problem.
When I run this, I get
But it's readable by GDAL:
Environment Information
Installation Method
conda forge