maayane / catsHTM

A tool for quickly accessing and cross-matching large astronomical catalogs
Apache License 2.0

SDSSDR10_htm_044300.hdf5 is broken for h5py #3

Open · hombit opened this issue 3 years ago

hombit commented 3 years ago

Hello and thank you for the project.

I'm trying to use the catalogs and have found that SDSSDR10_htm_044300.hdf5 appears to be broken for the h5py module:

import h5py

for dataset in h5py.File('SDSSDR10_htm_044300.hdf5'):
    print(dataset)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-ce06a7d9c9b1> in <module>
----> 1 for dataset in h5py.File('SDSSDR10_htm_044300.hdf5'):
      2     print(dataset)
      3

~/.local/lib/python3.7/site-packages/h5py/_hl/group.py in __iter__(self)
    431     def __iter__(self):
    432         """ Iterate over member names """
--> 433         for x in self.id.__iter__():
    434             yield self._d(x)
    435

h5py/h5g.pyx in h5py.h5g.GroupID.__iter__()

h5py/h5g.pyx in h5py.h5g.GroupID.__iter__()

h5py/h5g.pyx in h5py.h5g.GroupIter.__init__()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5g.pyx in h5py.h5g.GroupID.get_num_objs()

RuntimeError: Unable to get group info (bad symbol table node signature)

I have checked the MD5 sum and it matches.
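A matching checksum only shows that the local copy is byte-for-byte identical to the published one. It cannot tell whether the published file is itself a valid HDF5 file. For reference, here is a minimal sketch of such a check using only Python's hashlib. The function names are illustrative, and the expected digest has to come from wherever the catalog checksums are published.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return md5sum(path) == expected_hex.strip().lower()
```

Usage: `matches_checksum("SDSSDR10_htm_044300.hdf5", expected)`, where `expected` is the published MD5 string (not reproduced here).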

hombit commented 3 years ago

The h5stat utility also fails on this file:

Filename: SDSSDR10_htm_044300.hdf5
h5stat warning: Unable to traverse objects/links in file "SDSSDR10_htm_044300.hdf5"
maayane commented 3 years ago

Hi Konstantin, can you send the catsHTM commands you tried to run on both of the files you found to have a problem? It will help us help you. Thanks

(Email reply to Konstantin Malanchev's comment of Fri, 12 Feb 2021 at 09:29.)

-- Dr. Maayane Soumagnac, Postdoctoral researcher, Computational Research Division, Lawrence Berkeley National Lab

hombit commented 3 years ago

Hello Maayane,

Thank you for your answer!

I can't remember the exact catsHTM.cone_search arguments I used, but I've gone through the traceback and found that the problem occurs when h5py tries to read this file. I don't think anything is wrong with h5py itself, because HDF5 utilities such as h5stat and h5dump cannot open this file either.

I have problems with some other files too. I'll prepare a full list and post it here.

I'm using catsHTM 0.1.32, h5py 3.1.0 and h5stat 1.8.12.

maayane commented 3 years ago

OK, good. Please go ahead and send me the list of files and I will go through them. If you do have cone search examples, those will help too, in my experience.

Maayane

(Email reply to Konstantin Malanchev's comment of Fri, 12 Feb 2021 at 11:12.)

hombit commented 3 years ago

I've run the following script on my catsHTM directory:

for FILE in $(find . -name '*.hdf5'); do
    h5stat "$FILE" > /dev/null
done

It gave me this output:

h5stat warning: Unable to traverse objects/links in file "SDSS/DR10/SDSSDR10_htm_044300.hdf5"
h5stat warning: Unable to traverse objects/links in file "NED/20180502/NEDz_htm_041800.hdf5"
h5stat error: unable to open file "NED/20180502/NEDz_htm_042500.hdf5"
h5stat error: unable to open file "NED/20180502/NEDz_htm_018800.hdf5"
h5stat warning: Unable to traverse objects/links in file "NED/20180502/NEDz_htm_018900.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/KiDS/DR3/VSTkids_htm_310000.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/KiDS/DR3/VSTkids_htm_258700.hdf5"
h5stat error: unable to open file "VST/ATLAS/DR3/VSTatlas_htm_438200.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_452200.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_449300.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_452000.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_438300.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_444400.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_442300.hdf5"
h5stat warning: Unable to traverse objects/links in file "UKIDSS/DR10/UKIDSS_htm_043900.hdf5"
h5stat warning: Unable to traverse objects/links in file "Spitzer/SAGE/SAGE_htm_533100.hdf5"
h5stat warning: Unable to traverse objects/links in file "VISTA/Viking/DR2/VISTAviking_htm_244600.hdf5"
h5stat warning: Unable to traverse objects/links in file "VISTA/Viking/DR2/VISTAviking_htm_265700.hdf5"

These files appear to be unreadable by the HDF5 utilities.
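The h5stat loop above can also be sketched in Python, so the scan runs in the same environment as catsHTM. This is an illustrative sketch, not part of catsHTM: `scan` and `h5py_traverse` are hypothetical helpers. Walking every object with h5py's `visititems` only approximates what h5stat checks, so the two may not flag exactly the same files.

```python
from pathlib import Path
from typing import Callable, Dict


def h5py_traverse(path: Path) -> None:
    """Open `path` with h5py and visit every group and dataset in it.

    Hypothetical helper approximating h5stat's object traversal. Any error
    h5py raises on a damaged file (OSError, RuntimeError, ...) propagates.
    """
    import h5py  # imported lazily so scan() itself only needs the stdlib

    with h5py.File(str(path), "r") as f:
        # visititems stops early only if the callback returns something other
        # than None, so a callback returning None walks the whole hierarchy.
        f.visititems(lambda name, obj: None)


def scan(root, check: Callable[[Path], None] = h5py_traverse) -> Dict[str, str]:
    """Run `check` on every *.hdf5 file under `root`.

    Returns {path relative to root: "ExceptionType: message"} for failing files.
    """
    failures: Dict[str, str] = {}
    for path in sorted(Path(root).rglob("*.hdf5")):
        try:
            check(path)
        except Exception as exc:  # record the failure and keep scanning
            failures[str(path.relative_to(root))] = f"{type(exc).__name__}: {exc}"
    return failures
```

Running `scan("/path/to/catsHTM")` and printing the result gives one line per unreadable file. The `check` parameter lets you swap in a cheaper or stricter test without touching the directory walk.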

hombit commented 3 years ago

I've found a cone search example.

Output of my script (catalog NEDz, ra=228.80143 deg, dec=0.48434 deg, radius=60 arcsec):

...
    data, names, units = cone_search(cat, ra_rad, dec_rad, radius_arcsec, catalogs_dir=path)
  File "script.py", line 134, in cone_search
    cat = class_HDF5.HDF5(root_to_data + CatDir + '/' + FileName_0).load(DataName_0, numpy_array=True).T
  File "/home/kostya/.local/lib/python3.7/site-packages/catsHTM/class_HDF5.py", line 59, in load
    f = h5py.File(filename, 'r')
  File "/home/kostya/.local/lib/python3.7/site-packages/h5py/_hl/files.py", line 427, in __init__
    swmr=swmr)
  File "/home/kostya/.local/lib/python3.7/site-packages/h5py/_hl/files.py", line 190, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 96, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
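The `file signature not found` error means the HDF5 library could not find the format's 8-byte signature (`\x89HDF\r\n\x1a\n`) at any of the offsets where the HDF5 file-format specification allows the superblock to start (0, 512, 1024, 2048, ... bytes). Below is a stdlib-only sketch of that test, which may help separate files that cannot be opened at all from files that open but cannot be traversed. It is a heuristic: a file that passes can still be corrupted deeper inside, as the "Unable to traverse" cases above suggest. `has_hdf5_signature` is a hypothetical helper name.

```python
import os

# The 8-byte HDF5 format signature (hex 89 48 44 46 0d 0a 1a 0a).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path) -> bool:
    """Return True if the HDF5 signature sits at a legal superblock offset.

    The superblock may start at byte 0, 512, 1024, 2048, ... (doubling).
    Only those offsets are checked; a signature anywhere else does not count.
    """
    with open(path, "rb") as fh:
        size = fh.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False
```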

I've run it under strace. This part of the output shows that the problem is with the NEDz_htm_018800.hdf5 file, which I listed in my previous message:

$ strace -f -t -e trace=file ./my_script.py
...
[pid 19970] 00:34:47 stat(".../NED/20180502/NEDz_htmColCell.mat", {st_mode=S_IFREG|0640, st_size=1164, ...}) = 0
[pid 19970] 00:34:47 open(".../NED/20180502/NEDz_htmColCell.mat", O_RDONLY|O_CLOEXEC) = 4
[pid 19970] 00:34:47 stat(".../NED/20180502/NEDz_htm.hdf5", {st_mode=S_IFREG|0640, st_size=2274192, ...}) = 0
[pid 19970] 00:34:47 open(".../NED/20180502/NEDz_htm.hdf5", O_RDONLY) = 4
[pid 19970] 00:34:47 lstat(".../NED/20180502/NEDz_htm.hdf5", {st_mode=S_IFREG|0640, st_size=2274192, ...}) = 0
[pid 19970] 00:34:47 stat(".../NED/20180502/NEDz_htm_018800.hdf5", {st_mode=S_IFREG|0640, st_size=5902144, ...}) = 0
[pid 19970] 00:34:47 open(".../NED/20180502/NEDz_htm_018800.hdf5", O_RDONLY) = 4
...
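One way to pin down the offending file in a longer trace is to take the last `.hdf5` path passed to `open` or `openat` before the exception, which is what the excerpt above shows by eye. Here is a minimal sketch, assuming the trace was saved to a text file (for example with strace's `-o` option) and that catalog files are opened one at a time. With `-f` and several threads, lines from different pids can interleave, so the last match is only a strong hint. `last_opened_hdf5` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches open("....hdf5", ...) and openat(AT_FDCWD, "....hdf5", ...) lines
# as printed by `strace -e trace=file`; captures the path.
_OPEN_HDF5 = re.compile(r'\bopen(?:at)?\((?:AT_FDCWD, )?"([^"]+\.hdf5)"')


def last_opened_hdf5(strace_text: str) -> Optional[str]:
    """Return the path of the last .hdf5 file opened in the trace, or None."""
    matches = _OPEN_HDF5.findall(strace_text)
    return matches[-1] if matches else None
```

Usage: `last_opened_hdf5(open("strace.log").read())`. On the excerpt above this picks out `.../NED/20180502/NEDz_htm_018800.hdf5`, skipping the earlier `stat` and `.mat` lines.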
gnarayan commented 2 years ago

Hi @maayane, is there any update on this issue, and on adding object IDs to the catsHTM files? I'm trying to set up a reliable multi-catalog cross-match service for LSST DESC, in particular for our activities in the Time Domain and Photo-Z Working Groups, and would like to use catsHTM. But without object IDs, and with these broken files for different surveys, we're a bit stuck.