ksharonin / kerchunkC


NetCDF: invalid attribute version: 3 + cascading: attribute name string is not null terminated #1

Closed ksharonin closed 9 months ago

ksharonin commented 9 months ago

2024-01-13T08:55:15.470Z 172.17.0.2:CRITICAL:H5DatasetDevice.cpp:146 Failed to create H5DatasetDevice for local//Area: invalid attribute version: 3 (/Area)

ksharonin commented 9 months ago

H5 Attribute message has dedicated byte for version: https://docs.hdfgroup.org/hdf5/v1_14/_f_m_t3.html#AttributeMessage:~:text=The%20Attribute%20Message-,Header%20Message%20Name%3A%20Attribute,-Header%20Message%20Type

NetCDF: The closest item to this is the file_format in NetCDF, but version numbers can't be found: https://docs.unidata.ucar.edu/netcdf-c/current/group__datasets.html#:~:text=%E2%97%86-,nc_inq_format(),-int%20nc_inq_format

import netCDF4
nc_file = netCDF4.Dataset(URL, 'r')
format_version = nc_file.file_format
print(format_version)

output: NETCDF4
ksharonin commented 9 months ago

By manually overriding H5Coro.cpp with: if(version != 3) // WARNING: MODIFIED version != 1

Next encountered: attribute name string is not null terminated

attr_name:

2\252\000\001\322~\v\000\000\000\000\000\300m\001T\377\377\000\000\300\223\ra\377
2\252\000\001\322~\v\000\000\000\000\000\300m\001T\377\377\000\000\300\223\ra\377
2\252\000\001\322~\v\000\000\000\000\000\300m\001T\377\377\000\000\300\223\ra\377
\377\000\000d\255\246\257\252\252\000\000\322~\v\000\000\000\000\000Pn\001T\377\3
77\000\000\377\377\377\377\377\377\377\377\020Pa\245\377\377\000\000&\027\f", '\0
00' <repeats 13 times>, "\020\000\000\000\000\000\000\000\000_\230\342\252\252\00
0\000\f\a\000\000\000\000\000\001"

Notably, readAttributeMssg() is never called for the h5 sample, only for the netCDF sample...

ksharonin commented 9 months ago

Need to track down proper info from corresponding file locations: https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html https://docs.unidata.ucar.edu/nug/current/file_structure_and_performance.html

How to even give an attribute to the reader? Does H5Coro enable this for the h5 level via an arg?

ksharonin commented 9 months ago

IMPORTANT:

"NetCDF-4 files are created with the HDF5 library, and are HDF5 files in every way, and can be read without the netCDF-4 interface. (Note that modifying these files with HDF5 will almost certainly make them unreadable to netCDF-4.)"

So in theory, the H5 interface should be suitable... Need to research how NetCDF varies with meta layers

Groups in a netCDF-4 file correspond with HDF5 groups (although the netCDF-4 tree is rooted not at the HDF5 root, but in group “_netCDF”).

Variables in netCDF correspond with identically named datasets in HDF5. Attributes similarly.

Since there is more metadata in a netCDF file than an HDF5 file, special datasets are used to hold netCDF metadata.

H5: This file format’s primary data models are groups and datasets. Groups are the overarching structure, and they can hold other groups or datasets. Datasets store raw data values of a specified data type and are usually stored within groups

NetCDF4: It stores data in a manner similar to HDF5, with groups serving as the overarching data structure. Within a group, there can be other groups or variables. Variables are akin to HDF5 datasets. Unlike HDF5 datasets, netCDF4 variables cannot be resized

By metadata inspection: nc: b'\x89HDF\r\n\x1a\n\x02\x08\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00' h5zip: b'\x89HDF\r\n\x1a\n\x00\x00\x00\x00\x00\x08\x08\x00\x04\x00\x10\x00'

NC is deemed an H5 file, so in theory the attr version should be in the same location
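
A minimal sketch of that byte-level comparison, in Python with only the standard library. The field positions follow the superblock layouts in the HDF5 file format specification; `parse_superblock_prefix` is a hypothetical helper name, not part of H5Coro or netCDF.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def parse_superblock_prefix(buf: bytes) -> dict:
    """Decode the superblock version and the offset/length sizes from the first bytes of a file."""
    if buf[:8] != HDF5_SIGNATURE:
        raise ValueError("not an HDF5 file (signature mismatch)")
    version = buf[8]
    if version in (0, 1):
        # v0/v1: free-space version, root group version, a reserved byte and the
        # shared header message version come before the two size fields.
        size_of_offsets, size_of_lengths = buf[13], buf[14]
    elif version in (2, 3):
        # v2/v3: the size fields follow the version byte directly.
        size_of_offsets, size_of_lengths = buf[9], buf[10]
    else:
        raise ValueError(f"unknown superblock version {version}")
    return {"version": version,
            "size_of_offsets": size_of_offsets,
            "size_of_lengths": size_of_lengths}

# The two 20-byte headers quoted above.
nc_head = b"\x89HDF\r\n\x1a\n\x02\x08\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00"
h5zip_head = b"\x89HDF\r\n\x1a\n\x00\x00\x00\x00\x00\x08\x08\x00\x04\x00\x10\x00"

print("nc:   ", parse_superblock_prefix(nc_head))
print("h5zip:", parse_superblock_prefix(h5zip_head))
```

Both headers carry the HDF5 signature and 8-byte offsets and lengths, but the nc file uses superblock version 2 while the h5zip file uses version 0. That difference may be worth keeping in mind when the two files take different paths through the object header and attribute readers.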

-- Current suspicion: some added netCDF attr/data is poisoning the offset value, since the attr name cascades with issues on null termination. Need to inspect the values retrieved from h5zip --> find in the raw byte stream --> inspect

Compare with the retrieved from nc --> find in raw byte stream to est offset/jumps

-- The h5_zip file satisfied the peek condition, returning a different header version. See int H5FileBuffer::readObjHdr in H5Coro.cpp:1746

The NC file meanwhile has peek == 0

ksharonin commented 9 months ago

Looking into the official netcdf-c implementation for the expected offset reading and retrieval: https://github.com/Unidata/netcdf-c/blob/498930982d771aa2ae1600d2199ec824a68184fe/libsrc/v1hpg.c#L34

Note: attribute pulled:

$1 = "\000_FillValue\000\000\000\000\000@\225?\367\252\252\000\001\322~\v\000\000\000\000\000\300m\001D\377\377\000\000\300\223\255Q\377\377\000\000d-\005\352\252\252\000\000\322~\v\000\000\000\000\000Pn\001D\377\377\000\000\377\377\377\377\377\377\377\377\020\220\211P\377\377\000\000&\027\f", '\000' <repeats 13 times>, "\020\000\000\000\000\000\000\000@\225?\367\252\252\000\000\f\a\000\000\000\000\000\001"

Need to pinpoint exactly how this appears with netcdf-c. Since it is pulling the name, either it skips the first one and cascades incorrectly, or there is some other cascading issue.

ksharonin commented 9 months ago

Tracking alternative implementations:

CONFIRMED: version 3 detected

Breakpoint 1: where = libhdf5.310.dylib`H5O__attr_decode + 168 at H5Oattr.c:147:15, address = 0x00000000001bb2d8
(lldb) r
Process 74213 launched: '/Users/katrinasharonin/Downloads/kerchunkC/code/c/run_hdf' (arm64)
libhdf5.310.dylib was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 74213 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001002d32d8 libhdf5.310.dylib`H5O__attr_decode(f=0x0000000100b3c550, open_oh=0x0000000000000000, mesg_flags=<unavailable>, ioflags=0x000000016fdfea9c, p_size=69, p="") at H5Oattr.c:147:15 [opt]
   144      if (H5_IS_BUFFER_OVERFLOW(p, 1, p_end))
   145          HGOTO_ERROR(H5E_OHDR, H5E_OVERFLOW, NULL, "ran off end of input buffer while decoding");
   146      attr->shared->version = *p++;
-> 147      if (attr->shared->version < H5O_ATTR_VERSION_1 || attr->shared->version > H5O_ATTR_VERSION_LATEST)
   148          HGOTO_ERROR(H5E_ATTR, H5E_CANTLOAD, NULL, "bad version number for attribute message");
   149 
   150      /* Get the flags byte if we have a later version of the attribute */
Target 0: (run_hdf) stopped.
(lldb) p attr->shared->version
(uint8_t) $0 = '\x03'

TODO: must confront lack of version 3 implementation


ksharonin commented 9 months ago

Implementation: aside from bumping bytes, need consideration for decoding e.g.: https://github.com/HDFGroup/hdf5/blob/b72cc4f7f4efead63c3a2582ce472ce8a5b5b0ae/src/H5Oattr.c#L176

As of now still outputting invalid name:

llValue, 0xb7eee

Note the documentation lists two valid encodings:

0 ASCII character set encoding
1 UTF-8 character set encoding

The expected encoding for version 1 is unknown...

Another possible contributing issue: version 3 does not pad the name. From the spec: "The null-terminated attribute name. This field is not padded with additional bytes."
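
To make the layout difference concrete, here is a hedged sketch of an attribute message header reader in Python. It follows the field order given in the HDF5 format specification (version 3 adds a one-byte character set field before the name, and only version 1 pads the name to a multiple of 8 bytes). `parse_attribute_message` and the sample bytes are illustrative assumptions, not code or data from H5Coro.

```python
import struct

def parse_attribute_message(msg: bytes) -> dict:
    """Decode the fixed header and the name of an HDF5 attribute message (versions 1-3)."""
    version = msg[0]
    if version not in (1, 2, 3):
        raise ValueError(f"invalid attribute version: {version}")
    flags = msg[1]  # reserved (zero) in version 1
    name_size, dtype_size, dspace_size = struct.unpack_from("<HHH", msg, 2)
    pos = 8
    encoding = 0  # ASCII is the only option before version 3
    if version == 3:
        encoding = msg[pos]  # 0 = ASCII, 1 = UTF-8
        pos += 1
    raw_name = msg[pos:pos + name_size]  # name_size includes the NUL terminator
    if not raw_name.endswith(b"\x00"):
        raise ValueError("attribute name string is not null terminated")
    name = raw_name[:-1].decode("utf-8" if encoding == 1 else "ascii")
    pos += name_size
    if version == 1:
        pos += -name_size % 8  # only version 1 pads the name to 8 bytes
    return {"version": version, "flags": flags, "encoding": encoding,
            "name": name, "datatype_offset": pos,
            "datatype_size": dtype_size, "dataspace_size": dspace_size}

# Synthetic messages (assumed values): same name, different versions.
name = b"_FillValue\x00"
sizes = struct.pack("<HHH", len(name), 20, 8)
msg_v3 = bytes([3, 0]) + sizes + b"\x00" + name
msg_v1 = bytes([1, 0]) + sizes + name + b"\x00" * 5

print(parse_attribute_message(msg_v3))
print(parse_attribute_message(msg_v1))
```

With the same 11-byte name, the datatype starts at offset 20 in the version 3 message and at offset 24 in the version 1 message, so a reader that only relaxes the version check would land on the wrong bytes.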

ksharonin commented 9 months ago

POTENTIAL SOL: bump pos += 8 using the diagram. Of concern: "lue" is not a known attribute of Power. It looks like a cut-off portion of _FillValue. NOTE: found a fill value message reader that is aware of version 3: https://github.com/ICESat2-SlideRule/sliderule/blob/60e7759d0da7ee43acf89b1d7fa816bc69aa34ab/packages/h5/H5Coro.cpp#L2380

  1. Since we cannot directly control attribute naming, this could be pulling an attribute at random... But it has to align the version successfully, so pure luck is ruled out
  2. Improper encoding applied... However, where does decoding apply? What is the default?
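
A small arithmetic check of the offsets, sketched in Python. The message below is synthetic (built from the version 3 layout in the format specification, with `_FillValue` as the name), so the offsets are relative to this toy buffer and are an assumption, not values traced from H5Coro.

```python
import struct

name = b"_FillValue\x00"
# Synthetic version 3 header: version, flags, name size, datatype size,
# dataspace size, then the one-byte character set field (0 = ASCII).
header_v3 = bytes([3, 0]) + struct.pack("<HHH", len(name), 20, 8) + bytes([0])
message = header_v3 + name

def c_string_at(buf: bytes, offset: int) -> bytes:
    """Return the bytes from offset up to (not including) the next NUL, like a C string read."""
    end = buf.find(b"\x00", offset)
    return buf[offset:] if end == -1 else buf[offset:end]

for offset in (8, 9, 12, 16):
    print(offset, c_string_at(message, offset))
```

In this toy buffer, offset 8 lands on the encoding byte (an empty C string), offset 9 yields `_FillValue`, offset 12 yields `llValue`, and offset 16 (8 + 8) yields `lue`. Fragments like the ones logged above are what a reader would see if it started a few bytes past the true name position, which would point at alignment rather than encoding.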

Hdf5 reflected encoding:

attr->shared->encoding = (H5T_cset_t)*p++; // reads value and casts int into defined enum
(lldb) p attr->shared->encoding
(H5T_cset_t) $0 = H5T_CSET_ASCII 
// (H5T_cset_t) attr->shared->encoding = H5T_CSET_ASCII
...

// called in H5O__attr_encode (This function encodes the native memory form of the attribute
    message in the "raw" disk form)
if (attr->shared->version >= H5O_ATTR_VERSION_3)
        *p++ = (uint8_t)attr->shared->encoding;

Uses enum encoding seen here: https://github.com/HDFGroup/hdf5/blob/b72cc4f7f4efead63c3a2582ce472ce8a5b5b0ae/src/H5Tpublic.h#L93C1-L111C14

Likely name parsing is still wrong. Need to print out entire surrounding string if possible

------------------
Test: Read Dataset
------------------
2024-01-17T09:14:9.086Z 172.17.0.2:DEBUG:LuaObject.cpp:304 Created object of type Asset/Asset
2024-01-17T09:14:9.086Z 172.17.0.2:DEBUG:LuaObject.cpp:304 Created object of type DeviceObject/DeviceObject
2024-01-17T09:14:9.087Z 172.17.0.2:CRITICAL:H5Coro.cpp:2879 received attr_name: lue
2024-01-17T09:14:9.104Z 172.17.0.2:DEBUG:H5Coro.cpp:3693 Read 1500 elements (6000 bytes) from local//Power
2024-01-17T09:14:9.104Z 172.17.0.2:DEBUG:LuaObject.cpp:304 Created object of type DeviceIO/DeviceReader

 vals recieved with type: nil
2024-01-17T09:14:9.104Z 172.17.0.2:DEBUG:DeviceReader.cpp:147 shutting down device and exiting reader
scripts/selftests/h5coro.lua:92: bad argument #2 to 'unpack' (string expected, got nil)
stack traceback:
        [C]: in function 'string.unpack'
        scripts/selftests/h5coro.lua:92: in main chunk
        [C]: in ?

All valid /Power attributes:

long_name
standard_name
valid_range
units
resolution
grid_mapping
cell_measures
cell_methods
ancillary_variables
ksharonin commented 9 months ago

Debugging continues:

- Analysis: successful reads: `LINK_INFO_MSG`, `LINK_MSG`
- Why even attempt to read? Some default pass... The original h5 zip file never makes the call, so some kind of signal must be asking for the attr message to be read
- Find where `msg_type` is set aka `heap_info->msg_type` where `heap_info` passed in from fractal

/* Build Heap Info Structure */
heap_info_t heap_info = {
    .table_width        = table_width,
    .curr_num_rows      = curr_num_rows,
    .starting_blk_size  = (int)starting_blk_size,
    .max_dblk_size      = (int)max_dblk_size,
    .blk_offset_size    = ((max_heap_size + 7) / 8),
    .dblk_checksum      = ((flags & FRHP_CHECKSUM_DIRECT_BLOCKS) != 0),
    .msg_type           = msg_type,
    .num_objects        = (int)mg_objs,
    .cur_objects        = 0 // updated as objects are read
};

where it got called with `msg_type = ATTRIBUTE_MSG`

/* Follow Heap Address if Provided */
if((int)heap_address != -1)
{
    readFractalHeap(ATTRIBUTE_MSG, heap_address, hdr_flags, dlvl);
}


- Relevant portion where `-1` is a cast of the undefined: https://docs.hdfgroup.org/hdf5/v1_14/_f_m_t3.html#UndefinedAddress:~:text=Term,Undefined%20Address
- https://docs.hdfgroup.org/hdf5/v1_14/_f_m_t3.html#FractalHeap:~:text=IV.A.2.c.%20The%20Link%20Info%20Message (follow fractal add.)
- Group: collection of links
- Link: group <-> object: https://davis.lbl.gov/Manuals/HDF5-1.8.7/UG/09_Groups.html
- A fractal heap address being found indicates that it found an attribute -> where would it find the main dataset?
- TODO: with hdf5 c program, examine the links
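
The undefined-address check can be sketched as follows (Python, standard library only). Per the format specification the undefined address has every bit set for the file's offset size. `is_undefined_address` and `int32_cast_is_minus_one` are hypothetical names, and the second function only imitates what `(int)heap_address != -1` typically does when a 64-bit value is truncated to a 32-bit int.

```python
def is_undefined_address(addr: int, size_of_offsets: int = 8) -> bool:
    """True when every bit of the address field is set (the HDF5 undefined address)."""
    return addr == (1 << (8 * size_of_offsets)) - 1

def int32_cast_is_minus_one(addr: int) -> bool:
    """Imitate `(int)addr == -1` with a 32-bit int: only the low 32 bits are compared."""
    return (addr & 0xFFFFFFFF) == 0xFFFFFFFF

undefined = 0xFFFFFFFFFFFFFFFF
edge_case = 0x00000001FFFFFFFF  # a valid 64-bit address whose low 32 bits are all ones

for addr in (undefined, 0x000B3A10, edge_case):
    print(hex(addr), is_undefined_address(addr), int32_cast_is_minus_one(addr))
```

The two checks agree for ordinary addresses and for the undefined address itself. They differ only for addresses like the edge case above, which would require a file larger than 8 GB, so the truncating cast is unlikely to matter for the sample files here.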

-------
- Version 2 object header as `peek == 0` is encountered
- Valid object header
- Note definition of attribute mssg as hit: `ATTRIBUTE_MSG           = 0xC,`
- Not sure why it jumps back to the top of the function once it does a `readMessage()` call 
- encountered `obj_hdr_flags`: `,`, `-`
- `FILE_STATS_BIT` ???
- `(gdb) p STORE_CHANGE_PHASE_BIT` prints `$8 = 16 '\020'`
- `Object CCSDS` ??? seems to be occurring under hdr flags
ksharonin commented 9 months ago

SOLVED ATTR NAME; see pushed hex txt files as reference

ksharonin commented 9 months ago

Subsequent bytes observed:

00 11 20 1f  00 04 00 00 00 00 00 20  |alue.. ........ |
000b3a10  00 17 08 00 17 7f 00 00  00 02 01 01 01 01 00 00 |................|
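
A hedged reading of those bytes, sketched in Python. The first dump row is truncated and has no offset, so this assumes the `00` after `alue` is the name's terminator and that a datatype message starts at `11`. The field order follows the floating-point datatype layout in the HDF5 format specification; `parse_float_datatype` is a hypothetical helper name.

```python
import struct

# Bytes copied from the dump, starting right after the assumed name terminator.
raw = bytes.fromhex("11201f00 04000000 0000 2000 17 08 00 17 7f000000")

def parse_float_datatype(msg: bytes) -> dict:
    """Decode a floating-point datatype message (class 1) from its raw bytes."""
    dtype_class = msg[0] & 0x0F   # low nibble: datatype class
    version = msg[0] >> 4         # high nibble: message version
    if dtype_class != 1:
        raise ValueError(f"not a floating-point datatype (class {dtype_class})")
    bits0, sign_location = msg[1], msg[2]
    (size,) = struct.unpack_from("<I", msg, 4)
    (bit_offset, precision, exp_location, exp_size,
     mantissa_location, mantissa_size, exp_bias) = struct.unpack_from("<HHBBBBI", msg, 8)
    return {
        "version": version,
        "little_endian": (bits0 & 0x01) == 0,
        "mantissa_norm": (bits0 >> 4) & 0x03,  # 2 = implied leading bit
        "sign_location": sign_location,
        "size": size,
        "bit_offset": bit_offset,
        "precision": precision,
        "exp_location": exp_location,
        "exp_size": exp_size,
        "mantissa_location": mantissa_location,
        "mantissa_size": mantissa_size,
        "exp_bias": exp_bias,
    }

info = parse_float_datatype(raw)
print(info)
```

Under that alignment assumption the bytes describe a 4-byte little-endian float with 32 bits of precision, an 8-bit exponent at bit 23, a 23-bit mantissa, bias 127, and the sign at bit 31, which is the IEEE 754 single-precision layout. The bytes that follow (`02 01 01 01 ...`) would then be the start of the dataspace message.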
ksharonin commented 9 months ago

Today:

Printed out mssg values from hdf5 src, all as long long

Datatype Message:   216172782113784085 // VS 2332553642111 produced by datasmash VS. 7968 produced by databits
Dataspace Message:  288230376151711747

Read trace from hdf5 src

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 4.1
  * frame #0: 0x000000010043100c libhdf5.310.dylib`H5T__imp_bit(n=4, perm=0x000000016fdfee7c, _a=0x000000016fdfee68, _b=0x000000016fdfee60, pad_mask="\xff\xff\xff\xff\U00000001", imp_bit=0x000000016fdfef0c) at H5Tinit_float.c:374:5 [opt]
    frame #1: 0x00000001004300a8 libhdf5.310.dylib`H5T__init_native_float_types at H5Tinit_float.c:469:5 [opt]
    frame #2: 0x00000001003a003c libhdf5.310.dylib`H5T_init at H5T.c:751:9 [opt]
    frame #3: 0x0000000100457590 libhdf5.310.dylib`H5VL_init_phase2 at H5VLint.c:198:13 [opt]
    frame #4: 0x0000000100119f00 libhdf5.310.dylib`H5_init_library at H5.c:268:17 [opt]
    frame #5: 0x000000010011b10c libhdf5.310.dylib`H5open at H5.c:1025:5 [opt]
    frame #6: 0x0000000100003b00 run_hdf`main at run_hdf5.c:18:34
    frame #7: 0x000000018ac64420 libdyld.dylib`start + 4

Reference structure for casting and reading; to inspect it, break with: break set --file H5Tinit_float.c --line 530

typedef struct H5T_fpoint_det_t {
    unsigned      size;             /* Total byte size                  */
    unsigned      prec;             /* Meaningful bits                  */
    unsigned      offset;           /* Bit offset to meaningful bits    */
    int           perm[32];         /* For detection of byte order      */
    H5T_order_t   order;            /* byte order                       */
    unsigned      sign;             /* Location of sign bit             */
    unsigned      mpos, msize, imp; /* Information about mantissa       */
    H5T_norm_t    norm;             /* Information about mantissa       */
    unsigned      epos, esize;      /* Information about exponent       */
    unsigned long ebias;            /* Exponent bias for floating point */
    unsigned      comp_align;       /* Alignment for structure          */
} H5T_fpoint_det_t;
ksharonin commented 9 months ago

Traversing down parts:

ksharonin commented 9 months ago

Today:

---
hdf5 attempting to match same metadata

break set --file H5Oattr.c --line 240
...
p extent->version = 2
p extent->rank = 1
p extent->nelem = 1


---
hdf5 src info on dataspaces, possibly useful for debugging

struct H5S_extent_t {
    H5O_shared_t sh_loc; /* Shared message info (must be first) */

    H5S_class_t type;    /* Type of extent */
    unsigned    version; /* Version of object header message to encode this object with */
    hsize_t     nelem;   /* Number of elements in extent */

    unsigned rank; /* Number of dimensions */
    hsize_t *size; /* Current size of the dimensions */
    hsize_t *max;  /* Maximum size of the dimensions */
};
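
For comparison with those fields, a minimal Python sketch of decoding a version 2 dataspace message straight from bytes. The layout (version, rank, flags, type, then dimension sizes) follows the HDF5 format specification; `parse_dataspace_v2`, the 8-byte length size, and the sample bytes are assumptions made for illustration.

```python
from math import prod

def parse_dataspace_v2(msg: bytes, size_of_lengths: int = 8) -> dict:
    """Decode a version 2 dataspace message: version, rank, flags, type, then dimensions."""
    version, rank, flags, space_type = msg[0], msg[1], msg[2], msg[3]
    if version != 2:
        raise ValueError(f"expected dataspace version 2, got {version}")

    def read_dims(pos):
        # Read `rank` little-endian integers of size_of_lengths bytes each.
        dims = []
        for _ in range(rank):
            dims.append(int.from_bytes(msg[pos:pos + size_of_lengths], "little"))
            pos += size_of_lengths
        return dims, pos

    dims, pos = read_dims(4)
    max_dims = None
    if flags & 0x01:  # bit 0: maximum dimension sizes are present
        max_dims, pos = read_dims(pos)
    # Type 2 is a null dataspace with no elements; a scalar (rank 0) has one.
    nelem = 0 if space_type == 2 else prod(dims)
    return {"version": version, "rank": rank, "type": space_type,
            "dims": dims, "max_dims": max_dims, "nelem": nelem}

# Synthetic sample: version 2, rank 1, max dims present, simple dataspace, one element.
one = (1).to_bytes(8, "little")
sample = bytes([2, 1, 1, 1]) + one + one
result = parse_dataspace_v2(sample)
print(result)
```

The sample decodes to version 2, rank 1, and one element, the same values as the `extent` fields printed from hdf5.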

ksharonin commented 9 months ago

Resolved, issue was unclear source

Matching data extracted, see break set --file H5Tinit_float.c --line 482 and observe det structure

Next: Find source of issue

ksharonin commented 9 months ago
(gdb) p info.data
$2 = (uint8_t *) 0xffff4c01b120 "\367\377\377\377"

...
// in lua

// cast to float form if core.INTEGER
e1 val: -nan
// core.DYNAMIC
e1 val: -9.0

xarray missing the attr

But hdf5 src reports Attribute Value: -9.000000 after cast via:

*((float *)
(lldb) p attribute_value
(void *) $0 = 0x000000016fdff150
(lldb) p *attribute_value
(lldb) 
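
The two readings of those four bytes can be reproduced with a short Python sketch. This only shows how the raw bytes from `info.data` behave under two interpretations; which interpretation H5Coro or the Lua layer should apply is not settled by it.

```python
import math
import struct

raw = b"\xf7\xff\xff\xff"  # gdb's "\367\377\377\377" written in hex

(as_int32,) = struct.unpack("<i", raw)    # little-endian signed 32-bit integer
(as_float32,) = struct.unpack("<f", raw)  # same bits reinterpreted as a 32-bit float

print(as_int32)
print(as_float32, math.isnan(as_float32))
```

Read as a signed integer the bytes are -9. Reinterpreting the same bit pattern as a float gives a NaN, because the exponent bits are all ones and the mantissa is nonzero. A -9.0 result therefore needs a value conversion from the integer, not a reinterpretation of the bits.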
ksharonin commented 9 months ago

CONFIRMED: Need to implement Version 2 B-Tree Node