silx-kit / vscode-h5web

VSCode extension to explore and visualize HDF5 files
https://marketplace.visualstudio.com/items?itemName=h5web.vscode-h5web
MIT License

Support viewing JLD2 files #46

Closed by hz-xiaxz 2 months ago

hz-xiaxz commented 3 months ago

Is your feature request related to a problem?

JLD2 is a file format created natively by the Julia programming language, and it is heavily used in high-performance scientific programs (see JLD2).

JLD2 is designed to comprise a subset of HDF5. I'm not an expert in this repository, but I think it should be feasible to support JLD2 through a similar interface.
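One quick way to check the "subset of HDF5" claim on a given file is to look for the HDF5 superblock signature. The HDF5 file format allows the superblock to sit at byte offset 0, 512, 1024, 2048 and so on (doubling each time), so a file can carry its own header ahead of it. The sketch below scans those offsets using only Julia's Base; `find_hdf5_superblock` is a hypothetical helper name, and the expectation that a JLD2 file reports a non-zero offset is my assumption, not something verified in this thread.

```julia
# Byte signature that opens every HDF5 superblock: \x89 H D F \r \n \x1a \n
const HDF5_SIGNATURE = UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a]

# Hypothetical helper: return the byte offset of the HDF5 superblock in `io`,
# or `nothing` if no signature is found at any candidate offset.
function find_hdf5_superblock(io::IO)
    seekend(io)
    nbytes = position(io)          # total size of the stream in bytes
    offset = 0
    while offset + length(HDF5_SIGNATURE) <= nbytes
        seek(io, offset)
        if read(io, length(HDF5_SIGNATURE)) == HDF5_SIGNATURE
            return offset
        end
        # Candidate offsets are 0, 512, 1024, 2048, ...
        offset = offset == 0 ? 512 : 2 * offset
    end
    return nothing
end

# Convenience method for a file path.
find_hdf5_superblock(path::AbstractString) = open(find_hdf5_superblock, path)
```

Calling `find_hdf5_superblock("example.jld2")` on a real file returns the offset at which an HDF5 reader would start parsing, or `nothing` if the file is not HDF5-compatible at all. It says nothing about whether every message inside the file can be decoded by a given HDF5 build.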

Alternatives you've considered

The authors of JLD2.jl are eager for a VS Code extension to visualize JLD2 files; see the discussions below:
https://discourse.julialang.org/t/jld2-preview-in-vscode/80050/6
https://github.com/julia-vscode/julia-vscode/issues/2863

Additional context

Currently, opening a JLD2 file with H5Web gives the error message below. Any hints on how to solve it?

HDF5-DIAG: Error detected in HDF5 (1.14.2) thread 0:
  #000: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 403 in H5Dopen2(): unable to synchronously open dataset
    major: Dataset
    minor: Can't open object
  #001: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 364 in H5D__open_api_common(): unable to open dataset
    major: Dataset
    minor: Can't open object
  #002: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 1980 in H5VL_dataset_open(): dataset open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 1947 in H5VL__dataset_open(): dataset open failed
    major: Virtual Object Layer
    minor: Can't open object
  #004: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLnative_dataset.c line 321 in H5VL__native_dataset_open(): unable to open dataset
    major: Dataset
    minor: Can't open object
  #005: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dint.c line 1429 in H5D__open_name(): can't open dataset
    major: Dataset
    minor: Unable to initialize object
  #006: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dint.c line 1494 in H5D_open(): not found
    major: Dataset
    minor: Object not found
  #007: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dint.c line 1689 in H5D__open_oid(): unable to load type info from dataset header
    major: Dataset
    minor: Unable to initialize object
  #008: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Omessage.c line 432 in H5O_msg_read(): unable to read object header message
    major: Object header
    minor: Read failed
  #009: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Omessage.c line 487 in H5O_msg_read_oh(): unable to decode message
    major: Object header
    minor: Unable to decode value
  #010: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Oshared.h line 61 in H5O__dtype_shared_decode(): unable to decode shared message
    major: Object header
    minor: Unable to decode value
  #011: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Oshared.c line 358 in H5O__shared_decode(): unable to retrieve native message
    major: Object header
    minor: Read failed
  #012: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Oshared.c line 172 in H5O__shared_read(): unable to read message
    major: Object header
    minor: Read failed
  #013: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Omessage.c line 432 in H5O_msg_read(): unable to read object header message
    major: Object header
    minor: Read failed
  #014: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Omessage.c line 487 in H5O_msg_read_oh(): unable to decode message
    major: Object header
    minor: Unable to decode value
  #015: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Oshared.h line 74 in H5O__dtype_shared_decode(): unable to decode native message
    major: Object header
    minor: Unable to decode value
  #016: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Odtype.c line 1338 in H5O__dtype_decode(): can't decode type
    major: Datatype
    minor: Unable to decode value
  #017: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Odtype.c line 154 in H5O__dtype_decode_helper(): bad version number for datatype message
    major: Datatype
    minor: Unable to load metadata into cache

mkitti commented 3 months ago

Do you have a sample file available for testing?

JonasIsensee commented 3 months ago

EDIT: This is actually not really a problem with JLD2. The H5Web viewer cannot cope with committed datatypes (that are linked from groups). Demo:

```
julia> f = h5open("test.h5", "w")
🗂️ HDF5.File: (read-write) test.h5

julia> g = create_group(f, "types")
📂 HDF5.Group: /types (file: test.h5)

julia> t = commit_datatype(f, "types/AStruct", datatype(AStruct))
HDF5.Datatype: /types/AStruct H5T_COMPOUND {
      H5T_STD_I64LE "x" : 0;
      H5T_IEEE_F64LE "y" : 8;
   }

julia> d = create_dataset(f, "data", t, (1,1))
🔢 HDF5.Dataset: /data (file: test.h5 xfer_mode: 0)

julia> d[1,1] = AStruct(1,2)
AStruct(1, 2.0)

julia> close(f)

shell> h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   DATASET "data" {
      DATATYPE "/types/AStruct"
      DATASPACE SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
      DATA {
      (0,0): {
            1,
            2
         }
      }
   }
   GROUP "types" {
      DATATYPE "AStruct" H5T_COMPOUND {
         H5T_STD_I64LE "x";
         H5T_IEEE_F64LE "y";
      }
   }
}
}
```

I also opened an issue here: https://github.com/silx-kit/h5web/issues/1699

axelboc commented 3 months ago

We will definitely strive to support JLD2 files. Thanks for opening an issue in the H5Web repo. As explained, the problem comes from h5wasm not currently supporting committed datatypes. I've opened an issue on the h5wasm repo too: https://github.com/usnistgov/h5wasm/issues/80
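The HDF5-DIAG trace quoted in the issue description is consistent with this diagnosis. HDF5 prints the stack outermost-first, so the last numbered frame is the one closest to the actual failure, and here it sits in the datatype-decoding code reached through a shared message. For anyone triaging similar reports, here is a minimal sketch for pulling that innermost frame out of a pasted trace. `innermost_frame` is a hypothetical helper name, and it relies only on the `#NNN:` frame markers visible in the trace.

```julia
# Hypothetical helper: split an HDF5-DIAG dump at each "#NNN:" marker and
# return the last (innermost) frame with its whitespace collapsed.
function innermost_frame(trace::AbstractString)
    # Each match runs from one "#NNN:" marker up to the next marker or end of input.
    frames = [m.match for m in eachmatch(r"#\d{3}:.*?(?=#\d{3}:|\z)"s, trace)]
    isempty(frames) && return nothing
    return strip(replace(last(frames), r"\s+" => " "))
end
```

Because the pattern does not depend on line breaks, it works whether the trace was pasted as one long line or with one frame per line.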

loichuder commented 2 months ago

Thanks to @bmaranville's and @axelboc's work, we have made good progress on this. I could read this simple JLD2 file without issue using the main branch of H5Web: example.zip

Generation script
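# Inspired by https://github.com/JuliaIO/JLD2.jl?tab=readme-ov-file#jld2
using JLD2

# Placeholder values: the original script did not show how x, y, z were defined.
x, y, z = 1, 2.0, "three"

# Save the three variables under their own names.
jldsave("example.jld2"; x, y, z)

# Reopen the file and append a compressed array.
jldopen("example.jld2", "r+"; compress = true) do f
    f["large_array"] = zeros(10000)
end

# Reopen again and add a group holding a single value.
jldopen("example.jld2", "r+") do file
    mygroup = JLD2.Group(file, "mygroup")
    mygroup["mystuff"] = 42
end

JonasIsensee commented 2 months ago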

I can confirm! Though, your example file here does not actually rely on the fix given that it does not use compound datatypes. The fact that the following file now works is quite impressive:

julia> using JLD2

julia> struct InnerStruct
           x::String
           y::Int
       end

julia> struct OuterStruct
           a::Int
           b::InnerStruct
           c::NTuple{3,Int}
       end

julia> jldsave("nested_compound.jld2"; data=OuterStruct(1, InnerStruct("two",3),(4,5,6)))

nested_compound.zip

loichuder commented 2 months ago

Thanks for double-checking 🙂

I figured that the file was too simple so I hope to get a more complex example from people there: https://github.com/julia-vscode/julia-vscode/issues/2863#issuecomment-2340043098

JonasIsensee commented 2 months ago

Here are two more files which are conceptually interesting: JLD2 uses HDF5 references to refer to mutable fields inside structs and tracks object identities while saving and loading. This allows storing and loading recursive structures as well as preserving object identity. In the second example, the array shared by both fields is encoded and loaded only once. Both struct fields refer to the same memory after loading.

H5Web currently gives you no way to view / de-reference linked datasets in this way. That avoids all the potential pitfalls of circular references but also prevents viewing some of the data encoded in JLD2. Importantly though, it does not error even with files like this.

In my eyes, this is good enough for a release at this stage. Future feature ideas might be enhanced pretty printing for compound types and enabling the manual loading of referenced datasets.

julia> using JLD2

julia> mutable struct RecursiveStruct
           x::Float64
           y::RecursiveStruct
           RecursiveStruct(x)=new(x)
           RecursiveStruct(x,y)=new(x,y)
       end

julia> r = RecursiveStruct(1)
RecursiveStruct(1.0, #undef)

julia> r2 = RecursiveStruct(2, r)
RecursiveStruct(2.0, RecursiveStruct(1.0, #undef))

julia> r.y = r2
RecursiveStruct(2.0, RecursiveStruct(1.0, RecursiveStruct(#= circular reference @-2 =#)))

julia> jldsave("recursive.jld2"; r)

julia> load("recursive.jld2")
Dict{String, Any} with 1 entry:
  "r" => RecursiveStruct(1.0, RecursiveStruct(2.0, RecursiveStruct(#= circular … =#)))

julia> struct ObjIDPreservation
           arr1::Vector{Int}
           arr2::Vector{Int}
       end

julia> arr = [1,2,3,4,5,6]
6-element Vector{Int64}:
 1
 2
 3
 4
 5
 6

julia> obj = ObjIDPreservation(arr, arr)
ObjIDPreservation([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])

julia> obj.arr1 === obj.arr2 # references the same memory
true

julia> jldsave("objidpreservation.jld2"; obj)

julia> data = load("objidpreservation.jld2", "obj")
ObjIDPreservation([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])

julia> data.arr1 === data.arr2
true

objidpreservation.zip recursive.zip

hz-xiaxz commented 2 months ago

Sorry, I don't really know how to use the main-branch version of H5Web, but I think the tests above cover most of the JLD2 file cases I will use.

axelboc commented 2 months ago

#47 improves support for committed datatypes and #48 allows JLD2 files to open directly in H5Web. This should be sufficient to bring basic support for most JLD2 files and resolve this issue.

I've started a discussion thread in the H5Web repo with improvement ideas mentioned in this issue so as to not lose track of them. Feel free to continue the discussion and/or create proper feature requests for these ideas over there.