hhoeflin / hdf5r

Other
80 stars 23 forks source link

Writing attributes with length 0: `buf parameter can't be NULL` #208

Closed rcannood closed 2 months ago

rcannood commented 10 months ago

Hi @hhoeflin ! Thanks again for creating and maintaining this incredibly useful package.

I'm having a few issues reading and writing attributes with arrays of length zero. Unfortunately this is quite blocking for a project I'm working on.

Here is a zip file of the h5ad file used in the examples below: example.zip

This is similar though I think unrelated to #118.

Any help in this matter would be much appreciated! I'll look through the hdf5r code to see if there is anything I can do to solve the problem.

Reading an attribute containing an empty array

When I try to read an attribute containing an empty array, I get an error message:

library(hdf5r)

file <- H5File$new("inst/extdata/example.h5ad", "r")

## reading empty attributes

h5attr(file[["obs"]], "column-order")

#  [1] "Float"                   "FloatNA"                
#  [3] "Int"                     "IntNA"                  
#  [5] "Bool"                    "BoolNA"                 
#  [7] "n_genes_by_counts"       "log1p_n_genes_by_counts"
#  [9] "total_counts"            "log1p_total_counts"     
# [11] "leiden"    

h5attr(file[["uns/DataFrameEmpty"]], "column-order")

# Error in attr_obj$read() : HDF5-API Errors:
#     error #000: ../../src/H5A.c in H5Aread(): line 702: buf parameter can't be NULL
#         class: HDF5
#         major: Invalid arguments to routine
#         minor: Bad value

file$close_all()

In comparison, this does work in h5py:

import h5py

## reading empty attributes
with h5py.File("inst/extdata/example.h5ad", "r") as f:
  nonempty = f["obs"].attrs["column-order"]
  print(f"Non-empty attribute: {nonempty}")
  uns_column_order = f["uns/DataFrameEmpty"].attrs["column-order"]
  print(f"Empty attribute: {uns_column_order}")

# Non-empty attribute: ['Float' 'FloatNA' 'Int' 'IntNA' 'Bool' 'BoolNA' 'n_genes_by_counts'
#  'log1p_n_genes_by_counts' 'total_counts' 'log1p_total_counts' 'leiden']
# Empty attribute: []

Writing an attribute containing an empty array

I also can't write empty arrays as attributes:

library(hdf5r)

file <- H5File$new("test.h5ad", "w")

h5attr(file, "nonempty") <- c("a", "b", "c")

h5attr(file, "nonempty")
# [1] "a" "b" "c"

h5attr(file, "empty") <- character(0)
# Error in attr$write(robj) : HDF5-API Errors:
#     error #000: ../../src/H5A.c in H5Awrite(): line 657: buf parameter can't be NULL
#         class: HDF5
#         major: Invalid arguments to routine
#         minor: Bad value

In Python, this does work:


## writing empty attributes
with h5py.File("test.h5ad", "w") as f:
  f.attrs["nonempty"] = ["a", "b", "c"]
  f.attrs["empty"] = []

  print(f"Non-empty attribute: {f.attrs['nonempty']}")
  print(f"Empty attribute: {f.attrs['empty']}")

# Non-empty attribute: ['a' 'b' 'c']
# Empty attribute: []
rcannood commented 9 months ago

I created a PR (#209) which contains a unit test for this edge case.

rcannood commented 9 months ago

I think the issue occurs whenever there's a function from the HDF5 library that expects a buffer.

  if(XLENGTH(R_buf) == 0) {
    buf = NULL;
  }
  else {
    buf = ...
  }
  // ...
  ... = H5SomeFunction(..., buf, ...);

It appears this buffer shouldn't be NULL (otherwise the error H5SomeFunction(): buf parameter can't be NULL is thrown).

Question is, with what should each of these buf = NULL be replaced? I've had some success by replacing buf = NULL; with buf = R_alloc(1, 1); // allocate 1 byte, but that causes other problems down the road.

rcannood commented 9 months ago

I managed to solve the issue by replacing the XLENGTH(.) == 0 condition with TYPEOF(.) == NILSXP:

  const void* buf;
  if(TYPEOF(R_buf) == NILSXP) {
    buf = NULL;
  }
  else {
    buf = (void *) VOIDPTR(R_buf);
  }

I created PR #211 which allows the unit tests I added in #209 to pass. It should be noted that I only performed the substitution only where needed for allowing the tests in #209 to pass.

hhoeflin commented 9 months ago

sorry for the late reply. Will try to look at the fix over the Christmas break and thanks for the PR.

rcannood commented 9 months ago

No worries and thanks! Is there anything I can help out with? Would it help to schedule a call?

hhoeflin commented 9 months ago

just a quick update. I had an initial look at this, but I need to dig deeper. A purely technical issue is that the places where you made the fix all need to be moved. The reason is that this code is auto-generated, so the way the code is generated has to be adapted upstream. As that code is somewhat ... ugly, I need to do that adaptation on my end.

Stay tuned

rcannood commented 9 months ago

Thanks for looking into this!

this code is auto-generated

It's good to hear that the code is auto-generated. I couldn't imagine maintaining this code base across different versions of HDF5 library.

Out of curiosity, is the script that generates the code in a public repository?

hhoeflin commented 9 months ago

Fixed in #214.

You can look for the for the code generates the wrapper files under /inst/CWrappers... but beware, it is not pretty.

rcannood commented 9 months ago

Aha, that's what the patches are for! Makes sense :)

Thanks for fixing the issue!

hhoeflin commented 8 months ago

@rcannood : I had to revert the changes for now and reopen the issue. The hdf5r.Extra package is causing a reverse dependency error.

https://github.com/ycli1995/hdf5r.Extra/issues/1

The code for the fix is located in the branch

https://github.com/hhoeflin/hdf5r/tree/bug/fix_empty_attrs_again

rcannood commented 8 months ago

It seems @ycli1995 fixed https://github.com/ycli1995/hdf5r.Extra/issues/1 :relaxed:

Could we get this fix merged and released? :bow:

rcannood commented 7 months ago

Hi @hhoeflin!

I ran a revdepcheck on this branch:

> revdepcheck::revdep_check(timeout = as.difftime(600, units = "mins"), num_workers = 8)
── CHECK ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 13 packages ──
✔ readNSx 0.0.2                          ── E: 0     | W: 0     | N: 0          
✔ hdf5r.Extra 0.0.5                      ── E: 0     | W: 0     | N: 0          
✔ rblt 0.2.4.6                           ── E: 0     | W: 0     | N: 0          
✔ dynutils 1.0.11                        ── E: 0     | W: 0     | N: 0   

I didn't let it finish completely, but I can already confirm that this PR does not introduce new errors to hdf5r.Extra anymore.

Would you be able to merge the PR? :bow:

hhoeflin commented 7 months ago

Will merge soon. There is yet some more compiler warnings to fix but this should be merged and out with the next release

On Sat, Feb 10, 2024, 09:38 Robrecht Cannoodt @.***> wrote:

Hi @hhoeflin https://github.com/hhoeflin!

I ran a revdepcheck on this branch:

revdepcheck::revdep_check(timeout = as.difftime(600, units = "mins"), num_workers = 8) ── CHECK ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 13 packages ── ✔ readNSx 0.0.2 ── E: 0 | W: 0 | N: 0 ✔ hdf5r.Extra 0.0.5 ── E: 0 | W: 0 | N: 0 ✔ rblt 0.2.4.6 ── E: 0 | W: 0 | N: 0 ✔ dynutils 1.0.11 ── E: 0 | W: 0 | N: 0

I didn't let it finish completely, but I can already confirm that this PR does not introduce new errors to hdf5r.Extra anymore.

Would you be able to merge the PR? 🙇

— Reply to this email directly, view it on GitHub https://github.com/hhoeflin/hdf5r/issues/208#issuecomment-1936934354, or unsubscribe https://github.com/notifications/unsubscribe-auth/AASGMYUIQXS2M7VVFPEJLO3YS4WYZAVCNFSM6AAAAABALGC7Z6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZWHEZTIMZVGQ . You are receiving this because you were mentioned.Message ID: @.***>

rcannood commented 7 months ago

Awesome, thanks!

rcannood commented 3 months ago

Hi @hhoeflin !

I see that branch bug/fix_empty_attrs_again has not yet been merged.

Might I inquire about the status of this fix?

The described issue has been a blocking issue for months on end now, and it's saddening to see that a fix for it exists but is simply not being merged / released.

Happy to help out with whatever you would need help with!

Kind regards, Robrecht

rcannood commented 2 months ago

Hey @hhoeflin !

Thanks for releasing hdf5r 1.3.11! :bow: :bow: :bow:

I can confirm this has fixed my issue.