[r] [Bug] Writing metadata of type `bit64::integer64()` yields corrupted data #1512

Closed mojaveazure closed 1 year ago

mojaveazure commented 1 year ago

Describe the bug: Writing array-level metadata where the type is bit64::integer64() yields corrupted data. This doesn't happen with regular integers or doubles.

To Reproduce

  library(tiledbsoma)
  # Create array
  array <- SOMASparseNDArrayCreate("array", type = arrow::int64(), shape = c(80, 230))
  # Write metadata
  array$set_metadata(list(
    integers = c(10L, 20L),
    doubles = c(10.0, 20.0),
    int64 = bit64::as.integer64(c(10L, 20L))
  ))
  # Close to finalize writes
  array$close()
  # Re-open and read in metadata
  array <- SOMASparseNDArrayOpen(array$uri)
  array$get_metadata("integers")
  #> [1] 10 20
  array$get_metadata("doubles")
  #> [1] 10 20
  array$get_metadata("int64")
  #> [1] 4.940656e-323 9.881313e-323

Versions (please complete the following information):

> sessionInfo(c('tiledbsoma', 'tiledb', 'bit64'))$otherPkgs
Additional context: Note that this does not happen in Python.

Python example

```python
import os
import tempfile

import numpy
import pyarrow
import tiledbsoma

with tempfile.TemporaryDirectory() as tmpdir:
    # Create array
    array = tiledbsoma.SparseNDArray.create(
        os.path.join(tmpdir, "array"), type=pyarrow.int64(), shape=[80, 230]
    )
    # Write metadata
    array.metadata["integers"] = numpy.array([10, 20], dtype=numpy.int32)
    array.metadata["doubles"] = numpy.array([10.0, 20.0])
    array.metadata["int64"] = numpy.array([10, 20], dtype=numpy.int64)
    # Close to finalize writes
    array.close()
    # Re-open and read in metadata
    array = tiledbsoma.SparseNDArray.open(array.uri)
    print(array.metadata["integers"])
    print(array.metadata["doubles"])
    print(array.metadata["int64"])
```

eddelbuettel commented 1 year ago

Thanks for reporting this; I can confirm it. It is actually a tiledb-r bug. When we write metadata from R via tiledb_put_metadata(), we call the internal function libtiledb_array_put_metadata(), which receives the array, the key under which we write, and the value. A bit64::integer64() value arrives as a REALSXP, because its payload is stored in a double vector, so it is written as a double. I will have to catch this at write time and differentiate between a genuine double and a double carrying an int64_t representation, and in the latter case write the int64_t values.
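
To illustrate why the corrupted values look the way they do, here is a minimal Python sketch (not the tiledb-r code) that reinterprets the 8 bytes of a 64-bit integer as an IEEE 754 double. The helper names `int64_bits_as_double` and `double_bits_as_int64` are made up for this illustration. Small integers map onto subnormal doubles, which matches the values in the report.

```python
import struct


def int64_bits_as_double(value: int) -> float:
    """Reinterpret the 8 bytes of a signed 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<q", value))[0]


def double_bits_as_int64(value: float) -> int:
    """Inverse operation: recover the integer stored in a double's bytes."""
    return struct.unpack("<q", struct.pack("<d", value))[0]


for n in (10, 20):
    as_double = int64_bits_as_double(n)
    # Prints "10 -> 4.940656e-323" and "20 -> 9.881313e-323"
    print(f"{n} -> {as_double:.6e}")
    # The bytes are intact; only the declared type is wrong
    assert double_bits_as_int64(as_double) == n
```

The round trip in the loop suggests that the stored bytes themselves are fine, so a fix only needs to record the correct type when writing.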

eddelbuettel commented 1 year ago

PR coming in a moment but we have this now:

$ Rscript soma_gh_issue512.R
[1] 10 20
[1] 10 20
[1] 10 20

with a minimally modified version of your most helpful example.

withcast <- function() {
    uri <- "soma_issue_1512"
    if (dir.exists(uri)) unlink(uri, recursive=TRUE)

    ## Create array
    array <- SOMASparseNDArrayCreate(uri, type = arrow::int64(), shape = c(80, 230))
    ## Write metadata
    array$set_metadata(list(integers = c(10L, 20L),
                            doubles = c(10.0, 20.0),
                            int64 = bit64::as.integer64(c(10L, 20L))))
    ## Close to finalize writes
    array$close()
    ## Re-open and read in metadata
    array <- SOMASparseNDArrayOpen(array$uri)
    print(array$get_metadata("integers"))
    print(array$get_metadata("doubles"))
    print(array$get_metadata("int64"))
}

eddelbuettel commented 1 year ago

Addressed by a PR over on the other repo; reviews welcome!

eddelbuettel commented 1 year ago

I think this can be closed, maybe in conjunction with a minimum-version dependency on tiledb-r. It was never a bug in tiledbsoma, as the metadata writes are handled by tiledb-r.