grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5
61 stars 22 forks source link

Add native functionality #9

Closed dvantwisk closed 6 years ago

dvantwisk commented 6 years ago

The Bioconductor Core team have been working to add native functionality to rhdf5. What we mean by native functionality is to allow rhdf5 operations to transpose the data to be stored in such a way that it reflects R's column major ordering.

For example, if we were to write the matrix m <- matrix(1:6, nr=3) using h5write(m, h5File, "foo") and then write again using the new native argument with h5write(m, h5file, "bar", native=TRUE) the data will be stored in the hdf5 file as:

HDF5 "filecd95e9317de" {
GROUP "/" {
   DATASET "bar" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 3, 2 ) / ( 3, 2 ) }
      DATA {
      (0,0): 1, 4,
      (1,0): 2, 5,
      (2,0): 3, 6
      }
   }
   DATASET "foo" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 2, 3 ) / ( 2, 3 ) }
      DATA {
      (0,0): 1, 2, 3,
      (1,0): 4, 5, 6
      }
   }
}
}

The native argument facilitates this functionality to multiple functions.

Please look over the changes and get back to us with any questions, concerns, or places for improvement.

codecov[bot] commented 6 years ago

Codecov Report

Merging #9 into master will increase coverage by 0.98%. The diff coverage is 93.33%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master      #9      +/-   ##
=========================================
+ Coverage   82.72%   83.7%   +0.98%     
=========================================
  Files          28      28              
  Lines        1638    1663      +25     
=========================================
+ Hits         1355    1392      +37     
+ Misses        283     271      -12
Impacted Files Coverage Δ
R/h5ls.R 100% <100%> (ø) :arrow_up:
R/H5A.R 86.07% <100%> (ø) :arrow_up:
R/H5O.R 61.9% <100%> (ø) :arrow_up:
R/h5create.R 88.5% <100%> (+0.71%) :arrow_up:
R/AllMethods.R 75.93% <100%> (+2.65%) :arrow_up:
R/h5delete.R 100% <100%> (ø) :arrow_up:
R/h5modifier.R 90% <100%> (ø) :arrow_up:
R/h5write.R 90% <100%> (+0.11%) :arrow_up:
R/H5P.R 63.86% <100%> (ø) :arrow_up:
R/Rhdf5.R 88.23% <100%> (ø) :arrow_up:
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update e875e86...58a9f47. Read the comment docs.

PeteHaitch commented 6 years ago

A file written with native = TRUE should also be read with native = TRUE

Is there a programmatic way to test whether a file was written with native = TRUE?

Thanks!

dvantwisk commented 6 years ago

@PeteHaitch

The native argument is specified as a slot in the H5IdComponent class. In the example I gave, there is not a way of designating whether the entry was created with the native argument using h5write() outside of specifying it in the name argument (i.e. calling the dataset "bar_native").

There are ways of writing with an H5IdComponent that could accomplish this:

f <- tempfile()
h5f <- H5Fcreate(f, native=TRUE)
h5writeDataset(matrix(1:6, nr=3), f, "test")
h5d <- h5f&"test"
h5d[,]
H5Oclose(h5d)
H5Fclose(h5f)

All these operations will be performed "natively" since the native=TRUE assigned to the H5IdComponent h5f is given to all H5IdComponent derived from it and all functions that call it. In essence, as long as the h5f persists, we know that the operations performed on it will be native.

I suppose an H5A attribute that is given to the hdf5 file that specifies that it is native could do what you asked. But we'd like to consider something like that in a future pull request, if possible.

grimbough commented 6 years ago

Thank very much for submitting this. I'll take a look in the next few days and get back to you - but it looks like it'll be very useful.

dvantwisk commented 6 years ago

Hello,

Have you had the chance to look at our changes?

grimbough commented 6 years ago

Sorry for the delay, I'll look at this ASAP and get the changes in before the Bioc release