mannau / h5

Interface to the HDF5 Library

NA support for reals? #36

Closed jae0 closed 8 years ago

jae0 commented 8 years ago

NaN and Inf work, so I can use them as workarounds, but I am wondering whether NA in reals can be supported. Thanks very much!

mannau commented 8 years ago

Could you explain the difference between NaN and 'NA in Reals'?

jae0 commented 8 years ago

I should have given a reproducible example. So I tried and realized that you actually do support NA's. What I did initially was:

require(h5)
F = h5file("test.h5", mode='a')
F["mat"] = matrix(NA, ncol=2, nrow=3)
F["mat"][1,2] = 2.0

This gives an error because F["mat"] is a logical and I was trying to change the type of element [1,2] to a real. I was hoping F["mat"] would be promoted to a real, as R would do, but no luck (perhaps the type is fixed upon initialization in HDF5?). So I guessed we just had to initialize as a real and then reset the relevant parts to NA:

G = matrix(runif(6), ncol=2)
F["mat2"] = G

But

F["mat2"][1,2] = NA

throws an error, so no go. Then I tried:

G = matrix(runif(6), ncol=2)
G[1,2] = NA
F["mat3"] = G

Works! But then:

F["mat3"][1,2] = 5.0

throws another error. That seemed strange, so I tried NaN's. If I use NaN, these issues go away. However, aren't NaN's meant to identify undefined values rather than missing values? E.g.: http://stackoverflow.com/questions/15496361/what-is-the-difference-between-nan-and-inf-and-null-and-na-in-r So I do not know whether NA's can be easily defined for real64 and int types. But if you would consider it, or specify a workaround when working with h5, that would be appreciated! Thanks!

jae0 commented 8 years ago

Oops I noticed a typo in my message. The sequence: 

G = matrix( runif(6), ncol=2)
G[1,2] = NA
F["mat3"] = G
F["mat3"] [1,2] = 5.0

works, but the error occurs when I follow up with: F["mat3"][1,2] = NA. So it can set an NA to a real here for F["mat3"] but not for F["mat"]? Also, setting a real to NA does not work in all cases, though using G with an NA works?

mannau commented 8 years ago

I think there is some confusion about NA values, which are specified for each data type separately; see also https://cran.r-project.org/doc/manuals/r-release/R-lang.html#NA-handling.

In the sequence

require(h5)
F = h5file("test.h5", mode='a')
F["mat"] = matrix(NA, ncol=2, nrow=3)
F["mat"][1,2] = 2.0

matrix(NA, ncol=2, nrow=3) produces a logical matrix, since NA is a logical value by default.

> typeof(matrix(NA, ncol=2, nrow=3))
[1] "logical"

By contrast, matrix( runif(6), ncol=2) gives a numeric matrix.

Since h5 infers the type of each dataset from the object, it is important that the initial dataset has the right type. Also, h5 does not automatically convert data types when replacing values, so it requires stricter data type handling.
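As a minimal base-R sketch of the point above (it makes no h5 calls, and the variable names are only illustrative), the typed NA constants can be compared directly, along with the silent promotion that plain R performs on an in-memory matrix:

```r
# Base R only; nothing here touches an HDF5 file.

# Each atomic type has its own NA constant; a bare NA is logical.
na_types <- c(
  default   = typeof(NA),
  real      = typeof(NA_real_),
  integer   = typeof(NA_integer_),
  character = typeof(NA_character_)
)
print(na_types)

# An in-memory logical matrix is promoted to double on assignment.
m <- matrix(NA, ncol = 2, nrow = 3)
type_before <- typeof(m)
m[1, 2] <- 2.0
type_after <- typeof(m)

# A matrix built from NA_real_ is double from the start, so no
# promotion is needed when a real value is written into it later.
m_real <- matrix(NA_real_, ncol = 2, nrow = 3)
type_real <- typeof(m_real)
```

The promotion step is what the first example in this thread relied on; starting from NA_real_ removes the need for it.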

For the first example, you could therefore use

library(h5)
F = h5file("test.h5", mode='a')
F["mat"] = matrix(NA_real_, ncol=2, nrow=3)
F["mat"][1,2] = 2.0

The second example, which is

G = matrix( runif(6), ncol=2)
G[1,2] = NA
F["mat3"] = G
F["mat3"] [1,2] = 5.0
F["mat3"] [1,2] = NA_real_

also works and gives

> F["mat3"][]
          [,1]      [,2]
[1,] 0.2623931        NA
[2,] 0.5415267 0.3516463
[3,] 0.8929198 0.2033035

because we are now using a numeric NA_real_ value.
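On the original question of NaN versus NA for reals, here is a small base-R sketch (illustrative variable names, no h5 calls) of how the two can be told apart in a numeric vector. Note that is.na() is TRUE for both, so is.nan() is needed to separate them:

```r
# A double vector holding a regular value, a missing value,
# an undefined value, and an infinite value.
x <- c(1.5, NA_real_, NaN, Inf)

na_or_nan <- is.na(x)    # TRUE for both NA and NaN
nan_only  <- is.nan(x)   # TRUE for NaN only

# Entries that are missing (NA) but not undefined (NaN).
missing_only <- is.na(x) & !is.nan(x)

print(data.frame(x, na_or_nan, nan_only, missing_only))
```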

jae0 commented 8 years ago
Ah. Indeed, I was not aware of these NA_* variants. Thank you for the clarification. All works well. Thanks for an excellent package.
