kaneplusplus / bigmemory

126 stars 24 forks source link

writing integer matrices does too much work #84

Closed phaverty closed 6 years ago

phaverty commented 6 years ago

Writing integer matrices in SetElements.bm calls na.omit and as.integer on the whole matrix twice:

 if (typeof(x) != "double") {
     integerVals <- na.omit(as.integer(value))
     if (sum(integerVals == na.omit(as.integer(value))) !=
         length(integerVals) | is.factor(value)) {
         warning("non-integer (possibly Inf or -Inf) typecast to integer")
     }
 }

... }, SetMatrixElements(x@address, as.double(j), as.double(i), as.integer(value)))

For my 36528 by 28769 example, the integer version is 151 seconds vs. 57 seconds, which is kind of a bummer for a key use case.

How about just doing as.integer once and using sum(is.na(value)) rather than doing length(na.omit(value)) ?

I'd be happy to make a PR for this. Before I get going, is there a reason that na.omit was used?

phaverty commented 6 years ago

Actually, this code seems wrong for two reasons. The "|" should be "||" since we want the scalar or. Also, isn't "integerVals == na.omit(as.integer(value))" always all TRUE, by definition? What is this code supposed to do? I think we meant to compare

sum(is.na(value) and sum(is.na(as.integer(value)))

phaverty commented 6 years ago

Also, the as.integer(value) and as.double(value) calls are not cheap. Can we skip them for things that are already the right type, or do we need to drop the dimension attributes?

privefl commented 6 years ago

For the conversion, I've tried to do this which is similar to what you want to achieve I think: https://stackoverflow.com/questions/45677027/warning-when-downcasting-in-rcpp

phaverty commented 6 years ago

Excellent, thanks so much! Yes, this is a job for C so we can avoid allocating all those intermediate arrays.

Pete


Peter M. Haverty, Ph.D. Genentech, Inc. phaverty@gene.com

On Fri, Jun 29, 2018 at 11:55 PM, Florian Privé notifications@github.com wrote:

For the conversion, I've tried to do this which is similar to what you want to achieve I think: https://stackoverflow.com/ questions/45677027/warning-when-downcasting-in-rcpp

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kaneplusplus/bigmemory/issues/84#issuecomment-401522279, or mute the thread https://github.com/notifications/unsubscribe-auth/AH02K4sCnUj5xquatkWWbDOwWynSaVlrks5uByDzgaJpZM4U9oQB .

kaneplusplus commented 6 years ago

Thanks fellas.