privefl / bigstatsr

R package for statistical tools with big matrices stored on disk.
https://privefl.github.io/bigstatsr/
179 stars 30 forks source link

Greater use of generics #156

Closed multimeric closed 2 years ago

multimeric commented 2 years ago

I note that you have used generics in your package where they already exist, for example overloading %*% and length, however I think it would be helpful if some of the big_ functions were replaced by generic versions. For example, it would be nice if we could cor(some_fbm) instead of needing big_cor(some_fbm). The same goes for big_crossprod and crossprod etc. This would reduce the cognitive burden involved in learning this package, and would also allow for some polymorphism where we can write a function that accepts either a true matrix or a FBM equally.

Since these other functions are not by default generic, it should be possible to use setGeneric (which is automatically called by setMethod) to convert them into generics, which you can then provide an FBM implementation for.

On a related note, it would be cool to see an implementation of apply on an FBM, so users who are used to applying functions over in-memory matricies wouldn't have to learn the big_apply approach, which works fairly differently.

privefl commented 2 years ago

Actually, I am not sure it would be easier to use like this. I remember having subset() in {bigsnpr}, and it was confusing for users, so I made a snp_subset() function instead.

If you want to have functions that work for both FBMs and standard R matrices, just do if (is.matrix(X)) X <- as_FBM(X) at the beginning.

multimeric commented 2 years ago

Why do you think that? The idea of dispatching on generic functions is very core to how R is expected to work. Same reason why people expect summary() or plot() to work on a lot of models.

privefl commented 2 years ago

I remember thinking about this 5y ago, my conclusion was it would not be worth it. crossprod() is not even a method, right? And has only two parameters x and y. I have extra parameters, and want the FBM to be always called X for consistency.

multimeric commented 2 years ago

Like I said, although most of these aren't generics/methods by default, that doesn't really prohibit you from making them into generics. If you need extra arguments, you can just give it the signature function(x, y, ...){} and pass down the ... into the method implementation.

privefl commented 2 years ago

Yes, but I don't want to have x and y as the first two parameter names. Thank you for your suggestion, but I think I've made up my mind about this a long time ago.

privefl commented 2 years ago

BTW crossprod, tcrossprod and %*% are already defined in https://github.com/privefl/bigstatsr/blob/master/R/mult-mat.R, but work only with FBMs of type double.