Open alimanfoo opened 5 years ago
Note that the scikit-allel v1.x implementation follows the scikit-learn approach internally, implementing classes with `fit()`, `transform()` and `fit_transform()` methods. Here perhaps an initial implementation could drop that and just effectively implement the fitting and transforming within the main `pca()` function. I.e., have a signature like:

    def pca(x):
        """
        Perform PCA.

        Parameters
        ----------
        x : array_like, 2 dimensional

        Returns
        -------
        coords
        loadings
        explained_variance_ratio
        """

In particular, it's not immediately obvious how the separate `fit()` and `transform()` steps would work with dispatching to multiple backends. Although possibly we could have dispatch functions for each, i.e., `dispatch_pca_fit_transform()`, `dispatch_pca_fit()`, `dispatch_pca_transform()`.
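
To make the per-step dispatch idea concrete, here is a rough sketch of how separate fit and transform dispatchers might look. It uses `functools.singledispatch` from the standard library purely as a stand-in for whatever dispatch mechanism the project ends up using. The "model" dict, the `n_components` parameter and the `pca_fit()` / `pca_transform()` / `pca_fit_transform()` wrapper names are assumptions for illustration, not part of the proposal.

```python
from functools import singledispatch

import numpy as np


@singledispatch
def dispatch_pca_fit(x, n_components):
    # Fallback when no backend is registered for type(x).
    raise NotImplementedError(f"no PCA backend for {type(x).__name__}")


@singledispatch
def dispatch_pca_transform(x, model):
    # Fallback when no backend is registered for type(x).
    raise NotImplementedError(f"no PCA backend for {type(x).__name__}")


@dispatch_pca_fit.register(np.ndarray)
def _numpy_pca_fit(x, n_components):
    # x has shape (n_variants, n_samples); center each variant (row).
    mean = x.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(x - mean, full_matrices=False)
    # Hypothetical "model": a plain dict passed explicitly between steps.
    return {"mean": mean, "components": u[:, :n_components]}


@dispatch_pca_transform.register(np.ndarray)
def _numpy_pca_transform(x, model):
    # Project samples (columns of x) onto the fitted components.
    return (x - model["mean"]).T @ model["components"]


# A Dask backend would register the same two functions for dask.array.Array.


def pca_fit(x, n_components=2):
    return dispatch_pca_fit(x, n_components)


def pca_transform(x, model):
    return dispatch_pca_transform(x, model)


def pca_fit_transform(x, n_components=2):
    return pca_transform(x, pca_fit(x, n_components))
```

The point of the sketch is only that each backend can own its fitted state as a plain value, so `fit` and `transform` can be dispatched independently on the type of the input array.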
Here's a gist with the transposition worked out so we don't have to transpose the input array (as scikit-allel version 1 does).
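
For reference, a minimal NumPy sketch of the idea (not the contents of the gist): with variants as rows and samples as columns, an SVD of the row-centered array gives sample coordinates from the right singular vectors, so no transpose of the input is needed. The `n_components` parameter and the output shapes are assumptions here; the proposed signature above only takes `x`.

```python
import numpy as np


def pca(x, n_components=2):
    """PCA on x with shape (n_variants, n_samples), without transposing x."""
    x = np.asarray(x, dtype=float)

    # Center each variant (row) across samples.
    x_centered = x - x.mean(axis=1, keepdims=True)

    # Economy SVD: x_centered == u @ np.diag(s) @ vt.
    u, s, vt = np.linalg.svd(x_centered, full_matrices=False)

    # Samples are columns, so their coordinates come from vt scaled by s.
    # Shape: (n_samples, n_components).
    coords = (vt[:n_components] * s[:n_components, None]).T

    # Variant loadings are the left singular vectors.
    # Shape: (n_variants, n_components).
    loadings = u[:, :n_components]

    # Fraction of total variance captured by each retained component.
    explained_variance_ratio = (s ** 2 / np.sum(s ** 2))[:n_components]

    return coords, loadings, explained_variance_ratio
```

Projecting the centered data onto the loadings, i.e. `x_centered.T @ loadings`, should reproduce `coords`, which is a handy consistency check against the transposed formulation.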
Proposed to add principal components analysis functions.
Implementation plan

- `skallel_stats.decomposition` package.
- `skallel_stats.decomposition.api` module.
- `pca()` public API function.
- `randomized_pca()` public API function.
- `dispatch_pca` and `dispatch_randomized_pca`.
- `numpy_backend`.
- `dask_backend`.

Notes
This is a porting and refactoring of functionality from scikit-allel version 1.x. See pca() and randomized_pca().
N.B., here it is proposed not to include the scaling preprocessing operation within the PCA implementation. Rather we leave that as a separate function (xref #9) which the user has to call themselves. E.g.:
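
A minimal sketch of that separation, assuming a hypothetical `patterson_scale()` helper (the name and signature are made up for illustration, and the formula is my understanding of Patterson-style scaling as done by scikit-allel v1's `PattersonScaler`). The scaled array would then be handed to `pca()`.

```python
import numpy as np


def patterson_scale(gn, ploidy=2):
    """Hypothetical helper: center and scale alternate allele counts.

    gn has shape (n_variants, n_samples). Assumes every variant is
    polymorphic, otherwise the per-variant scale is zero.
    """
    gn = np.asarray(gn, dtype=float)
    # Per-variant mean alternate allele count.
    mean = gn.mean(axis=1, keepdims=True)
    # Estimated allele frequency and the corresponding binomial scale.
    p = mean / ploidy
    std = np.sqrt(p * (1 - p))
    return (gn - mean) / std


# Toy data: 3 variants x 4 diploid samples, alternate allele counts.
gn = np.array([[0, 1, 2, 1],
               [0, 0, 1, 1],
               [2, 1, 1, 0]])

# The user calls the scaling step explicitly ...
scaled = patterson_scale(gn)

# ... and then passes the result to the proposed PCA function:
# coords, loadings, explained_variance_ratio = pca(scaled)
```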