mlr-org / mlr3pipelines

Dataflow Programming for Machine Learning in R
https://mlr3pipelines.mlr-org.com/
GNU Lesser General Public License v3.0

PipeOpUMAP #791

Open advieser opened 1 month ago

advieser commented 1 month ago

This implements Uniform Manifold Approximation and Projection (UMAP) from the uwot package.

Training works via uwot::umap2() and prediction through uwot::umap_transform().
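
For reference, a minimal sketch of that train/predict split, calling uwot directly rather than going through the PipeOp. The iris split and n_components = 2 are illustrative assumptions and are not taken from the PR. ret_model = TRUE is set so that umap_transform() has a fitted model to reuse.

```r
# Sketch only: calls uwot directly; the PipeOp is assumed to wrap these two steps.
if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(1)
  x <- as.matrix(iris[, 1:4])
  train_rows <- seq(1, nrow(x), by = 2)               # illustrative split
  test_rows <- setdiff(seq_len(nrow(x)), train_rows)

  # "Training": fit the embedding and keep the model for later reuse.
  model <- uwot::umap2(x[train_rows, ], n_components = 2, ret_model = TRUE)

  # "Prediction": project unseen rows into the already fitted embedding.
  embedded_new <- uwot::umap_transform(x[test_rows, ], model)

  dim(model$embedding)  # one row per training observation
  dim(embedded_new)     # one row per new observation
}
```

The guard around the calls only keeps the sketch from failing where uwot is not installed.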

closes https://github.com/mlr-org/mlr3pipelines/issues/755

advieser commented 1 month ago

Noting for future reference: R CMD check failed because %check||% and %check&&% could not be found when they were used inside a custom_check wrapped in crate(), i.e. custom_check = crate(function(x) check_...(x) %check||% check_...(x)).

mb706 commented 1 month ago

Apparently there was a bug in crate(), which is now fixed in https://github.com/mlr-org/mlr3misc/pull/114. Until that fix is on CRAN, the necessary workaround is to specify the .parent argument of crate() explicitly. Could you check whether crate(...., .parent = topenv()) works? The reason we use crate() here is that we don't want the check functions to carry around the environment of the initialize() call.
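
To illustrate the last point, here is a base-R sketch of the difference between a plain closure and one whose environment is reset to topenv(). It is an analogy only and not how crate() is implemented; make_check_plain(), make_check_detached() and heavy_state are hypothetical names standing in for initialize() and whatever objects live in its frame.

```r
# A plain closure keeps the frame it was created in alive.
make_check_plain <- function() {
  heavy_state <- numeric(1e6)      # stands in for objects in the initialize() frame
  function(x) is.numeric(x)
}

# Resetting the environment drops the reference to that frame.
make_check_detached <- function() {
  heavy_state <- numeric(1e6)
  f <- function(x) is.numeric(x)
  environment(f) <- topenv()       # evaluated here, in the caller's context
  f
}

plain <- make_check_plain()
detached <- make_check_detached()

# Does the check function still see the creating frame's objects?
captured_plain <- exists("heavy_state", envir = environment(plain), inherits = FALSE)
captured_detached <- exists("heavy_state", envir = environment(detached), inherits = FALSE)
```

Here captured_plain is TRUE and captured_detached is FALSE, while both functions behave identically when called.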

advieser commented 1 month ago

A couple of parameters (approx_pow, target_n_neighbors, target_weight, pca_method) are only used if another parameter is NULL, or only if it is non-NULL. As far as I can see, this can't be expressed through the depends argument of the Domain, because if I use depends = quote(approx_pow == NULL), I get

Error in constructor(value) : 
  Assertion on 'rhs' failed: Must have length 1, but has length 0.

Obviously, is.null(approx_pow) can't be parsed either. Is that correct?

There are other cases that can't be expressed with depends; however, this NULL case comes up often enough that I thought I'd ask.
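
For context, a small base-R sketch of why a comparison against NULL has nothing of length 1 to work with. It only shows plain R semantics and makes no claim about how depends is parsed internally; my assumption is that the 'rhs' assertion trips over the zero-length right-hand side shown here.

```r
approx_pow <- NULL

# Comparing against NULL yields a zero-length logical, not TRUE or FALSE.
cmp <- approx_pow == NULL
length(cmp)

# The right-hand side of the quoted condition is NULL itself, so it has length 0.
cond <- quote(approx_pow == NULL)
rhs <- cond[[3]]
length(rhs)

# In ordinary R code, is.null() is the usual test for NULL.
is.null(approx_pow)
```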