Closed katosh closed 9 months ago
Attention: 27 lines in your changes are missing coverage. Please review.
Comparison is base (c274b7f) 92.65% compared to head (924aa5a) 97.47%. Report is 1 commit behind head on main.
@ManuSetty The refactoring is done. I made the changes I mentioned and removed the argument `method` (with options `auto`, `fixed`, and `percent`) in favor of `gp_type`, which lets the user choose the sparsification method explicitly:
    gp_type : str or GaussianProcessType
        The type of sparsification used for the Gaussian Process
        - 'full' Non-sparse Gaussian Process
        - 'full_nystroem' Sparse GP with Nyström rank reduction without landmarks,
            which lowers the computational complexity.
        - 'sparse_cholesky' Sparse GP using landmarks/inducing points,
            typically employed to enable scalable GP models.
        - 'sparse_nystroem' Sparse GP using landmarks or inducing points,
            along with an improved Nyström rank reduction method that balances
            accuracy with efficiency.
        The value can be either a string matching one of the above options or an instance of
        the `mellon.parameters.GaussianProcessType` Enum. If a partial match is found with the
        Enum, a warning will be logged, and the closest match will be used.
        Defaults to 'sparse_cholesky'.
This comes with additional parameter validation that ensures no contradictory parameters are specified.
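To make the string/Enum handling concrete, here is a minimal sketch of how such a lookup could behave. It is not mellon's code: the `GaussianProcessType` class below is a local stand-in whose values are copied from the docstring, and `resolve_gp_type`, the logger name, and the substring rule for partial matches are all assumptions made for illustration.

```python
import logging
from enum import Enum

logger = logging.getLogger("gp_type_sketch")  # hypothetical logger name


class GaussianProcessType(Enum):
    """Local stand-in for the Enum; values copied from the docstring above."""

    FULL = "full"
    FULL_NYSTROEM = "full_nystroem"
    SPARSE_CHOLESKY = "sparse_cholesky"
    SPARSE_NYSTROEM = "sparse_nystroem"


def resolve_gp_type(value="sparse_cholesky"):
    """Map an Enum member, an exact string, or a partial string to a member."""
    if isinstance(value, GaussianProcessType):
        return value
    text = str(value).lower()
    # An exact match on the option string wins first.
    for member in GaussianProcessType:
        if member.value == text:
            return member
    # Assumed partial-match rule: input is a substring of exactly one option.
    candidates = [m for m in GaussianProcessType if text in m.value]
    if len(candidates) == 1:
        logger.warning(
            "Partial match %r interpreted as %r.", value, candidates[0].value
        )
        return candidates[0]
    options = ", ".join(m.value for m in GaussianProcessType)
    raise ValueError(
        f"Unknown or ambiguous gp_type {value!r}; choose one of: {options}"
    )
```

With this rule `resolve_gp_type("cholesky")` logs a warning and returns the `sparse_cholesky` member, while `resolve_gp_type("nystroem")` raises because two options contain that substring. The real matching strategy may differ.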
Commit 27b7d6386835cbbb51719a2a357b41e2b249247f resolves a major ambiguity by harmonizing the new uncertainty computation for the `DensityEstimator` with the noise input handling of the `FunctionEstimator`.
The core objective of this PR is to introduce uncertainty estimation into Mellon's primary results.
New Features
`with_uncertainty` Parameter
Integrates a boolean parameter `with_uncertainty` across all estimators: DensityEstimator, TimeSensitiveDensityEstimator, FunctionEstimator, and DimensionalityEstimator. It modifies the fitted predictor, accessible via the `.predict` property, to include the following methods:
- `.covariance(X)`: Calculates the (co-)variance of the posterior Gaussian Process (GP).
  - Supports `diag=True`, computing only the covariance matrix diagonal.
- `.mean_covariance(X)`: Computes the (co-)variance through the uncertainty of the mean function's GP posterior.
  - Available with `optimizer='advi'`, except for the `FunctionEstimator`, where input uncertainty is specified through the `sigma` parameter.
  - Supports `diag=True`, computing only the covariance matrix diagonal.
- `.uncertainty(X)`: Combines `.covariance(X)` and `.mean_covariance(X)`.
  - Supports `diag=True`, computing only the covariance matrix diagonal.

`gp_type` Parameter
Introduces the `gp_type` parameter to all relevant estimators to explicitly specify the Gaussian Process (GP) sparsification strategy, replacing the previously used `method` argument (with options `auto`, `fixed`, and `percent`) that implicitly controlled sparsification. The available options for `gp_type` include:
- 'full': Non-sparse Gaussian Process.
- 'full_nystroem': Sparse GP with Nyström rank reduction without landmarks, which lowers the computational complexity.
- 'sparse_cholesky': Sparse GP using landmarks/inducing points, typically employed to enable scalable GP models.
- 'sparse_nystroem': Sparse GP using landmarks or inducing points, along with an improved Nyström rank reduction method that balances accuracy with efficiency.

This new parameter adds validation steps that ensure no contradictory parameters are specified. If inconsistencies are detected, a helpful message guides the user on how to fix the issue. The value can be either a string matching one of the options above or an instance of the `mellon.parameters.GaussianProcessType` Enum. Partial matches log a warning and use the closest match. Defaults to 'sparse_cholesky'.
Note: Nyström strategies are not applicable to the FunctionEstimator.
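Returning to the three predictor methods listed under `with_uncertainty`: the sketch below shows one plausible reading of "combines", namely adding the two diagonal variances per query point and taking a square root to get a standard deviation. The description only says the two terms are combined, so the additive rule is an assumption. The helper names and the numbers are made up, and no mellon call is involved.

```python
import math


def combine_uncertainty(covariance_diag, mean_covariance_diag):
    """Add the two variance contributions element-wise (assumed rule)."""
    if len(covariance_diag) != len(mean_covariance_diag):
        raise ValueError("both diagonals need one entry per query point")
    return [c + m for c, m in zip(covariance_diag, mean_covariance_diag)]


def standard_deviation(variances):
    """Convert per-point variances to standard deviations."""
    return [math.sqrt(v) for v in variances]


# Made-up variances for three query points.
gp_var = [0.01, 0.04, 0.25]    # stands in for the .covariance(X) diagonal
mean_var = [0.03, 0.05, 0.0]   # stands in for the .mean_covariance(X) diagonal

total_var = combine_uncertainty(gp_var, mean_var)
total_std = standard_deviation(total_var)
```

Working on the diagonal keeps the cost linear in the number of query points, which is presumably why the `diag=True` option exists.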
`y_is_mean` Parameter
Adds a boolean parameter `y_is_mean` to FunctionEstimator, affecting how `y` values are interpreted:
- Before: `sigma` impacted conditional mean functions and predictions.
- Intermediate: `sigma` only influenced prediction uncertainty.
- Now: If `y_is_mean=True`, `y` values are treated as a fixed mean and `sigma` reflects only uncertainty. If `y_is_mean=False`, `y` is considered a noisy measurement, potentially smoothing values at locations `x`.

This change benefits DensityEstimator, TimeSensitiveDensityEstimator, and DimensionalityEstimator, where function values are predicted for out-of-sample locations after mean GP computation.
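The difference between the two interpretations of `y` can be seen in a deliberately tiny case: one training point, a zero prior mean, and a prediction at that same point. The noisy branch below is textbook GP conditioning with observation noise. The `y_is_mean=True` branch, which simply passes `y` through and reports `sigma**2` as the variance, is an assumption about what "sigma reflects only uncertainty" means. The function name and `prior_var` are hypothetical.

```python
def posterior_at_training_point(y, sigma, prior_var=1.0, y_is_mean=False):
    """Return (mean, variance) of the prediction at the single training point.

    Scalar toy model with zero prior mean; not the library's implementation.
    """
    if y_is_mean:
        # y is taken as the mean itself; sigma only adds uncertainty (assumed).
        return y, sigma**2
    # y is a noisy measurement: the mean is shrunk toward the prior mean (0).
    gain = prior_var / (prior_var + sigma**2)
    return gain * y, prior_var * (1.0 - gain)


fixed_mean = posterior_at_training_point(2.0, 1.0, y_is_mean=True)
noisy = posterior_at_training_point(2.0, 1.0, y_is_mean=False)
```

In the noisy case the predicted value at `x` no longer equals `y`, which is the smoothing effect mentioned above.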
`check_rank` Parameter
Introduces the `check_rank` parameter to all relevant estimators. This boolean parameter explicitly controls whether the rank check is performed, specifically in the `gp_type="sparse_cholesky"` case. The rank check assesses the chosen landmarks for adequate complexity by examining the approximate rank of the covariance matrix, issuing a warning if insufficient. Allowed values are:
- `True`: Always perform the check.
- `False`: Never perform the check.
- `None` (Default): Perform the check only if `n_landmarks` is greater than or equal to `n_samples` divided by 10.

The default setting aims to bypass unnecessary computation when the number of landmarks is so abundant that insufficient complexity becomes improbable.
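The three allowed values reduce to a small decision function. The sketch below encodes the rule exactly as worded in the list above. `should_check_rank` is a hypothetical helper name, not a function from the library.

```python
def should_check_rank(check_rank, n_landmarks, n_samples):
    """Decide whether to run the rank check, following the stated rule."""
    if check_rank is None:
        # Default: check only if n_landmarks >= n_samples / 10.
        return n_landmarks >= n_samples / 10
    # Explicit True or False always wins over the default heuristic.
    return bool(check_rank)
```

For example, with 1000 samples the default triggers the check at 100 landmarks but not at 99.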
`normalize` Parameter
The `normalize` parameter is applicable to both the `.mean` method and the `.__call__` method within the mellon.Predictor class. When set to `True`, these methods subtract `log(number of observations)` from the value returned. This feature is particularly useful with the DensityEstimator, where normalization adjusts for the number of cells in the training sample, allowing for accurate density comparisons between datasets. This correction takes into account the effect of dataset size, ensuring that differences in total cell numbers are not unduly influential. By default, the parameter is set to `False`, meaning that density differences due to variations in total cell number remain uncorrected.

`normalize_per_time_point` Parameter
This parameter fine-tunes the `TimeSensitiveDensityEstimator` to handle variations in sampling bias across different time points, ensuring both continuity and differentiability in the resulting density estimation. Notably, it can also reflect the growth of a population even if the same number of cells was sampled at each time point.
The normalization is realized by manipulating the nearest neighbor distances `nn_distances` to reflect the deviation from an expected cell count.
- Accepts `bool`, `list`, `array-like`, or `dict`.

Options:
- `True`: Normalizes to emulate an even distribution of total cell count across all time points.
- `False`: Retains raw cell counts at each time point for density estimation.

Notes:
- `nn_distances` precedence: If `nn_distances` is supplied, this parameter is bypassed and the provided distances are used directly.
- Default: `False`.
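For the `normalize` option described earlier in this section, the effect is easiest to see on two datasets that have the same shape but different sizes. The sketch below uses a hypothetical `ToyPredictor` (not `mellon.Predictor`) whose `.mean` is an alias of `.__call__`, and made-up log-density functions that differ only by `log(n_obs)`.

```python
import math


class ToyPredictor:
    """Hypothetical stand-in for a fitted predictor; not mellon's class."""

    def __init__(self, log_density_fn, n_obs):
        self.log_density_fn = log_density_fn
        self.n_obs = n_obs

    def __call__(self, x, normalize=False):
        value = self.log_density_fn(x)
        if normalize:
            # Remove the contribution of the sample size on the log scale.
            value -= math.log(self.n_obs)
        return value

    # Same function under a second name, so both accept `normalize`.
    mean = __call__


# Two made-up datasets with identical shape but 100 versus 1000 cells.
small = ToyPredictor(lambda x: math.log(100) - x * x, n_obs=100)
large = ToyPredictor(lambda x: math.log(1000) - x * x, n_obs=1000)

raw_gap = large(0.5) - small(0.5)
normalized_gap = large.mean(0.5, normalize=True) - small(0.5, normalize=True)
```

Without normalization the two predictions differ by `log(10)`; with `normalize=True` the gap vanishes up to floating-point error, which is the comparison across datasets that the parameter is meant to enable.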
Enhancements
- Stores `Lp` in the estimators for reuse, enhancing the speed of the predictive function computation in non-Nyström strategies.
- `DimensionalityEstimator.predict` now returns a subclass of the `mellon.Predictor` class instead of a closure, giving access to serialization and uncertainty computations.
- … `compute_L` function

Changes
- Adds `.mean`, which is an alias to `.__call__`.
- `...ConditionalMean...` were renamed to `...Conditional...` since they now also compute `.covariance` and `.mean_covariance`.
- `...conditional_mean...` was renamed to `conditional`.
- … `d_method != "fractal"`. Additionally, using `normalize=True` in the density predictor triggers a warning that one has to use the non-default `d_method = "fractal"` in the `DensityEstimator`.