Open fkiraly opened 4 months ago
I am unsure if I understand it correctly. Should this be a transformer that calibrates the quantiles? E.g. like: https://scikit-learn.org/stable/modules/calibration.html
- quantile calibration - `fit(X=y_proba, y_true)`, where `y_proba` are proba predictions, and `y_true` is a calibration sample. Both `y_proba` and `y_true` have 2D shape (N, d).
Why should `y_true` have the same shape as `y_proba`? I would assume that `y_proba` needs an additional dimension, since it contains predicted quantiles while `y_true` contains the actual values.
- model estimation, distribution fitting - `fit(X).transform(X)` produces a distribution of the same shape as `X`. If `X` is assumed to be an i.i.d. sample, the distribution estimated is scalar, or of the same shape as a row of `X`. The question is what the output should be, even if the "genuine" estimate is a scalar or row distribution. Perhaps a hybrid interface with `estimate` - can be row, scalar - and `transform` - always array - can be helpful here.
- distribution smoothing or simplification - e.g., fit a nearby semi-parametric or parametric distribution to another distribution. For instance, replace an `Empirical` by a `QPD`.
Regarding both of these bullet points, I assume that we need to discuss this in a meeting. I am not sure if I understand this correctly.
In general, I think such transformers would be very useful.
I am unsure if I understand it correctly. Should this be a transformer that calibrates the quantiles?
Exactly!
I am unsure if I understand it correctly. Should this be a transformer that calibrates the quantiles?
I am considering the distribution objects, as inheriting from `BaseDistribution` - the parameterization by quantiles is dealt with, e.g., via the iterable `alpha` argument of the `quantile` method, so at the object level, you only have two relevant dimensions.
The `quantile` returns will indeed have three dimensions, one more than `y_true`.
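To make the dimension count concrete, here is a minimal sketch in plain Python. It does not use the actual distribution classes: `statistics.NormalDist` from the standard library stands in for a scalar distribution, a nested list stands in for the 2D distribution object, and the helper names `shape` and `quantile` are hypothetical, chosen only for this illustration.

```python
from statistics import NormalDist

# Hypothetical stand-in for a 2D array of distributions, shape (N, d) = (3, 2).
# Each entry is one scalar distribution.
y_proba = [
    [NormalDist(0.0, 1.0), NormalDist(1.0, 2.0)],
    [NormalDist(0.5, 1.0), NormalDist(2.0, 0.5)],
    [NormalDist(-1.0, 3.0), NormalDist(0.0, 1.0)],
]

# Calibration sample of actual values, same 2D shape (N, d).
y_true = [[0.1, 0.7], [0.4, 2.2], [-2.0, 0.3]]


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def quantile(dists, alpha):
    """Evaluate every distribution at every level in the iterable alpha."""
    return [[[dist.inv_cdf(a) for a in alpha] for dist in row] for row in dists]


alpha = [0.1, 0.5, 0.9]
q = quantile(y_proba, alpha)

print(shape(y_proba))  # (3, 2)
print(shape(y_true))   # (3, 2)
print(shape(q))        # (3, 2, 3)
```

The distribution container and the sample agree in shape; the extra axis only appears once quantiles are evaluated at several levels.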
Regarding both of these bullet points, I assume that we need to discuss this in a meeting. I am not sure if I understand this correctly.
Sure - one of the dev meetings? It is probably not clear when stated this briefly; I may need to write an API design proposal.
Discussion with @benHeid on probability calibration indicates that we may like to have another special category of transformations: distribution-to-distribution, possibly with a secondary input being samples.
Examples:
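- quantile calibration - `fit(X=y_proba, y_true)`, where `y_proba` are proba predictions, and `y_true` is a calibration sample. Both `y_proba` and `y_true` have 2D shape (N, d).
- model estimation, distribution fitting - `fit(X).transform(X)` produces a distribution of the same shape as `X`. If `X` is assumed to be an i.i.d. sample, the distribution estimated is scalar, or of the same shape as a row of `X`. The question is what the output should be, even if the "genuine" estimate is a scalar or row distribution. Perhaps a hybrid interface with `estimate` - can be row, scalar - and `transform` - always array - can be helpful here.
- distribution smoothing or simplification - e.g., fit a nearby semi-parametric or parametric distribution to another distribution. For instance, replace an `Empirical` by a `QPD`.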
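As a rough illustration of the first example, the sketch below shows what a distribution-to-distribution transformer with a sample as secondary input could look like. Everything here is an assumption made for illustration: the class name `PITQuantileCalibrator` is hypothetical, `statistics.NormalDist` stands in for the real distribution objects, the keyword `y` for the calibration sample is a guess, and the recalibration via probability integral transform (PIT) values is just one simple technique, not something proposed in this thread.

```python
import math
from statistics import NormalDist


class PITQuantileCalibrator:
    """Hypothetical distribution-to-distribution transformer (illustration only)."""

    def fit(self, X, y):
        # X: (N, d) nested list of predicted distributions
        # y: (N, d) nested list of observed values, the calibration sample
        n, d = len(X), len(X[0])
        # Per column, store the sorted PIT values cdf_ij(y_ij).
        self.pit_ = [sorted(X[i][j].cdf(y[i][j]) for i in range(n)) for j in range(d)]
        return self

    def _remap(self, j, alpha):
        # Empirical alpha-quantile of the PIT values in column j,
        # clamped away from 0 and 1 so that inv_cdf stays defined.
        pit = self.pit_[j]
        k = min(len(pit) - 1, max(0, math.ceil(alpha * len(pit)) - 1))
        return min(1 - 1e-12, max(1e-12, pit[k]))

    def transform(self, X):
        # Returns an (N, d) nested list of calibrated quantile functions,
        # so the output has the same 2D shape as the input.
        return [
            [lambda alpha, dist=dist, j=j: dist.inv_cdf(self._remap(j, alpha))
             for j, dist in enumerate(row)]
            for row in X
        ]


n = 200
levels = [(i + 0.5) / n for i in range(n)]
# Column 0: predictions N(0, 1), observations spread like N(0, 2) (overconfident).
# Column 1: predictions N(0, 1), observations spread like N(0, 1) (calibrated).
y_proba = [[NormalDist(0, 1), NormalDist(0, 1)] for _ in range(n)]
y_true = [[NormalDist(0, 2).inv_cdf(p), NormalDist(0, 1).inv_cdf(p)] for p in levels]

calibrated = PITQuantileCalibrator().fit(X=y_proba, y=y_true).transform(y_proba)

raw_q90 = y_proba[0][0].inv_cdf(0.9)
cal_q90_wide = calibrated[0][0](0.9)
cal_q90_same = calibrated[0][1](0.9)
print(raw_q90, cal_q90_wide, cal_q90_same)
```

In this toy setup the calibrated 0.9 quantile in the overconfident column moves out to roughly twice the raw value, while the already calibrated column stays close to it. Both input and output keep the 2D shape (N, d).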