Open vahidbas opened 6 years ago
Perhaps you're right... but when is it a good idea to mean-impute the regression target??
@jnothman The imputation is only for illustration. I have a relatively high-dimensional, noisy target with occasional missing values. I would like to project it onto a low-dimensional space before using it as the predictor's target. The projection is of course lossy, but it helps a lot with the accuracy of the predictive model and resolves the missing-value issue. I have a custom class that performs this transformation and its inverse.
I am quite concerned about doing something like that. How do you make sure you compute a proper score between a missing target and an imputed target? It seems like the wrong thing to do, doesn't it?
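To make the idea concrete, here is a minimal NumPy-only sketch of such a transformation and its inverse. It is not the custom class mentioned above, which is not shown in this thread. The class name `LowRankTargetProjector`, its `n_components` parameter, and the choice of mean-filling followed by an SVD projection are all assumptions made for illustration.

```python
import numpy as np


class LowRankTargetProjector:
    """Hypothetical target transformer (illustration only).

    Fills missing entries with the column mean, then projects the centered
    target onto its top principal directions.
    """

    def __init__(self, n_components=2):
        self.n_components = n_components

    def _fill(self, Y):
        # Copy the input and replace NaNs with the column means learned in fit.
        Y = np.array(Y, dtype=np.float64)
        rows, cols = np.where(np.isnan(Y))
        Y[rows, cols] = self.mean_[cols]
        return Y

    def fit(self, Y):
        Y = np.asarray(Y, dtype=np.float64)
        self.mean_ = np.nanmean(Y, axis=0)
        centered = self._fill(Y) - self.mean_
        # Rows of vt are the principal directions, ordered by singular value.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, Y):
        # Low-dimensional, NaN-free representation of the target.
        return (self._fill(Y) - self.mean_) @ self.components_.T

    def inverse_transform(self, Z):
        # Lossy reconstruction in the original target space.
        return np.asarray(Z) @ self.components_ + self.mean_


rng = np.random.default_rng(0)
latent = rng.standard_normal((50, 2))
Y = latent @ rng.standard_normal((2, 6)) + 0.01 * rng.standard_normal((50, 6))
Y[3, 4] = np.nan  # one partially missing target row

projector = LowRankTargetProjector(n_components=2).fit(Y)
Z = projector.transform(Y)
Y_back = projector.inverse_transform(Z)
```

The sketch is not wired into scikit-learn: it does not inherit from `BaseEstimator`, so it could not be passed as a `transformer` without further work.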
I suppose it's a way of doing semi-supervised learning...?
@jnothman It is a bit more relaxed than semi-supervised learning: here the target may be only partially missing for a sample, whereas in semi-supervised learning the target is fully missing for a sample.
@glemaitre Missing values will be ignored in the computation of the score, so what can go wrong? Example:
import numpy as np

def r2_score_with_nan(y_true, y_pred):
    # Per-output R^2 that skips NaN entries of y_true, averaged over outputs.
    numerator = np.nansum((y_true - y_pred) ** 2, axis=0, dtype=np.float64)
    denominator = np.nansum((y_true - np.nanmean(y_true, axis=0)) ** 2, axis=0, dtype=np.float64)
    return np.mean(1 - numerator / denominator)

y_pred = np.random.randn(10, 3)
y_true = y_pred + np.random.randn(10, 3) * 0.1
y_true[5, 1] = np.nan  # the np.NaN alias was removed in NumPy 2.0
r2_score_with_nan(y_true, y_pred)
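As a sanity check on what "ignored" means here, the following sketch compares the NaN-skipping score with a per-column R² computed only on the rows where the target is observed. The helper `r2_observed_rows_only` is a hypothetical name introduced for this comparison; the two functions should agree up to floating-point error.

```python
import numpy as np


def r2_score_with_nan(y_true, y_pred):
    # Same computation as the example in this comment, repeated so the block runs alone.
    numerator = np.nansum((y_true - y_pred) ** 2, axis=0, dtype=np.float64)
    denominator = np.nansum(
        (y_true - np.nanmean(y_true, axis=0)) ** 2, axis=0, dtype=np.float64
    )
    return np.mean(1 - numerator / denominator)


def r2_observed_rows_only(y_true, y_pred):
    # Hypothetical helper: for each output column, drop the rows whose target
    # is NaN, compute the ordinary R^2, then average over columns.
    scores = []
    for j in range(y_true.shape[1]):
        mask = ~np.isnan(y_true[:, j])
        t, p = y_true[mask, j], y_pred[mask, j]
        ss_res = np.sum((t - p) ** 2)
        ss_tot = np.sum((t - t.mean()) ** 2)
        scores.append(1 - ss_res / ss_tot)
    return float(np.mean(scores))


rng = np.random.default_rng(0)
y_pred = rng.standard_normal((10, 3))
y_true = y_pred + 0.1 * rng.standard_normal((10, 3))
y_true[5, 1] = np.nan

score_nan_aware = r2_score_with_nan(y_true, y_pred)
score_masked = r2_observed_rows_only(y_true, y_pred)
```

Note that both versions divide by zero for any column whose observed targets are all equal, so a column with a single observed value would need separate handling.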
Okay, let's minimise validation.
PR welcome.
The problem is in two places: the first is where we need to change force_all_finite to False, and the second is at _function-transformer.py line 103. I just checked the commit you mentioned, and it also corrects the first part. But even with the first part corrected, the issue will not be resolved, because line 179 in _target.py will then produce an error that can be traced back to the second place mentioned above.
If nobody is handling this issue, may I have a look at it?
This is fixed in #11349, which requires a second review
Is this issue still open?
Yes, it seems so :|
Hi, I am new here. Is anyone working on this? If not, I can give it a try and take this one over the weekend.
Description
One potential use case for TransformedTargetRegressor is to get rid of missing values in the target, but currently the initial check in the fit method doesn't allow such an array.
Steps/Code to Reproduce
Example:
This raises:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').