jmmcd closed this issue 11 years ago.
This is true for the real-world test data sets on the ffx home page as well.
Thanks for the extra report. I've just had another look. I think it happens when we "unbias" the data, i.e. normalise each variable to mean 0 and stddev 1. If the stddev is 0, the division gives NaN (0/0) or inf. In that case the variable is actually constant, so I think it's OK to just replace it with zero. When rebiasing I'm not sure whether any change is needed.
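A minimal sketch of the failure and the proposed workaround, using made-up toy data (column 1 is constant). It also touches on the rebiasing question, under the assumption that rebiasing is the plain inverse `value = z * std + mean`. That is an assumption about how the model maps predictions back, not something confirmed here.

```python
import numpy as np

# Hypothetical toy data: column 0 varies, column 1 is constant.
X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])

avgs, stds = X.mean(axis=0), X.std(axis=0)

# Naive normalisation: the constant column computes 0/0, which is NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = (X - avgs) / stds
print(np.isnan(naive[:, 1]).all())  # True

# Workaround: replace the non-finite entries with zero.
fixed = np.where(np.isfinite(naive), naive, 0.0)

# Assumed rebiasing (value = z * std + mean): the constant column has
# std 0, so it comes back as its mean, i.e. the original constant.
rebiased = fixed * stds + avgs
print(np.allclose(rebiased, X))  # True
```

With this inverse, no extra change is needed when rebiasing, because a zero stddev maps every value back to the column mean.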
Anyway, please try this version of _unbiasedXy. It fixes my test, above. There's probably a more idiomatic way to express this in numpy.
```python
import numpy

def _unbiasedXy(self, Xin, yin):
    """Normalise each column (variable) of X, and y, to mean=0 stddev=1."""
    # unbiased X: means and stddevs are computed per column (axis 0)
    X_avgs, X_stds = Xin.mean(0), Xin.std(0)
    with numpy.errstate(divide='ignore', invalid='ignore'):
        X_unbiased = (Xin - X_avgs) / X_stds
    # a column with stddev 0 is constant and gives NaN/inf: use (value - mean) there
    bad_cols = numpy.any(~numpy.isfinite(X_unbiased), 0)
    X_unbiased[:, bad_cols] = Xin[:, bad_cols] - X_avgs[bad_cols]
    # unbiased y
    y_avg, y_std = yin.mean(0), yin.std(0)
    with numpy.errstate(divide='ignore', invalid='ignore'):
        y_unbiased = (yin - y_avg) / y_std
    # check whether stddev was 0: if so, use (value - mean)
    if numpy.any(~numpy.isfinite(y_unbiased)):
        y_unbiased = yin - y_avg
    assert numpy.all(numpy.isfinite(X_unbiased))
    assert numpy.all(numpy.isfinite(y_unbiased))
    return (X_unbiased, y_unbiased, X_avgs, X_stds, y_avg, y_std)
```
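On the "more idiomatic numpy" point, one vectorised option is to overwrite zero stddevs with 1 before dividing, so constant columns become `value - mean` (zeros) without any division by zero. This is a sketch with a hypothetical helper name, `unbias_columns`; it is not necessarily what the fixing commit does.

```python
import numpy as np

def unbias_columns(X):
    """Hypothetical helper: normalise each column of X to mean 0, stddev 1.

    Columns with stddev 0 (constant columns) are divided by 1 instead,
    so they become (value - mean), i.e. zeros.
    """
    X = np.asarray(X, dtype=float)
    avgs = X.mean(axis=0)
    stds = X.std(axis=0)
    # Replace zero stddevs with 1; np.where returns a new array.
    safe_stds = np.where(stds == 0.0, 1.0, stds)
    return (X - avgs) / safe_stds, avgs, safe_stds

X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
Z, avgs, stds = unbias_columns(X)
print(np.isfinite(Z).all())              # True
print(np.allclose(Z * stds + avgs, X))   # True
```

Because the returned stddev is the overwritten 1, rebiasing with `z * std + mean` still recovers the original values. One caveat: floating-point rounding in the mean can leave a tiny nonzero stddev for a column that is constant in principle, so an exact `== 0.0` test may need a tolerance in practice.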
Fixed in commit 4556878031a8ad6b92f3be2b82ad0427b3c5370f.
This file crashes:
Whereas this one works OK; the only difference is the data: