Closed apavlo89 closed 3 years ago
Hi @apavlo89
Can you please post the full error traceback you see when you try to execute this code? Hard to say right now from the code above, could be any number of things.
Thanks, Joe
runfile('C:/Users/apavl/Dropbox/A+T/REEG/miceforest.py', wdir='C:/Users/apavl/Dropbox/A+T/REEG')
WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\theano\configdefaults.py:697: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
"DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\sklearn\utils\validation.py:70: FutureWarning: Pass fit_intercept=True, normalize=False, copy_X=True, n_jobs=None as keyword args. From version 0.25 passing these as positional arguments will result in an error
FutureWarning)
Traceback (most recent call last):
File "C:\Users\apavl\Dropbox\A+T\REEG\miceforest.py", line 14, in <module>
imp.fit_transform(dataset)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\imputations\dataframe\multiple_imputer.py", line 231, in fit_transform
return self.fit(X, y).transform(X)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\utils\checks.py", line 61, in wrapper
return func(d, *args, **kwargs)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\utils\checks.py", line 126, in wrapper
return func(d, *args, **kwargs)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\utils\checks.py", line 173, in wrapper
return func(d, *args, **kwargs)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\imputations\dataframe\multiple_imputer.py", line 188, in fit
imputer.fit(X)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\utils\checks.py", line 61, in wrapper
return func(d, *args, **kwargs)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\utils\checks.py", line 126, in wrapper
return func(d, *args, **kwargs)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\utils\checks.py", line 173, in wrapper
return func(d, *args, **kwargs)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\imputations\dataframe\single_imputer.py", line 190, in fit
imputer.fit(x_, y_)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\imputations\series\default.py", line 395, in fit
super().fit(X, y)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\imputations\series\default.py", line 186, in fit
stats = {"param": self.num_imputer.fit(X, y),
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\autoimpute\imputations\series\pmm.py", line 117, in fit
y_pred = self.lm.fit(X, y).predict(X)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\sklearn\linear_model\_base.py", line 506, in fit
y_numeric=True, multi_output=True)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\sklearn\base.py", line 432, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
return f(**kwargs)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\sklearn\utils\validation.py", line 802, in check_X_y
estimator=estimator)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
return f(**kwargs)
File "C:\Users\apavl\anaconda3\envs\neuroscience\lib\site-packages\sklearn\utils\validation.py", line 653, in check_array
context))
ValueError: Found array with 0 sample(s) (shape=(0, 1895)) while a minimum of 1 is required.
I'm uploading the database with some features removed so you can see for yourself. Thank you so much for your swift response!
@apavlo89
Bit of a tricky one to explain here - to understand, first try the same code but using the first four columns of your dataset, then five, then six, etc... you'll see that they should all work just fine.
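The column-by-column experiment can be automated with a small loop. This is only a sketch: `first_failing_width` and `fake_fit` are hypothetical helper names, and `fake_fit` is a stand-in so the snippet runs without autoimpute installed. With the library available, the callable would wrap the real `fit_transform` call on a column subset instead.

```python
from typing import Callable, Optional, Sequence


def first_failing_width(
    columns: Sequence[str],
    try_fit: Callable[[Sequence[str]], object],
    start: int = 4,
) -> Optional[int]:
    """Return the smallest number of leading columns for which try_fit
    raises ValueError, or None if every prefix works."""
    for width in range(start, len(columns) + 1):
        try:
            try_fit(columns[:width])
        except ValueError as err:
            print(f"{width} columns failed: {err}")
            return width
    return None


# Stand-in for the real imputer call (an assumption for illustration only).
# With autoimpute installed, try_fit could be something like:
#   lambda cols: MiceImputer().fit_transform(dataset[list(cols)])
def fake_fit(cols: Sequence[str]) -> str:
    if len(cols) > 6:
        raise ValueError("Found array with 0 sample(s)")
    return "ok"


names = [f"feature_{i}" for i in range(10)]
failing_width = first_failing_width(names, fake_fit)
```

Knowing the width at which the fit first breaks points to the column that tips the filtered data into an empty array.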
So why doesn't the full dataset work? When you call `fit_transform` with all default values, a fair amount happens under the hood. In your example, you don't specify a method, so the default method becomes `pmm`. One aspect of `pmm` is fitting linear regressions on subsets of each column, using some or all of the other columns as predictors, assuming the columns meet specific criteria.
Now, the error you see is actually an `sklearn` error, and it's thrown when a linear regression's `x` or `y` inputs are malformed. While no specific column in your dataset is malformed on its own, some combination of columns, after filtering, is in fact malformed, hence the error. I'd bet it results from this line here: https://github.com/kearnz/autoimpute/blob/master/autoimpute/imputations/dataframe/single_imputer.py#L196
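A toy illustration of how this can happen even when no row or column is empty. It assumes the filtering step keeps only rows where every predictor is observed, which is a guess at the mechanism rather than something confirmed from the autoimpute source. The data is synthetic, not the uploaded dataset.

```python
import numpy as np
import pandas as pd

# Wide, short frame: 38 rows, 200 columns, like a small sample with many features.
n_rows, n_cols = 38, 200
data = np.arange(n_rows * n_cols, dtype=float).reshape(n_rows, n_cols)

# Give every row exactly one missing value, each in a different column.
for i in range(n_rows):
    data[i, i] = np.nan

df = pd.DataFrame(data, columns=[f"f{j}" for j in range(n_cols)])

empty_columns = int(df.isna().all(axis=0).sum())  # columns with no data at all
empty_rows = int(df.isna().all(axis=1).sum())     # rows with no data at all

# Keeping only fully observed rows leaves nothing to fit a regression on.
complete_rows = len(df.dropna())

print(empty_columns, empty_rows, complete_rows)
```

Each row is missing only one value out of 200, yet no row is complete, so a regression that needs all predictors observed would receive an array with 0 samples.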
I'm not surprised you ran into this issue. You're using >1900 columns with <40 rows. Try refining your sample. Do you have more data? Can you use fewer columns? I doubt you need 1899 columns to predict the missingness in each other column, so you'll have to play around with the `MiceImputer` inputs.
In the meantime, I can look into making this error more informative, but the error thrown is happening for a reason.
Let me know if that makes sense.
Yes, reducing the number of features makes it work, thank you. My problem now is: how do I get the database with the missing values filled in by MICE?
When I use `MultipleImputer` with the mean strategy it works, and I get a variable holding the database with predicted values in place of the missing ones.
'''MCAR mean imputation'''
from autoimpute.imputations import MultipleImputer

# create the mean imputer
mi_mean_mcar = MultipleImputer(
    strategy="mean", n=5, return_list=True, seed=101
)
# print the mean imputer to console
print(mi_mean_mcar)
# perform mean imputation procedure (mcar is the DataFrame with missing values)
imp_mean_mcar = mi_mean_mcar.fit_transform(mcar)
But just filling missing values with the mean is not what I am after. Anyone can do that easily with scikit-learn's `SimpleImputer`.
Hi @apavlo89,
Mean imputation is one of many strategies that `autoimpute` offers. To understand the different strategies, I'd suggest reading the docs here.
If you're new to imputation, it's best to leave the strategy blank. The `MiceImputer` (or `MultipleImputer`) will then pick up the default strategy. For numerical data, the default is `pmm`, or predictive mean matching. It will be your best bet in most scenarios.
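For readers unfamiliar with predictive mean matching, here is a deliberately simplified sketch of the idea using a single predictor and ordinary least squares. It is not autoimpute's implementation, and the function name `pmm_impute` and its parameters are made up for illustration.

```python
import numpy as np


def pmm_impute(x, y, n_neighbors=3, seed=0):
    """Fill NaNs in y by borrowing observed y values from rows whose
    regression prediction is closest to the prediction for the missing row."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    if not observed.any():
        raise ValueError("no observed rows to fit the regression on")

    # Fit y ~ intercept + slope * x on the observed rows only.
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    predicted = design @ coef

    donor_predictions = predicted[observed]
    donor_values = y[observed]
    k = min(n_neighbors, donor_values.size)

    filled = y.copy()
    for i in np.flatnonzero(~observed):
        # Indices of the k donors with the closest predicted value.
        nearest = np.argsort(np.abs(donor_predictions - predicted[i]))[:k]
        # Impute with the *observed* value of one randomly chosen donor.
        filled[i] = donor_values[rng.choice(nearest)]
    return filled


x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.0, np.nan, 8.0, 10.0, np.nan]
filled = pmm_impute(x, y)
```

Because imputed values are always copied from real observations, the filled column stays within the range of the observed data. The regression step is also where predictors from other columns enter, which is why this strategy depends on having usable rows to fit on.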
The original code you sent used `pmm` and the `MiceImputer` already, but the problem is that you were using too many features for each column you want to impute. This will never be a problem for mean imputation, as mean imputation is univariate and independent (i.e. each column depends on itself only and no other columns are used). That's why you can impute the full dataset with `mean`, but you run into issues with `pmm`.
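The univariate point can be seen with plain pandas on a toy frame (synthetic values, not the uploaded data): each column is filled from its own mean, so no rows ever need to be filtered on other columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 10.0, 20.0],
})

# df.mean() skips NaNs and returns one mean per column;
# fillna then fills each column with its own mean only.
filled = df.fillna(df.mean())
print(filled)
```

Here column `a` is filled with 2.0 and column `b` with 15.0, regardless of what the other column contains.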
Closing this. If you have further questions, feel free to let me know.
Hello everyone,
I'm trying to run autoimpute but from my understanding it is telling me that I have a column or a row that is empty??? I checked and that is not the case! Any idea what may be causing this? Here is the code I use below