kearnz / autoimpute

Python package for Imputation Methods
MIT License
237 stars, 19 forks

ValueError: Found array with 0 sample(s) (shape=(0, 174)) while a minimum of 1 is required. #79

Closed torivor closed 1 year ago

torivor commented 1 year ago

Hello, this package has been a lifesaver for my imputation needs. However, I recently encountered this error, which I can't seem to solve: "ValueError: Found array with 0 sample(s) (shape=(0, 174)) while a minimum of 1 is required." It occurs when I call fit_transform on my Pandas DataFrame. Every column of the DataFrame has some values, and none of them is completely filled with missing values, yet the package still throws this error. I therefore don't know what triggers it, and I assume it is due to a fault in the package's code.

The data is publicly available from a Kaggle competition, but the file is too big to attach here. Should anyone require the data, I can email it; just send me a request at andreasparasian@gmail.com.

kearnz commented 1 year ago

Hi @torivor,

I can't say without looking at the dataset, but have a look at my answer in #65 and then the thread in #68.

I won't go into the details, as they're already described in those threads, but in short this error can occur when every row is missing a value in at least one of the columns, even if no single column is completely null.

Right now this is expected behavior given the algorithm design, although the error is not handled well. In #68 you'll see a better solution, one in which we use placeholders instead of listwise deletion. This is the right way to go; I just haven't written the code for it yet.

In the meantime I'd play with your feature set, perhaps reducing the number of columns or sampling more rows if possible.
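To make that concrete, here is a minimal sketch using plain pandas rather than autoimpute itself. It assumes that `DataFrame.dropna()` is a reasonable stand-in for listwise deletion; the toy data and variable names are made up for illustration and are not taken from the reporter's dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): no column is entirely null,
# yet every row is missing a value somewhere.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, 3.0],
    "c": [1.0, 2.0, np.nan],
})

# Per column: is the column completely null?
all_null_columns = df.isna().all()

# Listwise deletion stand-in: keep only rows with no missing values.
complete_rows = df.dropna()

print(all_null_columns.any())  # False: no column is completely null
print(complete_rows.shape)     # (0, 3): no complete rows survive
```

An estimator that is then fit on `complete_rows` receives an array with zero samples, which is the situation the error message describes.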

torivor commented 1 year ago

Hello Kearney,

I beg your pardon for the late reply. The dataset that I mentioned before in the GitHub issue can be found at the following link: https://1drv.ms/u/s!AuNCY1udObSUgeVoN1EVZLGPvMGLrg?e=z4XdYb

Should you have any alternative solutions regarding the issue, please let me know. Thank you for the response!

Sincerely, Andreas Parasian

torivor commented 1 year ago

Correction: please use this link to download the CSV file: https://onedrive.live.com/download?resid=94B4399D5B6342E3!29416&authkey=!ADdRFWSxj7zBi64

The link I sent earlier only shows a preview of the file without any option to download it (since it's larger than 25 megabytes).

kearnz commented 1 year ago

Apologies for the late reply. I looked into it more, and this is caused by the issues I linked to. Right now autoimpute uses listwise_delete instead of placeholders. So if you use a lot of features, there must be at least one row in which all of those features (except the imputed column) have a value present. The chance of this being true shrinks as the number of features grows.
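A rough back-of-the-envelope sketch of why the odds shrink: if each feature were missing independently with the same probability, the chance that a single row is fully observed across k features would be (1 - p)^k. The independence assumption, the 5% missingness rate, and the row count below are illustrative assumptions only; real missingness is usually correlated, so treat this as intuition rather than a prediction for any particular dataset. The value 174 is the column count from the error message.

```python
def prob_row_complete(p_missing: float, n_features: int) -> float:
    """Probability that one row has all n_features observed,
    assuming each feature is missing independently with p_missing."""
    return (1.0 - p_missing) ** n_features

n_rows = 10_000    # hypothetical dataset size
p_missing = 0.05   # hypothetical per-feature missingness rate

for k in (5, 50, 174):
    p = prob_row_complete(p_missing, k)
    # Expected number of fully observed rows under these assumptions.
    print(f"{k:>3} features: P(row complete) = {p:.6f}, "
          f"expected complete rows = {n_rows * p:.1f}")
```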

Again, see #65 and #68. The recommended solution for now is to experiment with your data. As a start, try using fewer columns for the imputation instead of all the features. I'm planning to move from complete-case analysis to mean placeholders at some point soon, but I don't have a timeline for that yet.
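One way to experiment before fitting is to count, for a given target column, how many rows have every candidate predictor observed. The sketch below uses only pandas; `complete_predictor_rows` is a hypothetical helper written for this illustration, not part of autoimpute, and the toy data is made up.

```python
from typing import Sequence

import numpy as np
import pandas as pd

def complete_predictor_rows(df: pd.DataFrame, target: str,
                            predictors: Sequence[str]) -> int:
    """Count rows in which every predictor (excluding the target) is observed."""
    cols = [c for c in predictors if c != target]
    return int(df[cols].notna().all(axis=1).sum())

# Toy data (hypothetical): each row is missing two of the four values.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, 2.0, np.nan, 4.0],
    "c": [np.nan, 3.0, 3.0, np.nan],
    "d": [4.0, np.nan, 4.0, np.nan],
})

# Using every other column as a predictor for "a" leaves no usable rows.
wide = complete_predictor_rows(df, "a", ["b", "c", "d"])
# Using a single predictor leaves some rows to fit on.
narrow = complete_predictor_rows(df, "a", ["b"])

print(wide)    # 0
print(narrow)  # 2
```

If the count for a chosen predictor set is zero, a complete-case fit on those predictors has nothing to train on, so that set is worth trimming before calling the imputer.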