Closed torivor closed 1 year ago
Hi @torivor,
Can't say without looking at the dataset, but have a look at my answer in #65 then the thread in #68
Won't go into the details since they're already covered in those threads, but in short, this error can occur when no row is fully observed across the columns you pass, even if no single column is completely null.
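To make that concrete, here's a minimal toy example (made-up data, not your dataset): no column is completely null, yet listwise deletion leaves zero rows, which is exactly what produces a "Found array with 0 sample(s)" error downstream.

```python
# Toy illustration: every column has some observed values,
# but no single row is fully observed.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, np.nan],
    "c": [4.0, np.nan, np.nan],
})

complete_cases = df.dropna()   # listwise delete: keep only complete rows
print(len(complete_cases))     # 0 -> nothing left to fit on
```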
Right now this is expected behavior given the algorithm design, though the error isn't handled gracefully. In #68 you'll see a better solution, one where we use placeholders instead of listwise deletion. That's the right way to go; I just haven't written the code for it yet.
In the meantime, I'd play with your feature set: maybe reduce the column space or sample more rows if possible.
Hello Kearney,
I beg your pardon for the late reply. The dataset I mentioned in the GitHub issue can be found at the following link: https://1drv.ms/u/s!AuNCY1udObSUgeVoN1EVZLGPvMGLrg?e=z4XdYb
Should you have any alternative solutions regarding the issue, please let me know. Thank you for the response!
Sincerely, Andreas Parasian
Correction, please use this link to download the CSV file: https://onedrive.live.com/download?resid=94B4399D5B6342E3!29416&authkey=!ADdRFWSxj7zBi64
The link I sent only shows a preview of the file without any option to download it (since it's larger than 25 megabytes).
Apologies for the late reply. I looked into it more, and this is because of the issues I linked to. Right now autoimpute uses listwise deletion instead of placeholders. So if you use a lot of features, there must be at least one row where all of those features (except the imputed column) have a datum present. The chance of that being true shrinks as the number of features grows.
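A back-of-the-envelope simulation shows how fast that chance collapses. The numbers here are assumptions (1000 rows, 5% of cells missing completely at random); only the 174-column width is taken from your error message.

```python
# Estimate how often a dataset still contains at least one fully
# observed row as the number of feature columns grows.
import numpy as np

rng = np.random.default_rng(0)
n_rows, miss_rate, trials = 1000, 0.05, 200
results = {}

for n_cols in (10, 50, 100, 174):
    hits = 0
    for _ in range(trials):
        # True marks a missing cell
        missing = rng.random((n_rows, n_cols)) < miss_rate
        # Does any row have zero missing cells (a complete case)?
        if (~missing).all(axis=1).any():
            hits += 1
    results[n_cols] = hits / trials
    print(n_cols, results[n_cols])
```

Analytically, a row is complete with probability 0.95^n_cols, which is about 0.6 at 10 columns but only about 0.0001 at 174 columns, so with 1000 rows a complete case often doesn't exist at all in the wide case.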
Again, see #65 and #68. The recommended workaround for now is to experiment with your data: try using fewer columns for the imputation as a start, instead of all the features. I'm planning to move from complete-case analysis to mean placeholders at some point soon, but I don't have an ETA on that yet.
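For anyone curious, the placeholder idea could look roughly like this. This is an illustrative sketch with made-up column names, not autoimpute's actual implementation: fill the predictor columns with their column means so every row survives, rather than listwise-deleting incomplete rows.

```python
# Sketch: mean placeholders in predictors instead of listwise delete.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, 2.0, np.nan, 8.0],
    "target": [10.0, 20.0, np.nan, 40.0],
})

# Temporarily fill predictor NaNs with each column's mean.
predictors = df[["a", "b"]].fillna(df[["a", "b"]].mean())
# predictors now has no NaNs, so a regression-based imputer for
# "target" can train on every observed row, not just complete cases.
print(predictors.isna().sum().sum())  # 0
```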
Hello, this package has been a lifesaver for my imputation needs. However, I recently encountered this error: "ValueError: Found array with 0 sample(s) (shape=(0, 174)) while a minimum of 1 is required." I can't seem to solve it. The error occurs when I try to fit_transform my pandas DataFrame. Every column of the DataFrame has some values; none of them is completely filled with missing values, yet the package still throws this error. I don't know what triggered it, so I assume it's a coding fault within the package.
I can send the data, as it's publicly available from a Kaggle competition, but the file is too big to attach here. Should anyone require it, I can email it; just send me a request at andreasparasian@gmail.com.