openml / openml-python

OpenML's Python API for a World of Data and More 💫
http://openml.github.io/openml-python/

get_dataset(), "The kernel appears to have died. It will restart automatically" #1093

Open learsi1911 opened 3 years ago

learsi1911 commented 3 years ago

Description

Steps/Code to Reproduce

Expected Results

Actual Results

joaquinvanschoren commented 3 years ago

Hi, I'll move this to the openml-python issue tracker

I'm guessing you tried to download a large dataset? This is a known issue. The ARFF parser uses too much memory.

We have implemented parquet support, but this is not yet in the current release.

PGijsbers commented 3 years ago

We have implemented parquet support, but this is not yet in the current release.

Small correction, it should be available in the current release as soon as the production server sends valid information on where the parquet file is located.

learsi1911 commented 3 years ago

Thank you very much for your answer. Do you know approximately when this new version will be released?

PGijsbers commented 3 years ago

@learsi1911 Could you please provide the ID of the dataset you were trying to download? And could you share how much memory was available to the kernel? That information would allow us to test whether the issue is resolved when the parquet support is fully operational.

@prabhant Do you have an estimate of when the parquet files will be available from the production server?

learsi1911 commented 3 years ago

Of course, the ID is 547. As I said, the first time I call get_dataset() there is no problem, but if I call it again I get the error.

prabhant commented 3 years ago

The production server with parquet support will be ready in a week or two.

mfeurer commented 3 years ago

Dataset 547 is not really large and shouldn't result in any issues. Could you please run the failing snippet from within ipython and paste the output?

learsi1911 commented 3 years ago

Yes, I have tried running Python directly in the Windows console and it works there, so maybe the problem is related to Jupyter.

PGijsbers commented 3 years ago

Jupyter notebook kernels typically work with much less memory than a regular Python process. But as mfeurer said, the dataset isn't large and should not lead to a kernel dying. It would be helpful if you could post the code that led to the error and the full error output.
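When a kernel dies, the notebook usually shows no traceback, so one way to capture the full error output is to run the failing snippet in a separate Python process with the standard-library faulthandler enabled. The following is a minimal sketch, not an official openml-python tool: the helper name run_isolated and the placeholder SNIPPET are made up for illustration, and SNIPPET should be replaced with the code that actually kills the kernel.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 300.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python process with faulthandler enabled.

    `run_isolated` is a hypothetical helper name. A crash in the child
    process cannot take down the notebook kernel that calls it.
    """
    return subprocess.run(
        # -X faulthandler makes the interpreter dump a Python traceback
        # to stderr on fatal errors such as a segmentation fault.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Placeholder snippet: replace it with the code that kills the kernel.
SNIPPET = "print('hello from the child process')"

result = run_isolated(SNIPPET)
print("return code:", result.returncode)
print("stdout:", result.stdout)
print("stderr:", result.stderr)
```

A nonzero return code together with the contents of stderr is the kind of output that can be pasted into the issue. On POSIX systems a negative return code means the child process was terminated by a signal.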

PGijsbers commented 1 year ago

If the problem still occurs, please re-open this issue but provide a code example that reproduces the error.

AstralVolt commented 2 weeks ago

I have the same kind of issue.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import openml

from IPython.display import Image, display

credit_data = openml.datasets.get_dataset(31)
X, y, _, _ = credit_data.get_data(target=credit_data.default_target_attribute)

The kernel dies and restarts after this. Alternatively, I also tried writing this as the last line:

X, y, _, _ = credit_data.get_data(target=credit_data.default_target_attribute, dataset_format="dataframe")

Still no luck; it does not fix the problem. I have tried everything I could think of, including upgrading openml.

PGijsbers commented 2 weeks ago

Could you let us know which OS you are using, as well as the python version and a list of installed dependencies? (python -V and python -m pip list)
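If it is easier to gather this from inside the notebook itself, so that the report reflects the environment the kernel really runs in, something like the sketch below should give roughly the same information as the two commands. It uses only the standard library; the function name collect_environment is made up for this example.

```python
import platform
from importlib import metadata


def collect_environment() -> dict:
    """Return the OS, Python version, and installed packages.

    `collect_environment` is a hypothetical helper name; it reads the
    environment of the interpreter that executes it.
    """
    packages = sorted(
        {
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in metadata.distributions()
            if dist.metadata["Name"]  # skip entries with broken metadata
        },
        key=str.lower,
    )
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "packages": packages,
    }


info = collect_environment()
print("OS:", info["os"])
print("Python:", info["python"])
for package in info["packages"]:
    print(package)
```

Running it in a notebook cell matters because the kernel may use a different interpreter than the python found on the command line.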