Did you find a bug in datatable, or maybe the bug found you?
Tell us what it is.
Hi, I found a bug when converting a pandas.DataFrame to a datatable.Frame.
The following pandas.DataFrame converts successfully:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome
0 23 self-employed single secondary no -921 no no telephone 2 jan 9 1 56 unknown success
1 26 entrepreneur married secondary no -1206 no no cellular 5 apr 16 9 56 unknown other
2 25 admin. single primary no -932 no no telephone 1 jun 14 5 1 unknown other
3 24 retired divorced secondary no -701 no no cellular 6 may 11 1 1 unknown failure
4 28 entrepreneur single primary no -932 yes yes telephone 5 jan 15 1 69 unknown other
5 29 self-employed single secondary no -701 yes yes cellular 10 may 7 2 2 unknown success
6 21 housemaid divorced primary no -679 yes yes telephone 3 aug 16 1 85 unknown other
7 27 services married secondary no -665 yes yes cellular 10 may 9 4 81 unknown success
8 29 admin. married primary no -710 no no telephone 2 nov 14 10 73 unknown success
9 26 technician divorced primary no -921 yes yes telephone 4 may 11 2 81 unknown success
10 28 admin. divorced primary no -701 no no telephone 6 dec 10 10 -1 unknown failure
11 20 housemaid divorced tertiary no -679 yes yes telephone 10 apr 9 4 81 unknown success
12 25 entrepreneur married primary no -710 no no cellular 4 dec 12 8 73 unknown other
13 20 housemaid married tertiary no -679 yes yes telephone 1 dec 12 10 85 unknown other
14 29 blue-collar married primary no -932 no no telephone 5 feb 7 4 64 unknown failure
The following pandas.DataFrame fails to convert:
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome
0 23.0 self-employed single secondary no -921.0 no no telephone 2.0 jan 9.0 1.0 56.0 unknown success
1 26.0 entrepreneur married secondary no -1206.0 no no cellular 5.0 apr 16.0 9.0 56.0 unknown other
2 25.0 admin. single primary no -932.0 no no telephone 1.0 jun 14.0 5.0 1.0 unknown other
3 24.0 retired divorced secondary no -701.0 no no cellular 6.0 may 11.0 1.0 1.0 unknown failure
4 28.0 entrepreneur single primary no -932.0 yes yes telephone 5.0 jan 15.0 1.0 69.0 unknown other
5 29.0 self-employed single secondary no -701.0 yes yes cellular 10.0 may 7.0 2.0 2.0 unknown success
6 21.0 housemaid divorced primary no -679.0 yes yes telephone 3.0 aug 16.0 1.0 85.0 unknown other
7 27.0 services married secondary no -665.0 yes yes cellular 10.0 may 9.0 4.0 81.0 unknown success
8 29.0 admin. married primary no -710.0 no no telephone 2.0 nov 14.0 10.0 73.0 unknown success
9 26.0 technician divorced primary no -921.0 yes yes telephone 4.0 may 11.0 2.0 81.0 unknown success
10 28.0 admin. divorced primary no -701.0 no no telephone 6.0 dec 10.0 10.0 -1.0 unknown failure
11 20.0 housemaid divorced tertiary no -679.0 yes yes telephone 10.0 apr 9.0 4.0 81.0 unknown success
12 25.0 entrepreneur married primary no -710.0 no no cellular 4.0 dec 12.0 8.0 73.0 unknown other
13 20.0 housemaid married tertiary no -679.0 yes yes telephone 1.0 dec 12.0 10.0 85.0 unknown other
14 29.0 blue-collar married primary no -932.0 no no telephone 5.0 feb 7.0 4.0 64.0 unknown failure
However, for the second one, if I reduce the batch_size from 15 to 1, the conversion works.
Could you please help solve it? Thanks so much!
How to reproduce the bug?
Paste the following into a CSV file:
"""csv
age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome
23.0,self-employed,single,secondary,no,-921.0,no,no,telephone,2.0,jan,9.0,1.0,56.0,unknown,success
26.0,entrepreneur,married,secondary,no,-1206.0,no,no,cellular,5.0,apr,16.0,9.0,56.0,unknown,other
"""
Use dataframe = pandas.read_csv(${csv_path}) to load the CSV file as a pandas.DataFrame.
Then execute table = datatable.Frame(dataframe); it crashes with a core dump here.
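For convenience, the steps above can be put into one script. This is only a minimal sketch, assuming pandas and datatable are both installed; CSV_TEXT, load_dataframe and convert_to_datatable are hypothetical names introduced here, and the CSV is read from an in-memory string instead of a file. The conversion is wrapped in a function and not called automatically, because it is the call that is reported to crash.

```python
import io

import pandas as pd

# The two-row CSV from the report, kept in memory instead of a file on disk.
CSV_TEXT = """\
age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome
23.0,self-employed,single,secondary,no,-921.0,no,no,telephone,2.0,jan,9.0,1.0,56.0,unknown,success
26.0,entrepreneur,married,secondary,no,-1206.0,no,no,cellular,5.0,apr,16.0,9.0,56.0,unknown,other
"""


def load_dataframe(csv_text=CSV_TEXT):
    """Load the CSV text as a pandas.DataFrame (hypothetical helper)."""
    return pd.read_csv(io.StringIO(csv_text))


def convert_to_datatable(dataframe):
    """Run the conversion that is reported to crash (hypothetical helper).

    datatable is imported lazily so that loading and inspecting the
    DataFrame does not require it.
    """
    import datatable

    return datatable.Frame(dataframe)


dataframe = load_dataframe()
# Numeric columns such as age and balance are parsed as float64 here,
# which matches the failing DataFrame shown above.
print(dataframe.dtypes)

# Uncomment to trigger the reported crash:
# table = convert_to_datatable(dataframe)
```

Printing the dtypes before converting makes it easy to compare this frame with the one that converts successfully, where the same columns hold integers.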
What was the expected behavior?
In case it is not obvious, please tell us what result your code should
produce.
I think it should generate a datatable.Frame rather than dumping core.
Your environment?
Linux #40~20.04.1-Ubuntu SMP Tue Apr 11 02:49:52 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Tag the issue with [bug] or [segfault] (depending on whether it crashes
Python or not).
Thank you for contributing, and sorry for the inconvenience.