renatopp / liac-arff

A library for reading and writing ARFF files in Python
MIT License
99 stars, 49 forks

Bad @DATA instance format for UCI chronic kidney disease data #97

Closed yli110-stat697 closed 5 years ago

yli110-stat697 commented 5 years ago

Hi, I was trying to load the .arff file of the UCI chronic kidney disease dataset into Python using

data = arff.loads(open('./Data/chronic_kidney_disease_full.arff'))

but I got a BadDataFormat exception as a result:

BadDataFormat: Bad @DATA instance format in line 215: 26,70,1.015,0,4,?,normal,notpresent,notpresent,250,20,1.1,?,?,15.6,52,6900,6.0,no,yes,no,good,no,no,ckd,

Is there a way to fix this?

jnothman commented 5 years ago

It has a spurious trailing comma. It is, to my knowledge, invalid ARFF. Remove the trailing comma there and a few lines later.
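
To see which rows are affected before editing, something like the following could list every row in the data section that ends with a comma. This is only a sketch: `find_trailing_commas` and the sample text are made up for illustration and are not part of liac-arff, and it assumes the data section starts at a line beginning with `@data` and that `%` starts a comment line.

```python
def find_trailing_commas(lines):
    """Return (line_number, text) pairs for @DATA rows that end with a comma."""
    hits = []
    in_data = False
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not in_data:
            # Header lines (@relation, @attribute) are never reported.
            if line.lower().startswith("@data"):
                in_data = True
            continue
        # Skip blank lines and % comments; report rows ending in a comma.
        if line and not line.startswith("%") and line.endswith(","):
            hits.append((number, line))
    return hits


# Hypothetical miniature file, not the real UCI data.
sample = """@relation demo
@attribute age numeric
@attribute class {ckd,notckd}
@data
48,ckd
26,ckd,
% comment,
51,notckd,
"""

for number, line in find_trailing_commas(sample.splitlines()):
    print(number, line)
```

Passing an open file object instead of `sample.splitlines()` should work the same way, since the function only iterates over lines.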

yli110-stat697 commented 5 years ago

I see. So there's no way to solve this using the liac-arff package? In that case, do I have to convert the .arff file to CSV to remove the trailing comma?

jnothman commented 5 years ago

You don't need to convert it to CSV; you just need to edit the file with a text editor.
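
If there are too many rows to fix by hand, the same edit could be scripted. The sketch below is an assumption-laden illustration, not something liac-arff provides: `strip_trailing_commas` is a hypothetical helper that drops exactly one trailing comma from each row after `@data` and leaves header and `%` comment lines alone.

```python
def strip_trailing_commas(text):
    """Drop one trailing comma from each row after @DATA; keep the header as is."""
    out = []
    in_data = False
    for raw in text.splitlines():
        line = raw.rstrip()
        is_comment = line.lstrip().startswith("%")
        if in_data and not is_comment and line.endswith(","):
            # Remove only a single comma so other oddities stay visible.
            line = line[:-1]
        elif line.strip().lower().startswith("@data"):
            in_data = True
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical miniature file, not the real UCI data.
sample = (
    "@relation demo\n"
    "@attribute age numeric\n"
    "@attribute class {ckd,notckd}\n"
    "@data\n"
    "48,ckd\n"
    "26,ckd,\n"
)

print(strip_trailing_commas(sample))
```

The cleaned string could then be passed to `arff.loads`. This only addresses trailing commas, so other malformed rows in the file may still raise `BadDataFormat`.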

yli110-stat697 commented 5 years ago

Got it! That fixed my problem. The thing with this dataset is that it has many bad instances when read with the liac-arff package. I also just found that read_csv() can read .arff files too. Here's the link: https://mclguide.readthedocs.io/en/latest/sklearn/preprocessing.html#chronic-kidney-disease

hamidialii1990 commented 1 year ago

Did you find a solution to your problem? I get the same error:

arff.BadDataFormat: Bad @DATA instance format in line 3096: free.fr,False,http://www.free.fr/adsl,2.0,5611,719,420.5199999999132,754.3740000000653,3627.99799

yli110-stat697 commented 1 year ago

@hamidialii1990 Sorry, this was a while ago. Try the solution above; I think that is how I fixed mine.