rock999 / wekaonline

Automatically exported from code.google.com/p/wekaonline

How to know the dataset qualities? #3

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
We receive the datasets from users. Sometimes an experiment breaks when it runs
on a dataset, because the dataset has missing values or because its attributes
are continuous instead of nominal (enumerable).

When we run an experiment, what would the output of a failed run be?

Weka knows the qualities but it uses the API to know it. How will we catch 
this exception?

Original issue reported on code.google.com by illoqpa...@gmail.com on 4 Jun 2010 at 10:34

GoogleCodeExporter commented 9 years ago
Another question is how we will store the information about the dataset. We have
a class called dataset. Its attributes are:
name, hash, path, continuous, nominal, missing.

The boolean attributes add information about the dataset. But is there some way
to get this information from the Weka command line?

Which other attributes do we need?
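
As a rough illustration of the class described above, here is a minimal sketch in plain Java that fills the three boolean fields by scanning the lines of an ARFF file. The names DatasetInfo and fromArffLines are hypothetical and not taken from the wekaonline code. The scan is deliberately simplified: it assumes dense (non-sparse) data rows and no quoted values containing commas, and it does not use the Weka API at all.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical metadata holder mirroring the fields named in this thread.
final class DatasetInfo {
    final String name;
    final String hash;
    final String path;
    final boolean continuous; // at least one numeric/real/integer attribute
    final boolean nominal;    // at least one {a,b,c} style attribute
    final boolean missing;    // at least one '?' cell in the data section

    DatasetInfo(String name, String hash, String path,
                boolean continuous, boolean nominal, boolean missing) {
        this.name = name;
        this.hash = hash;
        this.path = path;
        this.continuous = continuous;
        this.nominal = nominal;
        this.missing = missing;
    }

    // Simplified scan: assumes dense rows and no quoted commas (an assumption,
    // not how Weka itself parses ARFF).
    static DatasetInfo fromArffLines(String name, String hash, String path,
                                     List<String> lines) {
        boolean continuous = false;
        boolean nominal = false;
        boolean missing = false;
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (!inData) {
                if (lower.startsWith("@attribute")) {
                    String[] tokens = lower.split("\\s+");
                    String type = tokens[tokens.length - 1];
                    if (line.endsWith("}")) {
                        nominal = true;
                    } else if (type.equals("numeric") || type.equals("real")
                            || type.equals("integer")) {
                        continuous = true;
                    }
                } else if (lower.startsWith("@data")) {
                    inData = true;
                }
            } else {
                for (String cell : line.split(",")) {
                    if (cell.trim().equals("?")) {
                        missing = true; // '?' marks a missing value in ARFF
                    }
                }
            }
        }
        return new DatasetInfo(name, hash, path, continuous, nominal, missing);
    }
}
```

The hash is passed in rather than computed here, since the thread does not say which hash function is used.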

Original comment by illoqpa...@gmail.com on 4 Jun 2010 at 10:47

GoogleCodeExporter commented 9 years ago
"Weka knows the qualities but it uses the API to know it. How will we catch 
this exception?"

Well, we can expect most client datasets to be valid ARFF/CSV. But obviously that
can't be trusted, so we need a validator so that a GoGrid server doesn't get
started in vain. In my old wonline code, a dataset was 'parsed' after uploading
to check that it contained certain elements (@data, @attribute etc.), but the
quicker and more reliable way is to use the Weka API, e.g. to run
"java weka.core.Instances dataset.arff" and catch its exception: only if the
file is valid does it print out the composition of the dataset (number of
instances etc.). I don't know what it throws if the file is invalid, but the
output will be different.
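
For comparison, the older 'parse after upload' idea could look roughly like the sketch below. It is plain Java with no Weka dependency, and the class and method names (ArffPrecheck, problems, looksValid) are made up for illustration. It only checks that the expected ARFF sections are present, so it is a cheap pre-check before starting a server, not a substitute for letting Weka load the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical structural pre-check for an uploaded ARFF file.
final class ArffPrecheck {

    // Returns a list of human-readable problems; empty means the file
    // has the basic ARFF structure (it may still fail in Weka).
    static List<String> problems(List<String> lines) {
        List<String> problems = new ArrayList<>();
        boolean relation = false;
        boolean data = false;
        int attributes = 0;
        int rows = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank lines and comments carry no structure
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (data) {
                rows++; // everything after @data is counted as an instance
            } else if (lower.startsWith("@relation")) {
                relation = true;
            } else if (lower.startsWith("@attribute")) {
                attributes++;
            } else if (lower.startsWith("@data")) {
                data = true;
            } else {
                problems.add("unexpected header line: " + line);
            }
        }
        if (!relation) {
            problems.add("missing @relation");
        }
        if (attributes == 0) {
            problems.add("no @attribute declarations");
        }
        if (!data) {
            problems.add("missing @data");
        } else if (rows == 0) {
            problems.add("no instances after @data");
        }
        return problems;
    }

    static boolean looksValid(List<String> lines) {
        return problems(lines).isEmpty();
    }
}
```

Returning the list of problems instead of a bare boolean would let the upload page tell the client what to fix before any paid run starts.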

The format requirement and this invoicing policy "you have to pay even if your
dataset is invalid" should make them format datasets properly.

Original comment by harri.sa...@gmail.com on 4 Jun 2010 at 1:33