Closed neil-lindquist closed 4 years ago
Great idea!!!!! Let me know when you get your clml-arff reader finished with doc, examples and tests if you don't mind. Then I will pull it into clml.
btw, regarding file readers, hdf5-cffi was added to Quicklisp recently, so someone could work on an HDF5 reader too.
I wasn't sure if this would be better in this repository or in the clml.extras repository (or even as a separate quicklisp project/repo), so I created this issue to get feedback on that. If it were part of this repository, :arff could become another type for read-data-from-file.
ARFF is a file format that was created for use with Weka (a data mining program). It stores the column names and types in the header of the file, followed by the data in a csv-like section. Other useful features include support for comments in the file and a specification for sparse data. I'm not totally sure how prevalent the ARFF format is, but the professor for my data mining class likes it, and there are ARFF readers for various languages (including R, Python, Java and C++).
ARFF specs: http://weka.wikispaces.com/ARFF%20%28stable%20version%29
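For anyone who hasn't seen the format, here is a small made-up file as a rough illustration of the layout described above. The relation name, attributes, and values are invented for this example and are not taken from the prototype repository.

```
% Lines starting with % are comments.
@relation weather

% Header: one @attribute line per column, giving its name and type.
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

% Data: one csv-like row per instance, in attribute order.
@data
sunny,85,FALSE,no
overcast,83,FALSE,yes
rainy,65,TRUE,no
```

In the sparse variant, a row lists only its non-zero values as index/value pairs with indices starting at 0, so a numeric row such as 0,5,0 would be written {1 5}.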
I've started implementing this: https://github.com/neil-lindquist/clml-arff-prototype. Below is an example of the file csv-vs-arff.lisp loaded with sbcl. The same data is loaded in arff and csv formats. The decision trees are created with a pre-pruning epsilon of 0.05 to ease readability.