dktcoding opened 7 years ago
I know these classes need tons of work. They kind of grew organically from the C4.5 implementation into something else while I was investigating the Bayes Network implementations.
Some comments on a few of your points:
`getFrequencies(int lo, int hi, int index)`
I would rather focus on this later, as it will be part of a bigger architectural change that might affect several other components (C4.5 & Bayes), and I want to assess the scope of the change first.
I've been writing the tests for the current `DataSet` implementations, but there are some things that need work (especially if the idea is to use them to train NNs):
- `JavaDoc`: it's really, really hard to read
- `JavaDoc` to the `Attribute` interface
- `DiscreteAttribute` to `CategoricalAttribute`
- `DiscreteAttribute`
There are some missing features:
- `MetaData` (at least attribute names, so they can be removed from the `Attribute` interface)
- `DataSet`
  - `MySQLDataSet` resources are left open (I believe we talked about this a while ago)
  - `TextFileDataSet` (at least allow setting the splitting regex, check if the file has headers, etc.)
  - `MatrixDataSet` and a `LargeTextFileDataSet`
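For the `TextFileDataSet` point, here's a minimal sketch of what "allow setting the splitting regex, check if the file has headers" could look like. It assumes the header check is just a boolean flag (no auto-detection) and that rows stay as raw strings; `TextFileDataSetReader`, `Parsed` and `read` are hypothetical names, not the current `TextFileDataSet` API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical options holder + parser; NOT the project's TextFileDataSet API.
final class TextFileDataSetReader {
    private final Pattern splitPattern; // e.g. "," for CSV, "\\s+" for whitespace-separated
    private final boolean hasHeader;    // if true, the first non-blank line holds attribute names

    TextFileDataSetReader(String splitRegex, boolean hasHeader) {
        this.splitPattern = Pattern.compile(splitRegex);
        this.hasHeader = hasHeader;
    }

    // Parsed result: attribute names plus the raw (unconverted) rows.
    static final class Parsed {
        final List<String> attributeNames;
        final List<String[]> rows;

        Parsed(List<String> attributeNames, List<String[]> rows) {
            this.attributeNames = attributeNames;
            this.rows = rows;
        }
    }

    Parsed read(Reader source) {
        List<String> names = null;
        List<String[]> rows = new ArrayList<>();
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // skip empty lines
                }
                // limit -1 keeps trailing empty fields (e.g. "1,2," -> 3 fields)
                String[] fields = splitPattern.split(line.trim(), -1);
                if (hasHeader && names == null) {
                    names = Arrays.asList(fields);
                } else {
                    rows.add(fields);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (names == null) {
            // No header: generate placeholder names from the first row's width
            names = new ArrayList<>();
            int width = rows.isEmpty() ? 0 : rows.get(0).length;
            for (int i = 0; i < width; i++) {
                names.add("attr" + i);
            }
        }
        return new Parsed(names, rows);
    }
}
```

Reading from a `Reader` instead of a file path keeps it easy to test with a `StringReader`; for a real file, `java.nio.file.Files.newBufferedReader(path)` could be passed in.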
I'm assuming these classes were created specifically for C4.5, but they need to be generalized a bit.
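As a rough illustration of what "generalized a bit" could mean, here's a sketch that ties together two of the points above: attribute names move into a `MetaData` object, and the data set extends `AutoCloseable` so resources (like the ones `MySQLDataSet` leaves open) get released via try-with-resources. Everything here (`MetaData`, `DataSet`, `MatrixDataSet`, `getValue`, `isClosed`) is hypothetical and only echoes the class names in this comment; it is not the current API. The in-memory `MatrixDataSet` is just one assumed shape for numeric data suitable for NN training.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical metadata holder: attribute names live here instead of on each Attribute.
final class MetaData {
    private final List<String> attributeNames;

    MetaData(List<String> attributeNames) {
        this.attributeNames = Collections.unmodifiableList(new ArrayList<>(attributeNames));
    }

    List<String> getAttributeNames() { return attributeNames; }

    int getAttributeCount() { return attributeNames.size(); }
}

// Hypothetical generalized DataSet: exposes metadata and releases resources on close().
interface DataSet extends AutoCloseable {
    MetaData getMetaData();

    int size();

    double getValue(int row, int attribute);

    @Override
    void close(); // narrowed from AutoCloseable: no checked exception
}

// Hypothetical in-memory implementation backed by a double[][] (not copied, for brevity).
final class MatrixDataSet implements DataSet {
    private final MetaData metaData;
    private final double[][] values;
    private boolean closed = false;

    MatrixDataSet(MetaData metaData, double[][] values) {
        // Every row must have one value per attribute declared in the metadata
        for (double[] row : values) {
            if (row.length != metaData.getAttributeCount()) {
                throw new IllegalArgumentException("row width does not match metadata");
            }
        }
        this.metaData = metaData;
        this.values = values;
    }

    @Override
    public MetaData getMetaData() { return metaData; }

    @Override
    public int size() { return values.length; }

    @Override
    public double getValue(int row, int attribute) {
        if (closed) {
            throw new IllegalStateException("data set is closed");
        }
        return values[row][attribute];
    }

    @Override
    public void close() {
        // Nothing to release in memory; a JDBC-backed set (assumption) would close
        // its ResultSet/Statement/Connection here.
        closed = true;
    }

    boolean isClosed() { return closed; }
}
```

With this shape, callers can write `try (ds) { ... }` (Java 9+) or declare the data set in the `try (...)` header, and `close()` runs even if training throws.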