DavidWilliams81 opened this issue 10 years ago (status: Open)
To keep the terminology clear: classifier and learner.
Extending the classifier to use the LazyTable seems a bit weird for a decision tree. Since applying the tree is a straightforward selection on the attributes, the only thing necessary to apply the classifier would be to request all the objects matching the criterion constructed from the whole tree. This would effectively run the classifier in the database as an SQL query.
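As a sketch of that idea, a fitted tree could be compiled into one SQL query per class, so the selection runs entirely inside the database. The node layout, the table name `sky_objects`, and the attribute names below are all hypothetical illustrations, not LazyTable or Orange API:

```python
# Hypothetical sketch: compile a fitted decision tree into SQL so that
# classification runs in the database. Node structure is an assumption.
from collections import defaultdict

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # column name tested at this node
        self.threshold = threshold  # split value (feature <= threshold goes left)
        self.left = left
        self.right = right
        self.label = label          # class label if this node is a leaf

def leaf_predicates(node, conditions=()):
    """Yield (label, SQL condition) for every leaf of the tree."""
    if node.label is not None:
        yield node.label, " AND ".join(conditions) or "TRUE"
        return
    yield from leaf_predicates(node.left, conditions + (f"{node.feature} <= {node.threshold}",))
    yield from leaf_predicates(node.right, conditions + (f"{node.feature} > {node.threshold}",))

def tree_to_queries(node, table):
    """One SELECT per class label: all rows the tree assigns that label."""
    by_label = defaultdict(list)
    for label, cond in leaf_predicates(node):
        by_label[label].append(f"({cond})")
    return {label: f"SELECT * FROM {table} WHERE {' OR '.join(conds)}"
            for label, conds in by_label.items()}

# Toy tree over made-up astronomical attributes
tree = Node("mag_g", 20.0,
            left=Node(label="star"),
            right=Node("redshift", 0.5,
                       left=Node(label="galaxy"),
                       right=Node(label="quasar")))
queries = tree_to_queries(tree, "sky_objects")
```

Each query then pulls exactly the objects one leaf (or set of leaves) matches, with no per-row traffic between Orange and the database.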
Alternatively, we could assume that each step in the classifier is very complex and has to be done in Orange. This would match the idea of a classifier network. We would then iterate over all the objects (rows) and request the (difficult-to-calculate) attributes as we need them. Requesting attributes per individual object is costly, however, so we want to group objects together. Unfortunately, creating such a grouping is only easy when the classification itself is easy, which contradicts our assumption. Perhaps we need to think this through some more.
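The grouping idea could look something like the sketch below, where `compute_attribute` stands in for whatever LazyTable mechanism actually pulls or computes an expensive attribute; the function name and its signature are assumptions, not existing API:

```python
# Sketch of batching attribute requests: rather than asking the data source
# for an expensive attribute one row at a time, collect row ids into chunks
# and issue a single request per chunk.

def batched(rows, size):
    """Split an iterable of row ids into lists of at most `size` ids."""
    batch = []
    for r in rows:
        batch.append(r)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def classify_lazily(row_ids, attribute, compute_attribute, threshold, batch_size=100):
    """Request an expensive attribute once per batch, then apply a simple test."""
    labels = {}
    for batch in batched(row_ids, batch_size):
        values = compute_attribute(attribute, batch)  # one request per batch
        for rid, value in zip(batch, values):
            labels[rid] = "positive" if value > threshold else "negative"
    return labels
```

The hard part the paragraph above points at is choosing the batches so that rows in one batch actually need the same attributes; this sketch sidesteps that by batching blindly.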
Similar problems arise for the learner, but perhaps we can just see how far we can get. If we can somehow guarantee that subsets are representative of the whole table, then we can probably 'group' objects automatically to get efficient calculations and data transfers. Furthermore, I don't think this really matters for the InfiniTable, so we can already develop the lazy learner and lazy classifier using the InfiniTable (or perhaps the LazyFile).
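One minimal way to illustrate the 'representative subset' idea, assuming only that the source exposes a plain iterator over rows (which an InfiniTable-style stream could provide): reservoir sampling draws a uniform sample from a stream of unknown or unbounded length, and a learner could then be fitted on that sample alone.

```python
# Reservoir sampling: a uniform sample of k rows from a stream whose total
# length is unknown in advance, so it also works on unbounded sources.
import random

def reservoir_sample(rows, k, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            # Row i survives with probability k / (i + 1)
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = row
    return sample
```

Whether such a sample is genuinely representative for the learner at hand is exactly the guarantee the paragraph above says we would need.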
The incremental decision tree seems to be potentially very powerful.
We need a new decision tree widget which takes advantage of the LazyTable and data pulling. Orange 3 uses scikit-learn for most of its classification algorithms, but this is not appropriate for us because scikit-learn does not operate on such 'infinite' data sources.
We will implement our own decision tree learner in Python and have it operating directly on the LazyTable and/or related classes. This can be a multistep process.
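As a starting point, such an incremental learner might look like the skeleton below. This is not Orange code: rows arrive one at a time, leaves buffer statistics, and a leaf splits once it has seen enough rows. The split test (median of a single fixed attribute) is deliberately naive; a real learner would choose the best attribute per split, e.g. via the Hoeffding bound.

```python
# Illustrative skeleton of an incremental decision tree learner.
from collections import Counter

class Leaf:
    """A growing leaf: buffers rows and tracks the class distribution."""
    def __init__(self):
        self.rows = []
        self.counts = Counter()

class IncrementalTree:
    """Rows are consumed one at a time; a leaf splits once it is full."""
    def __init__(self, attribute, split_after=50):
        self.attribute = attribute      # single attribute we split on (naive)
        self.split_after = split_after  # rows a leaf buffers before splitting
        self.root = Leaf()

    def _descend(self, x):
        """Walk to the leaf for x, remembering the parent link."""
        parent, key, node = None, None, self.root
        while isinstance(node, dict):   # internal node: {"thr", "lo", "hi"}
            key = "lo" if x[self.attribute] <= node["thr"] else "hi"
            parent, node = node, node[key]
        return parent, key, node

    def learn_one(self, x, y):
        parent, key, leaf = self._descend(x)
        leaf.rows.append((x, y))
        leaf.counts[y] += 1
        if len(leaf.rows) >= self.split_after:
            new = self._split(leaf)
            if new is not None:
                if parent is None:
                    self.root = new
                else:
                    parent[key] = new

    def _split(self, leaf):
        values = sorted(x[self.attribute] for x, _ in leaf.rows)
        if values[0] == values[-1]:
            return None                 # nothing to split on; keep buffering
        thr = values[len(values) // 2]  # naive split point: the median
        lo, hi = Leaf(), Leaf()
        for x, y in leaf.rows:
            child = lo if x[self.attribute] <= thr else hi
            child.rows.append((x, y))
            child.counts[y] += 1
        return {"thr": thr, "lo": lo, "hi": hi}

    def predict_one(self, x):
        _, _, leaf = self._descend(x)
        return leaf.counts.most_common(1)[0][0] if leaf.counts else None
```

Because `learn_one` only ever touches one row, the learner never needs the whole table in memory, which is the property we need for LazyTable/InfiniTable sources.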
Any further thoughts?