This repository contains an implementation of the TILDE algorithm for classification. The implementation is in function of my master thesis in Artificial Intelligence. The TILDE algorithm was proposed in:
Blockeel, H., & De Raedt, L. (1998). Top-down induction of first-order logical decision trees. Artificial Intelligence, 101(1�2), 285�297. http://doi.org/10.1016/S0004-3702(98)00034-4
This repository contains two versions:
The mai_version package contains the version made for my master thesis.
The refactor package contains a rewrite of the core functionality. The goal of the rewrite was to reduce the dependency on ProbLog for query and example representations. Different query engines can now be used as a back end for testing queries on examples, such as Django, FLGG and Subtle. Other query engines can be added easily by implementing a wrapper with the required interface. The refactor package contains its own README.md file introducing the package's content. Currently, it still uses some functionality from the mai_version package, for example for IO. This might be removed in a future refactor, as this could also be abstracted away from ProbLog.
Pyhon 3.6 or higher is required, since the typing module is used.
When using Django, you should have a compiled version of the Django subsumption engine.
A couple of toy example data sets to get started with can be found in the ACE documentation.
This refactor contains a rework of the ProbLog-based TILDE-code. The main focus of this rework is separation of the general FOL-decision tree learning code and the ProbLog code which was used for representing logic, examples, queries ..., and for evaluating the queries on examples.
Separating the high-level decision-tree learning code from ProbLog allows the use of other libraries for representation and evaluation of queries and examples.
The structure of this package is as follows:
query_testing_back_end: different implementations of the interfaces to test queries on examples
flgg_py4j: this uses the Java subsumption engine as described by
Fuksov�, A. (2007). Fast relational learning using bounded LGG. Journal of Machine Learning Research, 8, 549�587.
It uses ProbLog for IO and initial representation of examples and queries. These are converted to strings in a format that should be accepted by the subsumption engine. The subsumption engine should be started as a separate Java process before running this code. The communication is done using Py4J.
A short overview of the functionality of the high-level FOL decision tree package. For more up-to-date info, see the source files.
evaluation:
example:
leaf_strategy:
split_criterion:
splitter:
stop_criterion:
test_generation:
tree:
tree_builder:
tree_node:
tree_pruning:
from refactor.tilde_essentials.example import Example
from refactor.tilde_essentials.leaf_strategy import LeafBuilder
from refactor.tilde_essentials.splitter import Splitter
from refactor.tilde_essentials.stop_criterion import StopCriterion
from refactor.tilde_essentials.tree import DecisionTree
from refactor.tilde_essentials.tree_builder import TreeBuilder
from refactor.tilde_on_problog.evaluation import SimpleProgramQueryEvaluator
from refactor.tilde_on_problog.test_generation import ProbLogTestGeneratorBuilder
from problog.logic import Term, Var, Constant
# NOTE: include a non-empty example list
language = ...
examples = ... # type: List[Example]
prediction_goal = Term('bongard')(Var('A',Constant('pos')))
engine = ... # ProbLog engine
test_evaluator = SimpleProgramQueryEvaluator()
test_generator_builder = ProbLogTestGeneratorBuilder(language=language,
query_head_if_keys_format=prediction_goal)
splitter = Splitter(split_criterion_str='entropy', test_evaluator=test_evaluator,
test_generator_builder=test_generator_builder)
leaf_builder = LeafBuilder()
stop_criterion = StopCriterion()
tree_builder = TreeBuilder(splitter=splitter, leaf_builder=leaf_builder, stop_criterion=stop_criterion)
decision_tree = DecisionTree()
decision_tree.fit(examples=examples, tree_builder=tree_builder)
print(decision_tree)