Closed diegoesteves closed 8 years ago
I don't understand why we should transform all WEKA (and the other) algorithms into an enum. A much easier way to identify algorithms is using their global package name, if a URI is not available. It is semantically suitable (e.g., my J48 could be different from yours) and keeps an open world assumption.
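import weka.classifiers.Classifier;
...
// Returns the fully qualified class name, e.g. "weka.classifiers.trees.J48".
// getCanonicalName() yields a String, so the return type is String here.
public String asMEXAlgorithm(Classifier classifier) {
    return classifier.getClass().getCanonicalName();
}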
where Classifier is the abstract class that generalizes all algorithms.
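The same idea can be sketched without a compile-time dependency on Weka by accepting any object. This is only a minimal sketch: AlgorithmIds and identify are hypothetical names and not part of LOG4MEX.

```java
// Hypothetical helper, not part of LOG4MEX: identifies an algorithm by the
// fully qualified name of the class that implements it.
final class AlgorithmIds {
    private AlgorithmIds() {}

    static String identify(Object algorithm) {
        Class<?> type = algorithm.getClass();
        String name = type.getCanonicalName();
        // getCanonicalName() returns null for anonymous and local classes,
        // so fall back to the binary name in that case.
        return name != null ? name : type.getName();
    }
}
```

Because the parameter is a plain Object, the helper works for a Weka classifier as well as for a class from any other ML tool, at the cost of tying the identifier to whatever package name the tool currently uses.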
Hi @mommi84, but then LOG4MEX would be useful only for Weka implementations. The idea behind filling the enum structure is to provide (along with other algorithm names from the ML literature) a decoupled way to represent the algorithms. Your suggestion implies a coupled scenario (a Weka dependency) for LOG4MEX. What if you are using something other than Weka in your ML script?
We could create more constructors, one for each ML tool, though. However, if a tool changes a method signature or package, LOG4MEX would no longer work properly.
However, it is a fair question, and it brings back a discussion some of us had before: the trade-off issue. I'm aware that the current implementation implies a much more coupled solution. It also restricts the usage of LOG4MEX whenever a specific algorithm cannot be found. However, theoretically (in terms of ontologies and vocabularies), how can we conceptually separate implementations of Decision Trees in the mex dump files? E.g., I want to query all the executions and their related performance measures for the C4.5 algorithm. Currently, we just have to search for ... ?x a mexalgo:C4.5 ... . The open world assumption does not restrict classes, which is extremely positive in terms of usage (people are free to represent any kind of algorithm, even one we have only just created, xptoDT for instance), but it fails at grouping concepts (in this case, the implementations of a Decision Tree algorithm called C4.5).
We could have either of the following scenarios:
01 - create an instance of mexalgo:Algorithm (i.e., no subclasses here anymore; respecting the open world concept, everybody is free to represent algorithms)
this:alg a mexalgo:Algorithm.
and use mexalgo:AlgorithmClass to classify it as mexalgo:DecisionTree, which is a subclass of mexalgo:AlgorithmClass (therefore indicating that it is an implementation of a decision tree algorithm)
this:alg mexalgo:hasAlgorithmClass mexalgo:DecisionTree.
This scenario leads to a more generic and comprehensive architecture, but it cannot express a more refined structure, i.e., there is no way to know whether this:alg is a C4.5 or an ID3 implementation, for instance.
OR a slightly different approach,
02 - specialise mexalgo:AlgorithmClass (in practical terms, by moving the list of algorithms from NamedAlgorithm to AlgorithmClass) and keep mexalgo:Algorithm to provide an open world concept, i.e., people can now create algorithms and link them to a more refined structure (mexalgo:C4.5 subclassOf mexalgo:DecisionTrees subclassOf mexalgo:AlgorithmClass).
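To illustrate how scenario 02 would keep grouping queries possible, here is a toy in-memory sketch. It is not the MEX vocabulary or the LOG4MEX API: AlgorithmClassIndex and its methods are hypothetical, plain strings stand in for URIs, and each class is assumed to have at most one parent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of scenario 02; every identifier here is illustrative only.
final class AlgorithmClassIndex {
    // child class -> parent class (the subclassOf links)
    private final Map<String, String> subClassOf = new HashMap<>();
    // algorithm instance -> algorithm class (mexalgo:hasAlgorithmClass)
    private final Map<String, String> hasAlgorithmClass = new HashMap<>();

    void addSubClass(String child, String parent) {
        subClassOf.put(child, parent);
    }

    void link(String algorithm, String algorithmClass) {
        hasAlgorithmClass.put(algorithm, algorithmClass);
    }

    // True if cls is ancestor itself or reaches it by following subclassOf links.
    boolean isA(String cls, String ancestor) {
        Set<String> seen = new HashSet<>(); // guards against accidental cycles
        String current = cls;
        while (current != null && seen.add(current)) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = subClassOf.get(current);
        }
        return false;
    }

    // All algorithm instances whose class is ancestor or one of its subclasses, sorted.
    List<String> algorithmsOfClass(String ancestor) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : hasAlgorithmClass.entrySet()) {
            if (isA(entry.getValue(), ancestor)) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

With mexalgo:C4.5 and mexalgo:ID3 registered under mexalgo:DecisionTrees, asking for mexalgo:C4.5 returns only the C4.5 implementations, while asking for mexalgo:DecisionTrees also returns a user-defined algorithm linked directly to the coarser class. Scenario 01 corresponds to the case where only the coarse link exists.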
These scenarios are definitely more semantic-web-oriented than before. The user must be able to define an algorithm (the more specific, the better). However, at the moment there is no such option in log4mex, since mex.Configuration().addAlgorithm() takes only objects of type EnumAlgorithms. You should change this and accept URIs as a parameter.
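One way the requested change could look is an overload that accepts a URI next to the existing enum-based method. The sketch below uses hypothetical stand-ins (SketchAlgorithm, SketchConfiguration, and the example.org URIs are all assumptions); the real log4mex classes and signatures may differ.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a closed list of well-known algorithms.
enum SketchAlgorithm {
    J48("http://example.org/mexalgo#J48"),   // placeholder URI, not the real namespace
    PART("http://example.org/mexalgo#PART"); // placeholder URI, not the real namespace

    private final URI uri;

    SketchAlgorithm(String uri) {
        this.uri = URI.create(uri);
    }

    URI uri() {
        return uri;
    }
}

// Hypothetical stand-in for the configuration object that collects algorithms.
final class SketchConfiguration {
    private final List<URI> algorithms = new ArrayList<>();

    // Closed-world convenience: pick a well-known algorithm from the enum.
    SketchConfiguration addAlgorithm(SketchAlgorithm algorithm) {
        return addAlgorithm(algorithm.uri());
    }

    // Open-world entry point: any absolute URI identifies an algorithm.
    SketchConfiguration addAlgorithm(URI algorithmUri) {
        if (!algorithmUri.isAbsolute()) {
            throw new IllegalArgumentException("Algorithm URI must be absolute: " + algorithmUri);
        }
        algorithms.add(algorithmUri);
        return this;
    }

    List<URI> algorithms() {
        return Collections.unmodifiableList(algorithms);
    }
}
```

Keeping the enum overload as a thin wrapper around the URI one means existing callers would keep working, while users whose algorithm is not in the list can still register it.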
@mommi84, could you please check/update the class MEXEnum.EnumAlgorithms
public enum EnumAlgorithms { PART(EnumAlgorithm.PART.toString()) ...
so that it covers the WEKA algorithms?