
MEX Vocabulary: a lightweight machine learning interchange format
https://github.com/METArchive/mex-vocabulary
GNU General Public License v3.0

mapping Weka algorithms to MEX-ALGO #28

Closed: diegoesteves closed this issue 8 years ago

diegoesteves commented 8 years ago

@mommi84, could you please:

check/update the class MEXEnum.EnumAlgorithms (public enum EnumAlgorithms { PART(EnumAlgorithm.PART.toString()) ... ) so that it covers the WEKA algorithms

mommi84 commented 8 years ago

I don't understand why we should transform all WEKA (and the other) algorithms into an enum. A much easier way to identify algorithms is using their global package name, if a URI is not available. It is semantically suitable (e.g., my J48 could be different from yours) and keeps an open world assumption.

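import weka.classifiers.Classifier;
...
// Identify the algorithm by the fully qualified class name of its implementation.
// The return type is String, since getCanonicalName() returns a String.
public String asMEXAlgorithm(Classifier classifier) {
    return classifier.getClass().getCanonicalName();
}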

where Classifier is the abstract class that generalizes all algorithms.

diegoesteves commented 8 years ago

Hi @mommi84, but then LOG4MEX would be useful only for Weka implementations. By filling the enum structure, the idea is to provide (along with other algorithm names from the ML literature) a decoupled way to represent the algorithms. Your suggestion implies a coupled scenario (a Weka dependency) for LOG4MEX. What if you are using something other than Weka in your ML script? We could create more constructors, one for each ML tool, though. However, again, if they (the tool / software) change the method signature or package, LOG4MEX would no longer work properly.

diegoesteves commented 8 years ago

However, it is a fair question, and it brings back a discussion some of us had before: the trade-off issue. I'm aware that the current implementation implies a much more coupled solution. It also restricts the usage of LOG4MEX whenever a specific algorithm cannot be found in the enum. However, theoretically (in terms of ontologies and vocabularies), how can we conceptually separate implementations of Decision Trees in the mex dump files? E.g., I want to query all the executions and their related performance measures for the C4.5 algorithm. Currently, we just have to search for ... ?x a mexalgo:C4.5 ... . The open world assumption does not restrict classes, which is extremely positive in terms of usage (people are free to represent any kind of algorithm, even one we create right now, say xptoDT), but it fails at grouping concepts (in this case, the implementations of a Decision Tree algorithm called C4.5).

We could go with either of two scenarios:

01 - create an instance of mexalgo:Algorithm (i.e., no more subclasses here, respecting the open world concept; everybody uses it to represent algorithms)

this:alg a mexalgo:Algorithm.

and use mexalgo:AlgorithmClass to classify it as mexalgo:DecisionTree, which is a subclass of mexalgo:AlgorithmClass (thereby indicating that it is an implementation of a decision tree algorithm)

this:alg mexalgo:hasAlgorithmClass mexalgo:DecisionTree.

This scenario leads to a more generic and comprehensive architecture, but it cannot express a more refined structure, i.e., there is no way to know whether this:alg is a C4.5 or an ID3 implementation, for instance.

OR a slightly different approach:

02 - specialise mexalgo:AlgorithmClass (in practical terms, by moving the list of algorithms from NamedAlgorithm to AlgorithmClass) and keep mexalgo:Algorithm to provide an open world concept, i.e., people can now create algorithms and link them to a more refined structure (mexalgo:C4.5 subclassOf mexalgo:DecisionTrees subclassOf mexalgo:AlgorithmClass)
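To make the grouping behaviour of scenario 02 concrete, here is a minimal in-memory sketch in Java. It is only an illustration: the class AlgorithmClassHierarchy, its methods, and the instance this:other are hypothetical and exist neither in LOG4MEX nor in the vocabulary. The maps stand in for subclassOf and mexalgo:hasAlgorithmClass statements.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of scenario 02; not part of LOG4MEX or the MEX vocabulary. */
final class AlgorithmClassHierarchy {
    // child class -> direct parent class (stand-in for subclassOf statements)
    private final Map<String, String> parentOf = new HashMap<>();
    // algorithm instance -> its class (stand-in for mexalgo:hasAlgorithmClass)
    private final Map<String, String> classOf = new HashMap<>();

    void addSubclass(String child, String parent) {
        parentOf.put(child, parent);
    }

    void addAlgorithm(String algorithm, String algorithmClass) {
        classOf.put(algorithm, algorithmClass);
    }

    /** True if the algorithm's class is the target or a transitive subclass of it. */
    boolean isA(String algorithm, String target) {
        String current = classOf.get(algorithm);
        int steps = 0; // bounds the walk so an accidental cycle cannot loop forever
        while (current != null && steps++ <= parentOf.size()) {
            if (current.equals(target)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }

    /** Builds the chain from the comment above, plus ID3 as a sibling of C4.5. */
    static AlgorithmClassHierarchy example() {
        AlgorithmClassHierarchy h = new AlgorithmClassHierarchy();
        h.addSubclass("mexalgo:C4.5", "mexalgo:DecisionTrees");
        h.addSubclass("mexalgo:ID3", "mexalgo:DecisionTrees");
        h.addSubclass("mexalgo:DecisionTrees", "mexalgo:AlgorithmClass");
        h.addAlgorithm("this:alg", "mexalgo:C4.5");   // user-defined instance
        h.addAlgorithm("this:other", "mexalgo:ID3");  // hypothetical second instance
        return h;
    }

    public static void main(String[] args) {
        AlgorithmClassHierarchy h = example();
        System.out.println(h.isA("this:alg", "mexalgo:DecisionTrees"));
        System.out.println(h.isA("this:other", "mexalgo:C4.5"));
    }
}
```

With this structure, a query for mexalgo:DecisionTrees matches both instances, while a query for mexalgo:C4.5 matches only this:alg, which is the refinement scenario 01 cannot express.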

mommi84 commented 8 years ago

These scenarios are definitely more semantic-web-oriented than before. The user must be able to define an algorithm (the more specific, the better). However, at the moment there is no such option in log4mex, since mex.Configuration().addAlgorithm() only takes objects of type EnumAlgorithms. You should change this and accept URIs as a parameter.
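As a rough sketch of what I mean, the method could be overloaded so that the enum remains a convenience while any URI is accepted. Everything here is an assumption for illustration: ConfigurationSketch is a stand-in and not the real log4mex class, the enum is reduced to two constants, and the namespace is a placeholder rather than the actual MEX-ALGO namespace.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the log4mex configuration object. */
final class ConfigurationSketch {
    /** Placeholder namespace (assumption), not the real MEX-ALGO namespace. */
    static final String ASSUMED_NAMESPACE = "http://example.org/mexalgo#";

    /** Reduced stand-in for MEXEnum.EnumAlgorithms. */
    enum EnumAlgorithms {
        PART("PART"), J48("J48");

        private final String localName;

        EnumAlgorithms(String localName) {
            this.localName = localName;
        }

        URI toUri() {
            return URI.create(ASSUMED_NAMESPACE + localName);
        }
    }

    private final List<URI> algorithms = new ArrayList<>();

    /** Existing style: a curated enum constant, mapped to its URI. */
    void addAlgorithm(EnumAlgorithms algorithm) {
        addAlgorithm(algorithm.toUri());
    }

    /** Proposed style: any absolute URI, so users can name their own algorithms. */
    void addAlgorithm(URI algorithm) {
        if (!algorithm.isAbsolute()) {
            throw new IllegalArgumentException("Algorithm URI must be absolute: " + algorithm);
        }
        algorithms.add(algorithm);
    }

    /** Read-only view of the registered algorithm URIs. */
    List<URI> algorithms() {
        return Collections.unmodifiableList(algorithms);
    }

    public static void main(String[] args) {
        ConfigurationSketch config = new ConfigurationSketch();
        config.addAlgorithm(EnumAlgorithms.PART);
        config.addAlgorithm(URI.create("urn:example:xptoDT")); // user-defined algorithm
        System.out.println(config.algorithms());
    }
}
```

Keeping the enum overload means existing callers would not break, and the URI overload covers algorithms that are missing from the enum.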