Open elkasoapy opened 8 years ago
I'm out of town until Monday. I think I have some pending changes in a branch that fix some of the above problems.
I assume the current version of this query is not supposed to run in its current state as it is still work in progress, am I right?
Right, I'm working on this one to debug the MDF paths.
It seems like none of the scheduled examples (simple-pipeline-scheduledef, simple-dag-schedule, simple-mdf-schedule) within SEEPng are working at the moment.
When running simple-pipeline-scheduledef and simple-dag-schedule, SEEPng throws an exception from the call to ProtocolCommandFactory.buildScheduleStageCommand in lines 67-68 of class DataParallelWithInputDataLocalityLoadBalancingStrategy, which passes cdr.getRankedDatasetForNode(euId) as an argument. The cdr variable (a ClusterDatasetRegistry) is asked for the ranked datasets of the node identified by euId. This information appears to be stored in a map (Map<Integer, List> rankedDatasetsPerNode) which, according to the comments, contains the datasets per node ordered by priority to live in memory. The lookup fails in line 38 (rankedDatasetsPerNode.get(euId)) because the rankedDatasetsPerNode map has not been initialized. To solve this, I think we need to initialize the map beforehand; however, as I'm not sure how it should behave, I don't know where it should be initialized.
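To make the suggestion concrete, here is a minimal sketch of what initializing the map up front could look like. This is not the SEEPng code: the class name, the updateRanking helper, and the use of Integer dataset ids are all assumptions for illustration, and returning an empty list for an unknown euId is just one possible behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ClusterDatasetRegistry; not the actual SEEPng class.
class RankedDatasetRegistrySketch {

    // Initialized at declaration, so a lookup can never hit a null map.
    // Integer dataset ids are an assumption; the real element type is not shown in this issue.
    private final Map<Integer, List<Integer>> rankedDatasetsPerNode = new HashMap<>();

    // Hypothetical helper: records the ranking reported for one node.
    void updateRanking(int euId, List<Integer> rankedDatasetIds) {
        rankedDatasetsPerNode.put(euId, new ArrayList<>(rankedDatasetIds));
    }

    // Returns the node's ranking, or an empty list if nothing has been recorded for it yet.
    List<Integer> getRankedDatasetForNode(int euId) {
        return rankedDatasetsPerNode.getOrDefault(euId, Collections.emptyList());
    }
}
```

With something like this, the scheduler would get an empty ranking for a node that has not reported yet instead of failing on the lookup. Whether an empty ranking is the right default is exactly the open question above.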
In the case of simple-mdf-schedule, I assume the current version of this query is not supposed to run in its current state as it is still work in progress, am I right? For example, in the Base class, evaluator1 (line 23) has the same operator id as adderTwo (line 22). When creating the choose operator (line 26), the newChooseOperator method invoked from the QueryBuilder (lines 143 to 147) returns null, which I think still needs to be implemented.
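As an aside, the duplicate operator id would be easier to spot if ids were checked when operators are registered. A rough sketch of such a guard follows; the class and method names are hypothetical and not part of SEEPng's QueryBuilder.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of SEEPng's QueryBuilder.
class OperatorIdGuardSketch {

    private final Set<Integer> usedIds = new HashSet<>();

    // Registers an operator id and fails fast if the id is already taken.
    void register(int opId, String operatorName) {
        if (!usedIds.add(opId)) {
            throw new IllegalArgumentException(
                "Operator id " + opId + " is already in use (while adding " + operatorName + ")");
        }
    }
}
```

Registering adderTwo and then evaluator1 with the same id would throw on the second call, pointing directly at the conflicting definition.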