neurospin / pylearn-epac

Embarrassingly Parallel Array Computing: EPAC is a machine learning workflow builder.
BSD 3-Clause "New" or "Revised" License

Store issue with non-flat store structure #8

Closed JinpengLI closed 11 years ago

JinpengLI commented 11 years ago

When we have an EPAC tree with a non-flat store structure, as below:

    ####################################################################
    ## EPAC WORKFLOW
    # -------------------------------------
    #             Perms                              Perm (Splitter)
    #         /     |       \
    #        0     1(s)       2(s)                   Samples (Slicer)
    #        |
    #       CV (s)                                   CV (Splitter)
    #  /       |       \
    #0        1       2                             Folds (Slicer)
    # |        |       |
    # Pipe     Pipe     Pipe                         Sequence
    # |
    #2                                              SelectKBest (Estimator)
    # |
    # Grid
    # |                     \
    # SVM(linear,C=1)   SVM(linear,C=10)            Classifiers (Estimator)
    pipeline = Pipe(SelectKBest(k=2),
                    Grid(*[SVC(kernel="linear", C=C)
                           for C in [1, 3]]))
    self.wf = Perms(CV(pipeline, n_folds=3),
                    n_perms=10,
                    permute="y",
                    y=self.y)
    self.store = StoreFs(dirpath=self.tree_root_relative_path)
    self.wf.save_tree(store=self.store)

where (s) marks a node that holds a StoreMem. Perm samples 1 and 2 sit at the same level, so their StoreMem instances are merged into a single dictionary (cf. def load(self, key="") in StoreFs), whereas the store attached to CV is kept at another level.

This leads to a bug in load_state(name="results") in BaseNode.

    def load_state(self, name="default"):
        return self.get_store().load(key_push(self.get_key(), name))

    def get_store(self):
        """Return the first store found on the path to tree root. If no store
        has been defined create one on the tree root and return it."""
        curr = self
        while True:
            if curr.store:
                return curr.store
            if not curr.parent:
                curr.store = StoreMem()
                return curr.store
            curr = curr.parent

Suppose we want to load the results of Perm: get_store returns the store of CV instead, because the tree has been merged into a "sequence"-like structure. There are similar problems in epac_mapper:

    curr_node.store = StoreMem()
    func = getattr(curr_node, function)
    func(recursion=True, **cpXy)
    # print "Save results"
    curr_node.save_node(store=store_fs)
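
The lookup problem can be illustrated with a small, self-contained sketch. It does not use EPAC itself: `ToyStore` and `ToyNode` are hypothetical stand-ins for StoreMem and BaseNode, keys are joined with "/" instead of key_push, and the stored value is dummy data. It is a simplification of the failure mode, assuming the results sit in a store higher up the tree while a fresh store is attached to the node (as epac_mapper does above).

```python
class ToyStore:
    """Hypothetical stand-in for StoreMem: a plain dict of key -> object."""

    def __init__(self):
        self.data = {}

    def save(self, key, obj):
        self.data[key] = obj

    def load(self, key):
        # Assumption: a missing key yields None.
        return self.data.get(key)


class ToyNode:
    """Hypothetical stand-in for BaseNode, keeping only the store lookup."""

    def __init__(self, name, parent=None, store=None):
        self.name = name
        self.parent = parent
        self.store = store

    def get_key(self):
        if self.parent is None:
            return self.name
        return self.parent.get_key() + "/" + self.name

    def get_store(self):
        # Same logic as the original get_store: first store on the way up.
        curr = self
        while True:
            if curr.store:
                return curr.store
            if not curr.parent:
                curr.store = ToyStore()
                return curr.store
            curr = curr.parent

    def load_state(self, name="default"):
        return self.get_store().load(self.get_key() + "/" + name)


root = ToyNode("Perms", store=ToyStore())
child = ToyNode("0", parent=root)
root.store.save("Perms/0/results", "dummy results")

# No store on the child yet: the lookup reaches the root store.
before = child.load_state("results")

# Attach a fresh, empty store to the child: it now shadows the root store.
child.store = ToyStore()
after = child.load_state("results")
```

In this sketch `before` holds the saved value, while `after` is `None`: the nearest store is returned even though it does not contain the requested key.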

I am trying to fix it. Don't hesitate to leave suggestions in the comments, @duchesnay.

JinpengLI commented 11 years ago

I have temporarily solved it by rewriting get_store as below:

    def get_store(self, name="default"):
        """Return the first store found on the path to tree root. If no store
        has been defined create one on the tree root and return it."""
        curr = self
        closest_store = None
        while True:
            if curr.store:
                # Remember the nearest store as a fallback
                if not closest_store:
                    closest_store = curr.store
                # Prefer the first store that actually holds the requested key
                if curr.store.load(key_push(self.get_key(), name)):
                    return curr.store
            if not curr.parent:
                # Reached the root without finding the key
                if closest_store:
                    return closest_store
                curr.store = StoreMem()
                return curr.store
            curr = curr.parent

    def load_state(self, name="default"):
        return self.get_store(name=name).load(key_push(self.get_key(), name))
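
A minimal sketch of the same key-aware lookup, again with hypothetical toy classes (`ToyStore`, `ToyNode`, `find_store`) rather than the real EPAC ones. Two things differ from the rewritten get_store above, and both are choices of this sketch: it tests `is not None` instead of truthiness, so that a falsy stored value (an empty dict, for instance) still counts as found, and it returns `None` rather than creating a store at the root. It assumes that loading a missing key yields `None`, which may not match the real store API.

```python
class ToyStore:
    """Hypothetical stand-in for StoreMem: a plain dict of key -> object."""

    def __init__(self):
        self.data = {}

    def save(self, key, obj):
        self.data[key] = obj

    def load(self, key):
        # Assumption: a missing key yields None.
        return self.data.get(key)


class ToyNode:
    """Hypothetical stand-in for BaseNode, keeping only the store lookup."""

    def __init__(self, name, parent=None, store=None):
        self.name = name
        self.parent = parent
        self.store = store

    def get_key(self):
        if self.parent is None:
            return self.name
        return self.parent.get_key() + "/" + self.name

    def find_store(self, name="default"):
        """Return the first store on the path to the root that holds the
        key; fall back to the closest store, or None if there is none."""
        key = self.get_key() + "/" + name
        curr, closest = self, None
        while curr is not None:
            if curr.store is not None:
                if closest is None:
                    closest = curr.store
                if curr.store.load(key) is not None:
                    return curr.store
            curr = curr.parent
        return closest

    def load_state(self, name="default"):
        store = self.find_store(name)
        if store is None:
            return None
        return store.load(self.get_key() + "/" + name)


root = ToyNode("Perms", store=ToyStore())
child = ToyNode("0", parent=root, store=ToyStore())  # empty store on the child
root.store.save("Perms/0/results", {})               # falsy dummy value

found = child.load_state("results")       # the empty dict from the root store
fallback = child.find_store("missing")    # key nowhere: closest store wins
```

Here `found` is the empty dict saved in the root store, and `fallback` is the child's own store. A truthiness test on the loaded value would have skipped the root store in this case.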

If you have a better solution, please let me know, @duchesnay. Any feedback on this problem is welcome.