neurospin / pylearn-parsimony_history

Sparse and Structured Machine Learning in Python
BSD 3-Clause "New" or "Revised" License

Algorithm information / iteration number #20

Open duchesnay opened 10 years ago

duchesnay commented 10 years ago

Dear Edouard and Mathieu,

I have drafted a proposed solution to the problem of both passing and returning information about the algorithm through the same data structure.

My proposed solution is to pass an instance of parsimony.algorithms.utils.AlgorithmInfo, which is essentially a dictionary, but one that only allows keys from a predefined set. I also created an Enum class (in parsimony.utils) and an Info instance of Enum (in parsimony.algorithms.utils) with the info variables that we need (more can be added when needed). Thus, we can pass info=AlgorithmInfo([Info.t, Info.f]) and, after the run, obtain the information as info[Info.t].
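A minimal sketch of how such a key-restricted dictionary could look. The names AlgorithmInfo and Info mirror the ones described above, but this is an illustration of the idea, not the actual Parsimony code:

```python
class Info(object):
    # Enum-like container of the allowed information keys (illustrative).
    t = "t"                # elapsed time
    f = "f"                # function values
    num_iter = "num_iter"  # number of iterations


class AlgorithmInfo(object):
    """A dict-like object that only accepts keys from a predefined set.

    A sketch of the behaviour described above, not the real class.
    """
    def __init__(self, keys):
        self._allowed = set(keys)
        self._store = {}

    def __setitem__(self, key, value):
        if key not in self._allowed:
            raise KeyError("Key %r was not requested." % (key,))
        self._store[key] = value

    def __getitem__(self, key):
        return self._store[key]

    def __contains__(self, key):
        return key in self._allowed


# The user requests time and function values:
info = AlgorithmInfo([Info.t, Info.f])
# During run(), the algorithm fills in only the requested values:
if Info.f in info:
    info[Info.f] = [3.2, 1.1, 0.4]
```

Writing to a key that was not requested raises a KeyError, which is what makes the set of keys "limited".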

Have a look at my last push to my private Parsimony repository on Github to see the actual code.

A thought: Since we will access the Info enum from the estimators as well, perhaps it should be in the global parsimony.utils or parsimony.utils.consts instead?

I have added this for GradientDescent and ISTA. If you like it I will add it for all other algorithms as well. Let me know what you think!

duchesnay commented 10 years ago

max_iter and conesta

From my understanding, "max_iter" in CONESTA should now control the overall number of FISTA iterations. In the code on your repository, "max_iter" simply limits the number of iterations of each nested FISTA loop. I suggest using an iteration counter, "num_iter", that counts the overall number of FISTA iterations. Then, before the next run of FISTA, set FISTA's max_iter parameter:

Here self is the CONESTA instance:

self.FISTA.set_params(max_iter=self.max_iter - self.num_iter)

This num_iter could be the one in Info.num_iter. And we could have another counter for the CONESTA loop.
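The bookkeeping being suggested could look roughly like this. This is a stand-alone sketch: the stub FISTA class, its fake run() that "uses" at most 100 iterations per call, and the budget numbers are all invented for illustration, not the real solvers:

```python
class FISTA(object):
    # Minimal stand-in for the inner solver (illustrative only).
    def __init__(self, max_iter=100):
        self.max_iter = max_iter

    def set_params(self, max_iter):
        self.max_iter = max_iter

    def run(self):
        # Pretend each FISTA call uses at most 100 iterations,
        # capped by the budget it was given.
        return min(self.max_iter, 100)


class CONESTA(object):
    # Outer loop whose max_iter caps the *total* number of FISTA
    # iterations across all inner runs.
    def __init__(self, max_iter=1000):
        self.max_iter = max_iter  # overall FISTA iteration budget
        self.num_iter = 0         # overall FISTA iterations used so far
        self.FISTA = FISTA()

    def run(self):
        num_outer = 0  # a separate counter for the CONESTA loop itself
        while self.num_iter < self.max_iter:
            # Give FISTA only the remaining budget:
            self.FISTA.set_params(max_iter=self.max_iter - self.num_iter)
            self.num_iter += self.FISTA.run()
            num_outer += 1
        return num_outer
```

With an overall budget of 250 and inner runs of at most 100 iterations, the inner solver runs three times (100 + 100 + 50) and never exceeds the total.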

Info: Why not use a predefined class, with all attributes set to None? The ones the user sets to True should be filled in by the algorithm.

Something like:

class Info:
    def __init__(self):
        self.t = None
        self.f = None
        ...

In the algorithm:

if self.f:
    self.f = list()
...
self.f.append(fval)

It would be simpler, and it naturally removes the need for the Enum, no?

tomlof commented 10 years ago

The main problem with the solution you propose is: What happens if the output of some property of the algorithm is actually None?

I actually tried that first, with a new datatype Undefined, but I did not find it as "nice" as the current one. And that problem is not really solved by that approach either...
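The ambiguity being objected to can be shown in a tiny example, using an attribute-based info holder whose fields default to None (a sketch illustrating the objection, not actual Parsimony code):

```python
class Info(object):
    # Attribute-based info holder whose fields default to None (a sketch).
    def __init__(self):
        self.t = None
        self.f = None


info = Info()
# Suppose the algorithm legitimately computes None for property f:
info.f = None
# "f was never computed" and "f was computed and is None" now look
# identical to the caller:
not_computed = Info().f is None   # True
computed_none = info.f is None    # also True
```

Both checks are True, so None-as-default cannot distinguish "absent" from "present but None"; a sentinel type like Undefined only moves the same problem one step away.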

What we really, formally, want to do is to input a set and output a dict, with the keys of the dict a subset of the keys of the set. Hence the LimitedDict.
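That contract can be stated in a few lines of Python. The function and key names here are hypothetical, chosen only to show the subset relationship:

```python
def run_algorithm(requested):
    """Sketch of the contract: take a set of requested keys and return a
    dict whose keys are a subset of that set (hypothetical names)."""
    # What the algorithm happened to produce this run:
    available = {"time": 1.23, "fvalue": [3.0, 1.5]}
    return {k: v for k, v in available.items() if k in requested}


out = run_algorithm({"time", "num_iter"})
# The output keys are always a subset of the requested set:
assert set(out) <= {"time", "num_iter"}
```

A requested key the algorithm did not produce ("num_iter" here) is simply absent from the output, which is unambiguous in a way a None default is not.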

I'll fix the max_iter once the info is done, since we must get the info from FISTA properly before we can use it.

duchesnay commented 10 years ago

+1 Let's go with your proposition.

tomlof commented 10 years ago

An alternative would be to have two parameters for the info: a set of input parameters, and an empty dictionary to store the outputs in.

Though, I'm not sure I like the idea of two parameters for one thing.

tomlof commented 10 years ago

I forgot: Your proposition for the number of iterations sounds perfect! I'll add that.

tomlof commented 10 years ago

I have pushed the update to the main repo. I still need to make the estimators aware of this change, but close this issue if you like the solution ;-)

duchesnay commented 10 years ago

The information handling is a bit complicated and sometimes confusing. For example, self.info[Info.num_iter] is necessary for CONESTA, so it should not be in info. Moreover, it is difficult for the user to handle: they have to import the Info class and set up this LimitedDict. This also adds code to maintain.

As we discussed, the idea was to make info a simple dictionary attribute of the algorithm. The user provides the desired information as a list of strings, [info.time, info.function_value]. The info could be a global module, like consts. I know you don't like plain strings (like "time", "function_value"). This list of desired information can be stored in the attribute self.info_names.

Then, at the beginning of each run, we reset this dictionary.

So, in summary: the user provides a list of desired information, [info.time, ...]. At the end, they can simply do estimator.algorithm.info[info.time] to access the information.
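A sketch of this simpler scheme. The module-like info class, the Algorithm stub, and the fake iteration loop are all hypothetical, mirroring only the description above:

```python
class info(object):
    # A hypothetical constants module holding plain string keys:
    time = "time"
    function_value = "function_value"


class Algorithm(object):
    # Sketch: the algorithm stores the requested info names and resets a
    # plain dict attribute at the start of every run.
    def __init__(self, info_names=None):
        self.info_names = list(info_names or [])
        self.info = {}

    def run(self):
        # Reset the dictionary at the beginning of each run:
        self.info = {name: [] for name in self.info_names}
        for it in range(3):  # stand-in for the real iterations
            if info.time in self.info:
                self.info[info.time].append(0.01 * (it + 1))
            if info.function_value in self.info:
                self.info[info.function_value].append(1.0 / (it + 1))


alg = Algorithm(info_names=[info.time, info.function_value])
alg.run()
# The user accesses the result as a plain dict:
fvals = alg.info[info.function_value]
```

The user-facing surface is just a list of names in and a dict out, with no special container class to import or maintain.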

tomlof commented 10 years ago

I have updated this.

The syntax is now a lot simpler than it was:

from parsimony.algorithms import proximal
from parsimony.utils.consts import Info

alg = proximal.Algorithm(info=[Info.time, Info.fvalue])
alg.run(function, start_vector)
print "function values:", alg.info_get(Info.fvalue)

or with an estimator:

from parsimony import estimators
from parsimony.utils.consts import Info

est = estimators.Estimator(algorithm_params={"info": [Info.time, Info.fvalue]})
est.fit(X, y)
print "function values:", est.get_info(Info.fvalue)