shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

make model selection on GP module more flexible #2817

Open yorkerlin opened 9 years ago

yorkerlin commented 9 years ago

By default, inference methods in the GP module compute gradients w.r.t. model parameters, which is not always necessary.

yorkerlin commented 9 years ago

@karlnapf will work on this issue once notebooks are done

yorkerlin commented 9 years ago

@karlnapf Currently, pthread is used to compute the gradients in parallel.

yorkerlin commented 9 years ago

about moving all GP parameters to log-domain:

My idea (for example, in GaussianKernel) is to add a log flag:

CGaussianKernel::CGaussianKernel(int32_t size, float64_t w) 

becomes

CGaussianKernel::CGaussianKernel(int32_t size, float64_t w, bool is_log_domain=false) 

If the flag is_log_domain is true, w is in the log domain and its gradient will first be computed in the standard domain (the existing code) and then transformed into the log domain (post-processing code to be implemented).

yorkerlin commented 9 years ago

@karlnapf Let me know if you agree with my idea so that I can implement it.

yorkerlin commented 9 years ago

@karlnapf the existing model selection (CGradientModelSelection) does not do gradient transformation, which means there is a bug if a parameter and its gradient are in different domains.

yorkerlin commented 9 years ago

@karlnapf Any feedback on the model selection?

karlnapf commented 9 years ago

I usually prefer general solutions that do things automatically over those that require extra arguments and thinking by the user. Users should not have to decide whether a parameter is in the log domain or not; we should do the appropriate thing automatically (in a transparent way).

Not sure what to do though. How do other toolboxes do this? Any ideas?

yorkerlin commented 9 years ago

GPML keeps parameters in the log domain if they are positive, and in the standard domain if they can be negative or zero. UPDATED: (unconstrained representation)

If we can break the existing code, I can move all GP-related classes from the standard domain to the log domain (however, users' code would have to be modified).

If not, we have to deal with the following issue:

    SG_ADD(&m_width, "width", "Kernel width", MS_AVAILABLE, GRADIENT_AVAILABLE);

This implicitly means m_width and its gradient are in the same domain. We could add a new field to TParameter describing the domain (log, standard, or other) of a parameter. With that field, we can apply the proper transformation.

yorkerlin commented 9 years ago

If you think adding a new field to TParameter is feasible, I may need some help from the authors of the class. Adding a new field to TParameter is not trivial, since copy and serialization have to be modified.

CSGObject will be modified too.

yorkerlin commented 9 years ago

For example, if a parameter and its gradient are in different domains, SG_ADD would become:

 SG_ADD(&m_width, "width", "Kernel width", MS_AVAILABLE, GRADIENT_AVAILABLE, "domain of parameter", "domain of gradient" );
yorkerlin commented 9 years ago

@karlnapf In GradientModelSelection.cpp:

    nlopt_opt opt=nlopt_create(NLOPT_LD_MMA, total_variables);

    SGVector<double> lower_bound(total_variables);
    lower_bound.set_const(1e-6);

    SGVector<double> upper_bound(total_variables);
    upper_bound.set_const(HUGE_VAL);

If we add the domain field to TParameter, we need a unified representation to set the lower_bound and upper_bound used in nlopt.

Because not all parameters (for example, inducing points and FULL_MATRIX weights of the ARD kernel) are positive, we need an unconstrained or unified (e.g., positive) representation to set the lower_bound and upper_bound used in nlopt.

More generally, we need three new fields in TParameter: the domain of a parameter, a lower bound, and an upper bound (assuming parameters are of floating-point type). This way, we can set parameter-specific lower_bound and upper_bound values for nlopt.

I think the TParameter idea is a way to solve this problem. Please let me know what you think (does this work for you? the one-field approach or the three-field approach?):

One-field approach (unconstrained representation): for now this may be enough

Three-field approach (parameter-specific representation): more general, if a parameter has additional constraints

If you agree, I will try to add these field(s). If I fail, I may need some help from you or another author of TParameter and SGObject (the tricky part is the copy and serialization functions).

UPDATED

yorkerlin commented 9 years ago

@karlnapf Any feedback on the new approach?

karlnapf commented 9 years ago

I guess we cannot avoid the fact that Shogun developers have to specify the range of parameters when registering them. Therefore, I would go for optional lower and upper bound parameters. We can do this for scalars, vectors, and matrices (the latter two element-wise). For other constraints, such as positive definiteness, I guess we should stay with the current system. This would also be in line with e.g. Stan's parameter representation. How would you do domains? With an enum DOMAIN_LOG, DOMAIN_STD, DOMAIN_TANH?

On the other hand, it might be an overkill as Shogun parameters are mostly unbounded or positive, so we just really need those two. Also modifying all the TParameter code is annoying (though not impossible, I could help there)

No matter what we do, we should be explicit and consistent throughout the toolbox. Maybe @lambday also has ideas? @lisitsyn ?

BTW, what about just breaking all GP code and moving the GP parameters all to log-domain. The getters/setters can still be in normal domain though?

yorkerlin commented 9 years ago

@karlnapf Let's make the parameters in log-domain first. For kernels, I only pay attention to GaussianKernel and GaussianARDKernel.

karlnapf commented 9 years ago

okok, I think that should be a refactoring exercise

yorkerlin commented 9 years ago

@karlnapf The following extensions may be useful in model selection.

About the stochastic searching for model parameters

karlnapf commented 9 years ago

What are those comments based on? Just the paper? 40s vs. 30m does not really seem reasonable.

I guess this is MAP inference right?

yorkerlin commented 9 years ago

No. The speed of the method is between MCMC and MAP. It seems to combine MCMC and MAP.

karlnapf commented 9 years ago

I see, MCMC is slow for sure

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.