opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0

Not all GLPK parameters can be set in cglpk.pyx #157

Closed matthiaskoenig closed 9 years ago

matthiaskoenig commented 9 years ago

Hi all, I am working on replicating the dynamic FBA of the whole-cell model by Karr. For the simulation I am trying to use cobrapy with an SBML model created from the original. For the simulations I have to set some options for the glpk solver which were set in the original model via the Matlab mex interface to glpk, namely the parameters lpsolver=1, presol=1, scale=1, msglev=0, tolbnd=10e-7.
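For illustration, this is roughly the call I would like to be able to make (a sketch only: it assumes that keyword arguments to optimize() are forwarded to the solver's set_parameter, and the model file name is a placeholder):

import cobra

model = cobra.io.read_sbml_model("karr_model.xml")  # placeholder file name

# these keyword arguments use the Matlab mex parameter names;
# cglpk does not currently understand them
solution = model.optimize(solver="cglpk",
                          lpsolver=1,    # revised simplex
                          presol=1,      # run the LP presolver
                          scale=1,       # equilibration scaling
                          msglev=0,      # silence solver output
                          tolbnd=10e-7)  # primal feasibility tolerance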

Currently these parameters cannot be set via cobrapy. The subset of parameters translates to the following codes:

presol   => #define LPX_K_PRESOL 327 /* lp->presol */
scale    => #define LPX_K_SCALE  301 /* lp->scale */
msglev   => #define LPX_K_MSGLEV 300 /* lp->msg_lev */
tolbnd   => #define LPX_K_TOLBND 305 /* lp->tol_bnd */
lpsolver => ?? This does some preprocessing in glpk.cc related to the Matlab mex interface and influences how the LP problem is created; I have no idea how this is set in glpk or what it means.

The problem is that none of the above parameters is handled in cglpk.pyx in the function cpdef set_parameter(self, parameter_name, value):

Could you please support the existing glpk control parameters listed below by adding lines such as

elif parameter_name == "presol":
    self.parameters.presol = int(value)

with a cast to float or int depending on the parameter (see below) ...
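Spelled out for the parameters I need, the additions might look like this (a sketch only; it assumes the existing set_parameter dispatch in cglpk.pyx and that self.parameters has fields with these names):

# sketch: extra branches inside cpdef set_parameter(self, parameter_name, value)
elif parameter_name == "presol":
    self.parameters.presol = int(value)     # integer parameter (IParam)
elif parameter_name == "scale":
    self.parameters.scale = int(value)      # integer parameter (IParam)
elif parameter_name == "msg_lev":
    self.parameters.msg_lev = int(value)    # integer parameter (IParam)
elif parameter_name == "tol_bnd":
    self.parameters.tol_bnd = float(value)  # real parameter (RParam)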


From glpk.h:

/* control parameter identifiers: */
#define LPX_K_MSGLEV   300  /* lp->msg_lev */
#define LPX_K_SCALE    301  /* lp->scale */
#define LPX_K_DUAL     302  /* lp->dual */
#define LPX_K_PRICE    303  /* lp->price */
#define LPX_K_RELAX    304  /* lp->relax */
#define LPX_K_TOLBND   305  /* lp->tol_bnd */
#define LPX_K_TOLDJ    306  /* lp->tol_dj */
#define LPX_K_TOLPIV   307  /* lp->tol_piv */
#define LPX_K_ROUND    308  /* lp->round */
#define LPX_K_OBJLL    309  /* lp->obj_ll */
#define LPX_K_OBJUL    310  /* lp->obj_ul */
#define LPX_K_ITLIM    311  /* lp->it_lim */
#define LPX_K_ITCNT    312  /* lp->it_cnt */
#define LPX_K_TMLIM    313  /* lp->tm_lim */
#define LPX_K_OUTFRQ   314  /* lp->out_frq */
#define LPX_K_OUTDLY   315  /* lp->out_dly */
#define LPX_K_BRANCH   316  /* lp->branch */
#define LPX_K_BTRACK   317  /* lp->btrack */
#define LPX_K_TOLINT   318  /* lp->tol_int */
#define LPX_K_TOLOBJ   319  /* lp->tol_obj */
#define LPX_K_MPSINFO  320  /* lp->mps_info */
#define LPX_K_MPSOBJ   321  /* lp->mps_obj */
#define LPX_K_MPSORIG  322  /* lp->mps_orig */
#define LPX_K_MPSWIDE  323  /* lp->mps_wide */
#define LPX_K_MPSFREE  324  /* lp->mps_free */
#define LPX_K_MPSSKIP  325  /* lp->mps_skip */
#define LPX_K_LPTORIG  326  /* lp->lpt_orig */
#define LPX_K_PRESOL   327  /* lp->presol */
#define LPX_K_BINARIZE 328  /* lp->binarize */
#define LPX_K_USECUTS  329  /* lp->use_cuts */
#define LPX_K_BFTYPE   330  /* lp->bfcp->type */
#define LPX_K_MIPGAP   331  /* lp->mip_gap */

// Integer parameter names
int IParam[NIntP] = {
    LPX_K_MSGLEV, LPX_K_SCALE, LPX_K_DUAL, LPX_K_PRICE,
    LPX_K_ROUND, LPX_K_ITLIM, LPX_K_ITCNT, LPX_K_OUTFRQ,
    LPX_K_MPSINFO, LPX_K_MPSOBJ, LPX_K_MPSORIG, LPX_K_MPSWIDE,
    LPX_K_MPSFREE, LPX_K_MPSSKIP, LPX_K_BRANCH, LPX_K_BTRACK,
    LPX_K_PRESOL, LPX_K_USECUTS, LPX_K_PREPROCESS, LPX_K_BINARIZE,
    LPX_K_RATIO_TEST
};

// Real parameter names
int RParam[NRealP] = {
    LPX_K_RELAX, LPX_K_TOLBND, LPX_K_TOLDJ, LPX_K_TOLPIV,
    LPX_K_OBJLL, LPX_K_OBJUL, LPX_K_TMLIM, LPX_K_OUTDLY,
    LPX_K_TOLINT, LPX_K_TOLOBJ, LPX_K_MIPGAP
};

Matthias

aebrahim commented 9 years ago

Thanks. Right now you can do the following:

tol_bnd can be set through "tolerance_feasibility", and msg_lev can be set through "verbosity".
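For example, with a loaded model (assuming this version's optimize() forwards keyword arguments to the solver's set_parameter):

solution = model.optimize(solver="cglpk",
                          tolerance_feasibility=10e-7,  # sets tol_bnd
                          verbosity=0)                  # sets msg_lev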

I don't think GLPK has any support for "scale", because I could find no reference to any such parameter in its documentation. Please correct me if I am wrong on this.

I just added support for the presolve option and a few others.

matthiaskoenig commented 9 years ago

Hi, thanks for the changes.

The scale parameter is used in the source code to scale the LP problem: scale=1 performs equilibration scaling of the problem before optimization (which probably changes the solution returned if it is not unique). It is probably not documented, but it is clearly used in the code.

Implemented in glplpx01.c

void lpx_scale_prob(LPX *lp)
{     /* scale problem data */
      switch (lpx_get_int_parm(lp, LPX_K_SCALE))
      {  case 0:
            /* no scaling */
            glp_unscale_prob(lp);
            break;
         case 1:
            /* equilibration scaling */
            glp_scale_prob(lp, GLP_SF_EQ);
            break;
         case 2:
            /* geometric mean scaling */
            glp_scale_prob(lp, GLP_SF_GM);
            break;
         case 3:
            /* geometric mean scaling, then equilibration scaling */
            glp_scale_prob(lp, GLP_SF_GM | GLP_SF_EQ);
            break;
         default:
            xassert(lp != lp);
      }
      return;
}

tol_bnd can be set through "tolerance_feasibility" -> yes, but this has the unwanted side effect of also setting tol_dj in your code:

elif parameter_name in {"tolerance_feasibility"}:
    self.parameters.tol_bnd = float(value)
    self.parameters.tol_dj = float(value)

I am not sure which of these parameters are documented, but it would be nice to be able to set all the available integer and double parameters I listed above. Just add lines for each available parameter, like:

elif parameter_name in {"tol_bnd"}:
    self.parameters.tol_bnd = float(value)
elif parameter_name in {"tol_dj"}:
    self.parameters.tol_dj = float(value)
aebrahim commented 9 years ago

OK, separating the primal and dual tolerances is no big deal and I will do that. I don't really think I will implement scale (though I may consider a pull request).

Although if you are setting scale to 1, I'd be surprised if it makes a difference.

aebrahim commented 9 years ago

I reread this. I didn't realize that scale was an integer parameter in the toolbox and not a floating point one.

I'm not sure what the best way to handle scale is. It isn't really a solve parameter so much as a modification made to the LP. I'll give it more thought.

How ill-scaled is the whole-cell model? For M models in the latest version of GLPK, the default parameters pretty much always work and there isn't much parameter sensitivity. For more ill-scaled models there is more parameter sensitivity, but there are also better solvers to consider in those cases (such as SoPlex).

matthiaskoenig commented 9 years ago

As far as I understand your code, you do not have to implement anything, just forward the parameters to the solver (i.e. set the respective names expected by the C interface, with the respective cast to float or int, and glpk should use them).

I have no idea how ill-scaled the Karr model is. The problem I have so far is that the flux distribution returned as the optimal solution by glpk differs greatly between the Matlab mex interface to glpk and the Python interface to glpk. Both solutions have the same optimal growth, but I assume that, due to the non-uniqueness of the solution, completely different points are taken from the solution cone of the FBA problem. I imagine that the scaling of the problem (and other parameters like presolve) affects how a single solution is selected from the possible solution space. So my plan was:

1) set all solver parameters identically in the Matlab and cobrapy simulations
2) compare the resulting flux distributions

=> if identical, the results are reproducible between the two implementations, which would be very nice (i.e. the way a single solution is selected by glpk depends only on the solver parameters and the problem structure)
=> if not, that means the selection of a single solution by the solver actually depends on the order of constraints or other subtle things (this gives me some headaches and calls into question the reproducibility of FBA flux distributions; it boils down to the problem of how to reproducibly select a single solution from the flux cone of solutions)
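A minimal sketch of step 2 (illustrative only: the Matlab export file is a placeholder, and solution.x_dict assumes the solution object of this cobrapy version):

import numpy as np

# fluxes from the cobrapy run, in model reaction order
cobra_fluxes = np.array([solution.x_dict[r.id] for r in model.reactions])
# fluxes exported from the Matlab/mex run (placeholder file name)
matlab_fluxes = np.loadtxt("matlab_fluxes.txt")

# identical up to a small tolerance => reproducible solution selection
print(np.allclose(cobra_fluxes, matlab_fluxes, atol=1e-6))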

Currently, the flux distribution returned for the Karr model by cobrapy has some very large fluxes for some reactions (which do not occur in the original Matlab implementation) that look like a cycle. This will result in largely different updates of the metabolite counts in the dynamic FBA, and probably in very different trajectories. I assume that the equilibration scaling results in quite homogeneous fluxes over all reactions, avoiding these very large cycles. So the scaling is not so much about ill-scaled models as about getting an identical solution from different implementations using the same underlying solver (here glpk). If this is not possible by setting identical solver options for the same FBA problem (i.e. identical stoichiometric matrix and flux bounds), that is a result in itself.

aebrahim commented 9 years ago

There is no parameter for scale which you can just pass on. You have to call functions on the problem struct itself, which presumably modifies the LP.
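To make the distinction concrete, here is an illustrative sketch using the swiglpk bindings (not cobrapy's cglpk; shown only because glp_scale_prob is a call on the problem object itself rather than a solve-time control parameter):

import swiglpk as glpk

lp = glpk.glp_create_prob()
# ... rows, columns and coefficients would be loaded here ...

# rescales the stored constraint matrix in place (equilibration scaling);
# a subsequent simplex call then operates on the scaled problem
glpk.glp_scale_prob(lp, glpk.GLP_SF_EQ)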

I see what you are trying to do. Are you sure you are using the same version of glpk? As far as I remember, the toolbox uses an older version and cobrapy uses the latest one. The glpk API changed at some point in this period, so it won't be trivial to up- or downgrade.

If you want to eliminate cycles you can run pFBA instead of FBA, which will give you an optimal flux state without cycles.
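A usage sketch for that suggestion, assuming the cobrapy API of this era (later versions expose this as pfba):

from cobra.flux_analysis import optimize_minimal_flux

# pFBA: hold the optimal objective value fixed, then minimize the
# total absolute flux, which removes loops from the returned solution
solution = optimize_minimal_flux(model)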

Generally I think it will be hard to rely on "solver noise" behaving the same way in order to pick the same flux state. Is this necessary for the Karr et al. model? That would surprise me a lot.

matthiaskoenig commented 9 years ago

I understand. I thought you could just pass the full integer and float sets of parameters to glpk. So what you have to do is switch on the scale parameter analogously to the Matlab scaling, i.e. call glp_scale_prob() with the correct option, depending on the scale parameter, before solving the problem:

%           scale (default: 1). Scaling option: 
%                   0 - No scaling.
%                   1 - Equilibration scaling.
%                   2 - Geometric mean scaling, then equilibration scaling.
%                   3 - Geometric then Equilibrium scaling 
%                   4 - Round to nearest power of 2 scaling

  //-- scale the problem data (if required)
  if (! glpIntParam[16] || lpsolver != 1) {
    switch ( glpIntParam[1] ) {
        case ( 0 ): glp_scale_prob( lp, GLP_SF_SKIP ); break;
        case ( 1 ): glp_scale_prob( lp, GLP_SF_GM ); break;
        case ( 2 ): glp_scale_prob( lp, GLP_SF_EQ ); break;
        case ( 3 ): glp_scale_prob( lp, GLP_SF_AUTO  ); break;
        case ( 4 ): glp_scale_prob( lp, GLP_SF_2N ); break;
        default :
            mexErrMsgTxt("glpkcc: unrecognized scaling option");
            longjmp (mark, -1);
    }
  }
  else {
    /* do nothing? or unscale?
        glp_unscale_prob( lp );
    */
  }

I also found out what the lpsolver option in matlab means:

%           lpsolver (default: 1) Select which solver to use.
%                  If the problem is a MIP problem this flag will be ignored.
%                   1 - Revised simplex method.
%                   2 - Interior point method.
%                   3 - Simplex method with exact arithmetic.

which just selects the solution method and is left at its default in the Karr model, so there is no need to implement this setting.

The actual FBA solution influences the future system behavior in the Karr model, so the selection of the solution could matter for the overall system behavior. There are metabolite counts in the Karr model which are updated based on the selected (non-unique) FBA solution. The metabolite counts then influence the upper and lower bounds of the next iteration of the dynamic FBA. If a solution is selected from the flux cone which limits a metabolite essential for some other process in the model, the selection of the FBA solution could become a limiting factor and determine the total model behavior.
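Schematically, the feedback loop looks like this (the variable names and the helpers bounds_from_counts and update_counts are illustrative, not the Karr implementation):

# dynamic FBA: metabolite counts and flux bounds are coupled across steps
for step in range(n_steps):
    # 1) derive flux bounds from the current metabolite counts
    for rxn_id, bound in bounds_from_counts(metabolite_counts).items():
        model.reactions.get_by_id(rxn_id).upper_bound = bound
    # 2) solve the FBA problem; with a non-unique optimum the solver
    #    picks one point from the flux cone
    solution = model.optimize()
    # 3) update the counts from whichever flux vector was returned
    metabolite_counts = update_counts(metabolite_counts, solution, dt)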

aebrahim commented 9 years ago

OK, I implemented scaling in c41be14 (along with the independent primal/dual tolerances); it wasn't as bad as I thought it would be.

As far as the method goes, cglpk uses simplex every time. It can use the exact simplex by passing exact=True.
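For example (assuming this version's optimize() forwards keyword arguments to the solver):

# use the exact-arithmetic simplex instead of the floating-point one
solution = model.optimize(solver="cglpk", exact=True)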