kflorios / maxscore-estimator-mip

Exact computation of Maximum Score estimator with Mixed Integer Programming via MATLAB code
MIT License

Question about the performance of MIP #1

Open rohitpatra opened 7 years ago

rohitpatra commented 7 years ago

Hi Kostas, I have a small question about the maxscore estimator through MIP. The 2008 paper says:

It is also crucial to note that in our experiments we observed that the exact maximum can often be obtained very quickly, within a few minutes, but what takes longer is for the algorithm to prove that the maximum found is indeed exact; this suggests that satisfactory results are obtainable even with severe computation constraints.

I am trying to use your code to implement a bootstrap procedure for the maximum score estimator. I am interested in sample sizes larger than 1000, and I was wondering whether there is a way to stop the code after a reasonable amount of time (say 1 hour), on the assumption that the maximum has been reached by then.

Any comments on the issue would be very helpful. Best, Rohit Patra

kflorios commented 7 years ago

Hi Rohit Patra, I appreciate your paper on the bootstrap of the maximum score estimator; it seems really interesting. As to your question, I suggest you dig a little into MIP terminology and methodology in general. What you are asking for is an early stop of the solver, which is a very useful feature. The user can require that the relative gap to the optimum be not 0.00% but, say, 5% or 2% (that is, 0.05 or 0.02). See the option "RelativeGapTolerance" in the intlinprog() documentation: https://www.mathworks.com/help/optim/ug/intlinprog.html. Because reaching even such a gap can be slow on large datasets, it is sometimes more straightforward to limit the number of seconds the MIP solver is allowed to run; see the option "MaxTime" at the same link. I would suggest you set MaxTime to 3600 seconds and also record the relativegap and/or absolutegap in the output of the intlinprog() call. It is useful to know whether the algorithm terminated successfully or hit the time budget. I encourage you to find a CPLEX license at your university and study these links:

  1. http://www.ibm.com/support/knowledgecenter/SSSA5P_12.6.2/ilog.odms.studio.help/Optimization_Studio/topics/COS_home.html
  2. http://www.ibm.com/support/knowledgecenter/SSSA5P_12.6.2/ilog.odms.studio.help/pdf/gscplexmatlab.pdf

where 1 is the manual of the CPLEX solver for all languages used to call it (C, Python, MATLAB, etc.) and 2 is the manual for the MATLAB interface.

Also, check out www.gams.com because this is where it all started with MIP, historically (GAMS modeling system).

Keep in mind that you really need CPLEX and a tractable DGP in order to obtain accurate estimates (with gap=0.00%), which for large samples will approximate your true coefficients satisfactorily.

If you do not design the experiment carefully, you can definitely get suboptimal results (e.g. by using a slow MIP solver and/or a large-scale dataset (large N, large K) with a heavy-tailed error term such as Cauchy).
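The early-stop bookkeeping described above (a time budget plus a recorded gap) can be sketched in a few lines of Python. This is only an illustration: the function names, the status labels, and the gap formula |bound - incumbent| / |incumbent| are assumptions, and each solver (intlinprog, CPLEX, and so on) defines and reports its gap in its own way.

```python
def relative_gap(incumbent, best_bound):
    """Relative gap between the best solution found and the solver's bound.

    Assumed definition: |bound - incumbent| / |incumbent|; real solvers
    use slightly different denominators.
    """
    return abs(best_bound - incumbent) / max(abs(incumbent), 1e-10)


def classify_run(incumbent, best_bound, elapsed, max_time=3600.0, gap_tol=0.0):
    """Label a finished solver run so each replicate can be audited later."""
    gap = relative_gap(incumbent, best_bound)
    if gap <= gap_tol:
        status = "optimal"        # optimality proven within the tolerance
    elif elapsed >= max_time:
        status = "time_limit"     # incumbent returned, optimality not proven
    else:
        status = "stopped_other"  # e.g. node limit or user interrupt
    return {"status": status, "relative_gap": gap}


# A run that hit the one-hour budget with score 800 and bound 840:
print(classify_run(800, 840, elapsed=3600.0, gap_tol=0.02))
```

In a bootstrap, storing this record for every replicate makes it possible to report how many estimates were proven optimal and how many were merely the best solution found within the budget.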

I am willing to help you in preparing your simulations, if you have more questions. Best regards, Kostas Florios

rohitpatra commented 7 years ago

Hi Kostas,

Thank you for your detailed reply. Early stopping with a time limit, combined with a check for optimality, sounds like a good idea. I think I will try to implement what you did in R; the ‘Rcplex’ package seems very useful for this. If you do not mind, I will ask you further questions about your code as I go through it.

Best, Rohit

kflorios commented 7 years ago

Hi Rohit Patra,

Here is something that could be useful. A problem similar to MAX SCORE is the CLAD (censored least absolute deviations) estimator. The relation is like moving from probit/logit to tobit in the parametric case.

You might want to have a look at this package when writing your R link to CPLEX using Rcplex for the MAX SCORE estimator. I have developed an R package called CLADmip, which for the moment I provide on my webpage https://sites.google.com/site/kflorios/clad

R
(R package CLADmip source) NEW - June 2015
(R package CLADmip binary) NEW - June 2015
(R package CLADmip manual) NEW - June 2015
{the package uses Rmosek and the MOSEK solver, which are both free for academic users}

Step-by-Step INSTALL of CLADmip, July 2015
Note: The package has been tested with 32-bit R and 32-bit MOSEK and Rmosek.

Since MOSEK has an easy process for obtaining an academic license, which can be completed in 10 minutes, the CLADmip R package uses the MOSEK solver (and has the Rmosek package as a dependency).

If you can manage to use Rcplex, that will be better. Note that in the CLADmip R package, the actual MIP model is generated by a Fortran .exe which takes the input matrices X.txt and y.txt and creates the MIP model in .LP format. Then MOSEK takes over (this could equally be CPLEX, SCIP, etc.) and solves the optimization problem.

The similar exe for MAX SCORE can be found at: https://sites.google.com/site/kflorios/mws
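To illustrate what generating an .LP file from the data involves, here is a minimal Python sketch that writes a big-M formulation of the max score problem in CPLEX LP format. The formulation itself is an assumption made for illustration (binary z_i marking a correctly predicted observation, the constant big_m, the normalization b1 = 1, and the box bound on the remaining coefficients); it is not the output of the Fortran exe and may differ from the model used in the 2008 paper.

```python
def write_maxscore_lp(X, y, big_m=100.0, bound=10.0):
    """Return an LP-format model: maximize the number of correct sign predictions.

    z_i = 1 forces s_i * x_i'b >= 0, where s_i = +1 if y_i == 1 else -1.
    big_m must exceed |x_i'b| over the feasible box, or the model is wrong.
    """
    n, k = len(X), len(X[0])
    lines = ["Maximize", " obj:"]
    lines += [f"  + z{i}" for i in range(1, n + 1)]   # one term per line
    lines.append("Subject To")
    for i, (row, yi) in enumerate(zip(X, y), start=1):
        s = 1 if yi == 1 else -1
        # -s * x_i'b + M * z_i <= M, with every variable on the left-hand side
        terms = " ".join(f"{-s * xij:+g} b{j}" for j, xij in enumerate(row, start=1))
        lines.append(f" c{i}: {terms} {big_m:+g} z{i} <= {big_m:g}")
    lines.append("Bounds")
    lines.append(" b1 = 1")                           # scale normalization (assumed)
    for j in range(2, k + 1):
        lines.append(f" {-bound:g} <= b{j} <= {bound:g}")
    lines.append("Binary")
    lines += [f" z{i}" for i in range(1, n + 1)]
    lines.append("End")
    return "\n".join(lines) + "\n"


print(write_maxscore_lp([[1, 2], [1, -1]], [1, 0]))
```

Writing one objective term and one binary variable per line keeps each line short, which matters for large N because LP-format readers can impose a maximum line length.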

That package (CLADmip) computes the CLAD estimator, not the MAX SCORE estimator, but the programming logic is similar. I could add a MAXSCOREmip R package to my website if you think it would be useful, but so far in R I have managed to link only MOSEK and SCIP, not CPLEX. In Matlab I have managed to link CPLEX and GUROBI, which are the two best MIP solvers. In GAMS everything is available by default (every MIP solver).

Yes, sometimes R is more convenient! Best regards, Kostas

kflorios commented 7 years ago

Dear Rohit,

In case you want to call GAMS from R in order to compute the Max Score estimator, you can use the following two resources together:

  1. https://support.gams.com/gdxrrw:interfacing_gams_and_r

  2. https://www.gams.com/latest/gamslib_ml/libhtml/gamslib_mws.html

The first is the gdxrrw package which lets you call GAMS from within R. It is an R package actually. The second is the MIP model for the Max Score estimator for the Horowitz (1993) dataset.

If you want a more user-friendly version of the GAMS MIP model in point (2), send me an email, and I will send you a version of the MIP model (2) which reads the files X.txt, y.txt, w.txt in a generic way, so that you can easily load your own data.

Best regards, Kostas

rohitpatra commented 7 years ago

Hi Kostas,

Thanks for the links. I installed Rcplex and have been trying to play with it. But now, with the second link, I am thinking maybe I should try GAMS. If you have a more user-friendly version of the second file that takes the three input files you mentioned, that would be very helpful. Thank you very much.

Best, Rohit

kflorios commented 7 years ago

Hi Rohit,

I just sent you the user-friendly GAMS file privately by email.

Using CPLEX through GAMS is the most flexible way to do MIP modeling, so it will repay your effort.

The file is really so simple that you just copy and paste your data into X.txt, y.txt, w.txt, and read all the results from a file results.txt.

The GAMS file can be modified to print a one-line auxiliary file betas.out, so that when it is called iteratively from R:

  1. creates X.txt, y.txt, w.txt
  2. calls GAMS and runs the gms model
  3. the gms model logs the betas in betas.out (one-liner)
  4. R retrieves the one-liner betas.out and reads the estimates easily

Repeat steps 1-4 100 times and you have an easy Monte Carlo.

Step 3 requires you to write a "put" statement in the GAMS model, which is the command GAMS uses to write text files.
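The four-step loop above can be sketched as follows. The sketch is in Python rather than R, and several details are assumptions: that a `gams` executable is on the PATH, the model file name maxscore.gms (hypothetical), whitespace-delimited input files, and a betas.out consisting of one line of whitespace-separated numbers.

```python
import subprocess
from pathlib import Path


def write_matrix(path, rows):
    """Write rows as whitespace-delimited text, one row per line (assumed format)."""
    Path(path).write_text("\n".join(" ".join(str(v) for v in row) for row in rows) + "\n")


def run_gams(workdir, model="maxscore.gms"):
    """Step 2: run the GAMS model; the model name is hypothetical."""
    subprocess.run(["gams", model], cwd=workdir, check=True)


def one_replication(workdir, X, y, w, solve=run_gams):
    workdir = Path(workdir)
    write_matrix(workdir / "X.txt", X)                    # step 1
    write_matrix(workdir / "y.txt", [[v] for v in y])
    write_matrix(workdir / "w.txt", [[v] for v in w])
    (workdir / "betas.out").unlink(missing_ok=True)       # never reuse a stale result
    solve(workdir)                                        # steps 2 and 3
    text = (workdir / "betas.out").read_text()            # step 4
    return [float(token) for token in text.split()]


def monte_carlo(workdir, draw_sample, reps=100, solve=run_gams):
    """draw_sample(r) must return (X, y, w) for replication r."""
    return [one_replication(workdir, *draw_sample(r), solve=solve) for r in range(reps)]
```

Passing the solver call as the `solve` argument makes it possible to dry-run the loop with a stand-in function before committing hours of solver time, and deleting betas.out before each run ensures a failed solve raises an error instead of silently returning the previous estimates.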

Best, Kostas