[Open] Sam-XiaoyueLi opened this issue 1 month ago
Thanks for pointing this out! This is definitely just a documentation problem. We are not re-writing the optimizers but using external providers: CMA-ES and the TensorFlow optimizers, for example, are provided by Qibo. I wouldn't consider changing Qibo's defaults an option, but we can certainly set some reasonable parameters for our setup.
For example, I used to run SGD with the following config:

```python
OPTIMIZER = "sgd"
BACKEND = "tensorflow"
OPTIMIZER_OPTIONS = {
    "optimizer": "Adam",
    "learning_rate": 0.01,
    "nmessage": 1,
    "nepochs": 1000,
}
```
Note also that the tensorflow backend should be used only when running the TensorFlow SGD; for all the other optimizers we should use numpy or qibojit.
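For reference, here is a minimal self-contained sketch of that pairing in plain qibo (the toy ansatz and Hamiltonian are chosen just for the example, and I'm assuming the standard `VQE.minimize` interface, not our `train_vqe`):

```python
import numpy as np
import qibo
from qibo import gates, hamiltonians
from qibo.models import Circuit, VQE

# SGD runs on top of TensorFlow, so pair it with the tensorflow backend;
# for cma or scipy methods, "numpy" or "qibojit" is the better choice.
qibo.set_backend("tensorflow")

nqubits = 4
ham = hamiltonians.TFIM(nqubits)

# toy single-layer RY ansatz, only to make the example runnable
circuit = Circuit(nqubits)
circuit.add(gates.RY(q, theta=0.0) for q in range(nqubits))

vqe = VQE(circuit, ham)
initial_params = np.random.uniform(0, 2 * np.pi, nqubits)
best, params, extra = vqe.minimize(
    initial_params,
    method="sgd",
    options={"optimizer": "Adam", "learning_rate": 0.01,
             "nepochs": 1000, "nmessage": 1},
)
```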
In summary: we should write these instructions into the README.md files and add docstrings so that users know some reasonable parameter values.
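As a concrete target for those docstrings, something along these lines would already help (the signature is inferred from the call shown later in this issue, so treat the names as assumptions):

```python
def train_vqe(circuit, ham, optimizer, initial_parameters, tol=1e-5,
              niterations=None, nmessage=1, loss=None, training_options=None):
    """Train a VQE with ``circuit`` as ansatz on hamiltonian ``ham``.

    Args:
        optimizer (str): ``"sgd"``, ``"cma"`` or a scipy.optimize method name.
        training_options (dict): forwarded to the underlying optimizer.
            Reasonable values for ``"sgd"`` (tensorflow backend required):
            ``{"optimizer": "Adam", "learning_rate": 0.01, "nepochs": 1000,
            "nmessage": 1}``. For ``"cma"``, any key accepted by
            ``cma.CMAOptions()`` works, e.g. ``{"maxiter": 500}``.
    """
```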
I'd say it's quite confusing that `train_vqe` in our repo doesn't properly restrict `maxiter`. For example, I tried to follow the documentation from `optimizers.py` in qibo:

```python
import cma
cma.CMAOptions()
```

which points to `maxiter`:

```python
{'AdaptSigma': 'True # or False or any CMAAdaptSigmaBase class e.g. CMAAdaptSigmaTPA, CMAAdaptSigmaCSA',
'CMA_active': 'True # negative update, conducted after the original update',
'CMA_active_injected': '0 #v weight multiplier for negative weights of injected solutions',
'CMA_cmean': '1 # learning rate for the mean value',
'CMA_const_trace': 'False # normalize trace, 1, True, "arithm", "geom", "aeig", "geig" are valid',
'CMA_diagonal': '0*100*N/popsize**0.5 # nb of iterations with diagonal covariance matrix, True for always',
'CMA_diagonal_decoding': '0 # multiplier for additional diagonal update',
'CMA_eigenmethod': 'np.linalg.eigh # or cma.utilities.math.eig or pygsl.eigen.eigenvectors',
'CMA_elitist': 'False #v or "initial" or True, elitism likely impairs global search performance',
'CMA_injections_threshold_keep_len': '1 #v keep length if Mahalanobis length is below the given relative threshold',
'CMA_mirrors': 'popsize < 6 # values <0.5 are interpreted as fraction, values >1 as numbers (rounded), for `True` about 0.16 is used',
'CMA_mirrormethod': '2 # 0=unconditional, 1=selective, 2=selective with delay',
'CMA_mu': 'None # parents selection parameter, default is popsize // 2',
'CMA_on': '1 # multiplier for all covariance matrix updates',
'CMA_sampler': 'None # a class or instance that implements the interface of `cma.interfaces.StatisticalModelSamplerWithZeroMeanBaseClass`',
'CMA_sampler_options': '{} # options passed to `CMA_sampler` class init as keyword arguments',
'CMA_rankmu': '1.0 # multiplier for rank-mu update learning rate of covariance matrix',
'CMA_rankone': '1.0 # multiplier for rank-one update learning rate of covariance matrix',
'CMA_recombination_weights': 'None # a list, see class RecombinationWeights, overwrites CMA_mu and popsize options',
'CMA_dampsvec_fac': 'np.Inf # tentative and subject to changes, 0.5 would be a "default" damping for sigma vector update',
'CMA_dampsvec_fade': '0.1 # tentative fading out parameter for sigma vector update',
'CMA_teststds': 'None # factors for non-isotropic initial distr. of C, mainly for test purpose, see CMA_stds for production',
'CMA_stds': 'None # multipliers for sigma0 in each coordinate (not represented in C), or use `cma.ScaleCoordinates` instead',
'CSA_dampfac': '1 #v positive multiplier for step-size damping, 0.3 is close to optimal on the sphere',
'CSA_damp_mueff_exponent': '0.5 # zero would mean no dependency of damping on mueff, useful with CSA_disregard_length option',
'CSA_disregard_length': 'False #v True is untested, also changes respective parameters',
'CSA_clip_length_value': 'None #v poorly tested, [0, 0] means const length N**0.5, [-1, 1] allows a variation of +- N/(N+2), etc.',
'CSA_squared': 'False #v use squared length for sigma-adaptation ',
'BoundaryHandler': 'BoundTransform # or BoundPenalty, unused when ``bounds in (None, [None, None])``',
'bounds': '[None, None] # lower (=bounds[0]) and upper domain boundaries, each a scalar or a list/vector',
'conditioncov_alleviate': '[1e8, 1e12] # when to alleviate the condition in the coordinates and in main axes',
'eval_final_mean': 'True # evaluate the final mean, which is a favorite return candidate',
'fixed_variables': 'None # dictionary with index-value pairs like {0:1.1, 2:0.1} that are not optimized',
'ftarget': '-inf #v target function value, minimization',
'integer_variables': '[] # index list, invokes basic integer handling: prevent std dev to become too small in the given variables',
'is_feasible': 'is_feasible #v a function that computes feasibility, by default lambda x, f: f not in (None, np.NaN)',
'maxfevals': 'inf #v maximum number of function evaluations',
'maxiter': '100 + 150 * (N+3)**2 // popsize**0.5 #v maximum number of iterations',
'mean_shift_line_samples': 'False #v sample two new solutions colinear to previous mean shift',
'mindx': '0 #v minimal std in any arbitrary direction, cave interference with tol*',
'minstd': '0 #v minimal std (scalar or vector) in any coordinate direction, cave interference with tol*',
'maxstd': 'None #v maximal std (scalar or vector) in any coordinate direction',
'maxstd_boundrange': '1/3 # maximal std relative to bound_range per coordinate, overruled by maxstd',
'pc_line_samples': 'False #v one line sample along the evolution path pc',
'popsize': '4 + 3 * np.log(N) # population size, AKA lambda, int(popsize) is the number of new solution per iteration',
'popsize_factor': '1 # multiplier for popsize, convenience option to increase default popsize',
'randn': 'np.random.randn #v randn(lam, N) must return an np.array of shape (lam, N), see also cma.utilities.math.randhss',
'scaling_of_variables': 'None # deprecated, rather use fitness_transformations.ScaleCoordinates instead (or CMA_stds). Scale for each variable in that effective_sigma0 = sigma0*scaling. Internally the variables are divided by scaling_of_variables and sigma is unchanged, default is `np.ones(N)`',
'seed': 'time # random number seed for `numpy.random`; `None` and `0` equate to `time`, `np.nan` means "do nothing", see also option "randn"',
'signals_filename': 'cma_signals.in # read versatile options from this file (use `None` or `""` for no file) which contains a single options dict, e.g. ``{"timeout": 0}`` to stop, string-values are evaluated, e.g. "np.inf" is valid',
'termination_callback': '[] #v a function or list of functions returning True for termination, called in `stop` with `self` as argument, could be abused for side effects',
'timeout': 'inf #v stop if timeout seconds are exceeded, the string "2.5 * 60**2" evaluates to 2 hours and 30 minutes',
'tolconditioncov': '1e14 #v stop if the condition of the covariance matrix is above `tolconditioncov`',
'tolfacupx': '1e3 #v termination when step-size increases by tolfacupx (diverges). That is, the initial step-size was chosen far too small and better solutions were found far away from the initial solution x0',
'tolupsigma': '1e20 #v sigma/sigma0 > tolupsigma * max(eivenvals(C)**0.5) indicates "creeping behavior" with usually minor improvements',
'tolflatfitness': '1 #v iterations tolerated with flat fitness before termination',
'tolfun': '1e-11 #v termination criterion: tolerance in function value, quite useful',
'tolfunhist': '1e-12 #v termination criterion: tolerance in function value history',
'tolfunrel': '0 #v termination criterion: relative tolerance in function value: Delta f current < tolfunrel * (median0 - median_min)',
'tolstagnation': 'int(100 + 100 * N**1.5 / popsize) #v termination if no improvement over tolstagnation iterations',
'tolx': '1e-11 #v termination criterion: tolerance in x-changes',
'transformation': 'None # depreciated, use cma.fitness_transformations.FitnessTransformation instead.\n [t0, t1] are two mappings, t0 transforms solutions from CMA-representation to f-representation (tf_pheno),\n t1 is the (optional) back transformation, see class GenoPheno',
'typical_x': 'None # used with scaling_of_variables',
'updatecovwait': 'None #v number of iterations without distribution update, name is subject to future changes',
'verbose': '3 #v verbosity e.g. of initial/final message, -1 is very quiet, -9 maximally quiet, may not be fully implemented',
'verb_append': '0 # initial evaluation counter, if append, do not overwrite output files',
'verb_disp': '100 #v verbosity: display console output every verb_disp iteration',
'verb_disp_overwrite': 'inf #v start overwriting after given iteration',
'verb_filenameprefix': 'outcmaes/ # output path (folder) and filenames prefix',
'verb_log': '1 #v verbosity: write data to files every verb_log iteration, writing can be time critical on fast to evaluate functions',
'verb_log_expensive': 'N * (N <= 50) # allow to execute eigendecomposition for logging every verb_log_expensive iteration, 0 or False for never',
'verb_plot': '0 #v in fmin2(): plot() is called every verb_plot iteration',
'verb_time': 'True #v output timings on console',
'vv': '{} #? versatile set or dictionary for hacking purposes, value found in self.opts["vv"]'}
```
but this

```python
param = params_history[-1]
(
    partial_results,
    partial_params_history,
    partial_loss_history,
    partial_grads_history,
    partial_fluctuations,
    vqe,
) = train_vqe(
    deepcopy(ansatz_circ),
    ham,  # fixed Hamiltonian
    optimizer,
    param,
    tol=tol,
    niterations=maxiter,
    nmessage=nmessage,  # show log info every nmessage iterations
    loss=objective_boost,
    training_options={"maxiter": maxiter},
)
params_history.extend(np.array(partial_params_history))
loss_history.extend(np.array(partial_loss_history))
grads_history.extend(np.array(partial_grads_history))
fluctuations.extend(np.array(partial_fluctuations))
```

is not terminating:
```
...
INFO:root:Optimization iteration 30500/3500
INFO:root:Loss -10.61
```
At minimum, I'd say there is a logging bug: the iteration counter (30500) exceeds the displayed total (3500). It's not an issue for the paper submission, because you know how to run it, but for outside users this is difficult.
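Purely as a hypothetical illustration of how a counter can overshoot its displayed total (I haven't traced the actual code): if the loop bound and the logged total come from different settings, e.g. the SGD loop running over `nepochs` while the message prints `niterations`, you get exactly this kind of output:

```python
import logging

logging.basicConfig(level=logging.INFO)

# hypothetical numbers, not taken from the repo
nepochs, niterations, nmessage = 100_000, 3_500, 500

for epoch in range(1, nepochs + 1):
    # ... one optimization step would go here ...
    if epoch % nmessage == 0:
        # the denominator comes from a different setting than the loop
        # bound, so the counter eventually exceeds it
        logging.info("Optimization iteration %d/%d", epoch, niterations)
```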
In `train_vqe` in `main.py`, the optimizer options are given by the argument `optimizer_options`. However, the description in the `help` documentation is unclear (without example code, a general user wouldn't know what to put there), the default value for `nepochs` is unrealistic (100000), and `tol` does not help terminate the code.

For example, when `optimizer='sgd'` and `tol=1e-2`, running code that does not specify `optimizer_options` goes on almost indefinitely. In the scenario where `optimizer='cma'` (with `backend='tensorflow'`), the loss function fluctuates strongly (it changes sign).

In summary, the default value for `nepochs` in the `optimizers.optimize` function in `ansazte.py` may need to be more realistic for the general user. It would also help if the `help` documentation had more detailed descriptions of `optimizer_options`. Moreover, we may need to check whether `'cma'` is running correctly.
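If it helps, this is the shape of config I would document for CMA-ES, assuming `optimizer_options` is forwarded unchanged to `cma` (worth double-checking); all keys come from the `cma.CMAOptions()` listing above:

```python
OPTIMIZER = "cma"
BACKEND = "numpy"  # cma does not need tensorflow
OPTIMIZER_OPTIONS = {
    "maxiter": 500,     # hard cap on iterations
    "maxfevals": 5000,  # hard cap on loss evaluations
    "tolfun": 1e-4,     # terminate on small changes in the loss value
}
```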