kohr-h / odl-multigrid

Multigrid plugin for ODL
Mozilla Public License 2.0

Multiple regularization #2

Open davlars opened 6 years ago

davlars commented 6 years ago

Hi,

We're working along on our study, but I have a small question about the possibility of a) using regions with different discretizations but the same regularization, or b) using one discretization but regions with different regularization.

The question relates to the shepp_logan_two_resolutions.py example. Below is an example running the TV implementation (i.e. using a primal-dual splitting) but setting the coarse and fine discretizations to be the same (i.e. a 1:1 ratio). I assumed this would give results similar to a full-resolution non-multigrid reconstruction, but that does not seem to be the case:

[Image: multigrid_tv — TV reconstruction with 1:1 discretization ratio]
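
For reference, the 1:1 setup just means giving the insert the same cell size as the surrounding region. A minimal sketch of such a setup (the domain corners and shapes below are made up for illustration, not the actual values of the example):

        import odl

        # Coarse region: [-10, 10]^2 with cell size 0.1
        coarse_discr = odl.uniform_discr([-10, -10], [10, 10], [200, 200])
        # Insert: [-2, 2]^2 with the same cell size 0.1, i.e. a 1:1 ratio
        insert_discr = odl.uniform_discr([-2, -2], [2, 2], [40, 40])

        # Product space holding the (coarse, insert) pair of reconstructions
        pspace = odl.ProductSpace(coarse_discr, insert_discr)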

It seems like the two regions (despite having identical discretizations) get different regularization. Is this expected? If we want identical regularization, which parts of the p-d implementation would we have to alter?

kohr-h commented 6 years ago

Do you use the same regularization parameter for both regions?

davlars commented 6 years ago

I at least think I am. It could well be that I'm interpreting the primal-dual solver incorrectly (i.e. the handling of the smooth and non-smooth parts). Pasting the relevant part of the code below (setting up the solver etc.; a lot is copied from the example):

        # Gradient
        insert_grad = odl.Gradient(insert_discr, pad_mode='order1')

        # Differentiable part
        data_func = (odl.solvers.L2NormSquared(
            sum_ray_trafo.range).translated(noisy_data) * sum_ray_trafo)
        reg_param_1 = 8e-3  # first reg param
        reg_func_1 = reg_param_1 * (odl.solvers.L2NormSquared(coarse_discr) *
                                    odl.ComponentProjection(pspace, 0))
        smooth_func = data_func + reg_func_1

        # Non-differentiable part
        reg_param = 8e-3  # second reg param
        nonsmooth_func = reg_param * odl.solvers.L1Norm(insert_grad.range)

        # Assemble into lists
        comp_proj_1 = odl.ComponentProjection(pspace, 1)
        lin_ops = [insert_grad * comp_proj_1]
        nonsmooth_funcs = [nonsmooth_func]

        box_constr = odl.solvers.IndicatorBox(pspace,
                                              np.min(phantom_f),
                                              np.max(phantom_f))
        f = box_constr

        # eta^-1 is the Lipschitz constant of the smooth functional gradient
        ray_trafo_norm = 1.1 * odl.power_method_opnorm(sum_ray_trafo,
                                                       xstart=phantom, maxiter=2)
        print('norm of the ray transform: {}'.format(ray_trafo_norm))
        eta = 1 / (2 * ray_trafo_norm ** 2 + 2 * reg_param_1)
        print('eta = {}'.format(eta))
        grad_norm = 1.1 * odl.power_method_opnorm(insert_grad,
                                                  xstart=phantom_insert,
                                                  maxiter=4)
        print('norm of the gradient: {}'.format(grad_norm))

        # Define step sizes
        sigma = 4e-3
        tau = 1.0 * sigma

        # Check convergence criterion
        print('tau * sigma * grad_norm ** 2 = {}, should be <= 1'
              ''.format(tau * sigma * grad_norm ** 2))
        assert tau * sigma * grad_norm ** 2 <= 1
        check_value = (2 * eta * min(1 / tau, 1 / sigma) *
                       np.sqrt(1 - tau * sigma * grad_norm ** 2))
        print('check_value = {}, must be > 1 for convergence'.format(check_value))
        convergence_criterion = check_value > 1
        assert convergence_criterion

        # Reconstruct
        reco = pspace.zero()  # starting point
        callback = None
        odl.solvers.forward_backward_pd(
              reco, f=f, g=nonsmooth_funcs, L=lin_ops, h=smooth_func,
              tau=tau, sigma=[sigma], niter=150, callback=callback)

Wouldn't reg_param_1 and reg_param represent the regularization of the two regions (reg_param_1 for the outside, i.e. coarse_discr, and reg_param for the insert, over which insert_grad is evaluated)? As said, it might be that I'm simply misunderstanding the implementation...

kohr-h commented 6 years ago

Oh, I see the issue. Yes, the two regularization parameters apply to the respective regions in the way you think, but they mean different things per region. In the first region, reg_param_1 applies to the squared L2 norm of the first component, whereas reg_param sits in front of the TV functional in the small region. You need to make the two functionals the same type; otherwise they live on quite different scales, and the weights in front then need to be quite different as well in order to balance their contributions to the total penalty.
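
Spelling this out: with x = (x_0, x_1) the coarse/insert pair, the code above minimizes (schematically)

        \min_x \; \|A x - y\|_2^2 \; + \; \mathtt{reg\_param\_1} \, \|x_0\|_2^2 \; + \; \mathtt{reg\_param} \, \|\nabla x_1\|_1

i.e. a squared L2 (Tikhonov) penalty on the coarse part but a TV penalty on the insert, and those two terms scale very differently for the same parameter value.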

davlars commented 6 years ago

I'm a bit confused. I thought what we were doing was straightforward TV, i.e. L2 on the data-matching term but L1 on the gradient? And you'd do that in "non-multigrid" setups as well, right?

Anyway, I think I've found the issue: as it is now (the code above), we're only really regularizing the insert. What we should do instead is evaluate the L1 term (on the gradient) in both regions, right? It goes something like this:

        # Gradients on both parts
        insert_grad = odl.Gradient(insert_discr, pad_mode='order1')
        coarse_grad = odl.Gradient(coarse_discr, pad_mode='order1')

        # Differentiable part
        data_func = odl.solvers.L2NormSquared(
            sum_ray_trafo.range).translated(noisy_data) * sum_ray_trafo
        reg_param_1 = 8e-3
        reg_func_1 = reg_param_1 * (odl.solvers.L2NormSquared(coarse_discr) *
                                    odl.ComponentProjection(pspace, 0)) 
        smooth_func = data_func + reg_func_1

        # Non-differentiable part composed with linear operators
        reg_param = 8e-3
        nonsmooth_func = reg_param * odl.solvers.L1Norm(insert_grad.range) 

        reg_param_2 = 8e-6
        nonsmooth_func_coarse = reg_param_2 * odl.solvers.L1Norm(coarse_grad.range) 

        # Assemble into lists (arbitrary number can be given)
        comp_proj_0 = odl.ComponentProjection(pspace, 0)
        comp_proj_1 = odl.ComponentProjection(pspace, 1)

        lin_ops = [insert_grad * comp_proj_1, coarse_grad * comp_proj_0]

        nonsmooth_funcs = [nonsmooth_func, nonsmooth_func_coarse]

        box_constr = odl.solvers.IndicatorBox(pspace,
                                              np.min(phantom_f),
                                              np.max(phantom_f))
        f = box_constr

        # eta^-1 is the Lipschitz constant of the smooth functional gradient
        ray_trafo_norm = 1.1 * odl.power_method_opnorm(sum_ray_trafo,
                                                       xstart=phantom, maxiter=2)
        print('norm of the ray transform: {}'.format(ray_trafo_norm))
        eta = 1 / (2 * ray_trafo_norm ** 2 + 2 * reg_param_1)
        print('eta = {}'.format(eta))
        grad_norm_insert = 1.1 * odl.power_method_opnorm(insert_grad,
                                                         xstart=phantom_insert,
                                                         maxiter=4)
        grad_norm_coarse = 1.1 * odl.power_method_opnorm(coarse_grad,
                                                         xstart=phantom_c,
                                                         maxiter=4)
        grad_norm = [grad_norm_insert, grad_norm_coarse]
        print('norm of the gradient: {}'.format(grad_norm))

        # tau and sigma are like step sizes
        sigma = [4e-3, 4e-3]  # use the same step for both operators
        tau = 1.0 * sigma[0]

        # Check the convergence criterion; with several operators L_i, the
        # single-operator condition becomes tau * sum_i(sigma_i * ||L_i||^2) <= 1
        sum_sigma_grad_norm_sq = sum(
            s * g ** 2 for s, g in zip(sigma, grad_norm))

        print('tau * sum_i(sigma_i * grad_norm_i ** 2) = {}, should be <= 1'
              ''.format(tau * sum_sigma_grad_norm_sq))
        assert tau * sum_sigma_grad_norm_sq <= 1
        check_value = (2 * eta * min(1 / tau, 1 / sigma[0], 1 / sigma[1]) *
                       np.sqrt(1 - tau * sum_sigma_grad_norm_sq))
        print('check_value = {}, must be > 1 for convergence'.format(check_value))
        convergence_criterion = check_value > 1
        assert convergence_criterion

        # Reconstruct
        reco = pspace.zero()  # starting point
        callback = None
        odl.solvers.forward_backward_pd(
            reco, f=f, g=nonsmooth_funcs, L=lin_ops, h=smooth_func,
            tau=tau, sigma=sigma, niter=150, callback=callback)

For the above this gives: [Image: figure_7]

The code might be a bit messy, but doing this we're actually looking at regularization in both parts, right (apologies if the code is a bit messy...)? The only issue above is that we're doing TV on the insert (using insert_discr), then doing TV again (using coarse_discr) on the entire space (on the insert again). Is the more correct way of doing it applying a masking on the coarse_grad perhaps?