odlgroup / odl

Operator Discretization Library https://odlgroup.github.io/odl/
Mozilla Public License 2.0
370 stars 105 forks source link

OperatorTest errors #1323

Open dksim opened 6 years ago

dksim commented 6 years ago

It appears the odl.diagnostics.OperatorTest class has a few bugs

Examples:

>>> import odl
>>> space = odl.uniform_discr([0,0],[1,1],[5,5])
>>> geometry = odl.tomo.parallel_beam_geometry(space, num_angles = 2, det_shape= 2)
>>> ray_trafo = odl.tomo.operators.RayTransform(space, geometry)

1. Failure to estimate norm (raises ValueError)

>>> optest = odl.diagnostics.OperatorTest(ray_trafo)

2. Linear operator fails derivative and adjoint tests

>>> ray_trafo.is_linear
True
>>> opnorm = ray_trafo.norm(estimate=True)
>>> optest = odl.diagnostics.OperatorTest(ray_trafo, operator_norm=opnorm)
>>> optest._derivative_convergence

Verifying that derivative is a first-order approximation x=One p=Gaussian , error=0.0006087356400996563 x=One p=Grad 0 , error=0.00024304196855373788 x=One p=Grad 1 , error=0.00024307337744078404 x=One p=Grad all , error=0.00048608393545274375 x=Cube p=One , error=0.00026358338048604085 x=Cube p=Grad 0 , error=0.00024304196524427385 x=Cube p=Grad all , error=0.00017199506603023416 x=Gaussian p=One , error=0.00026358338048604085 x=Gaussian p=Gaussian , error=0.0003335309723047023 x=Gaussian p=Grad 0 , error=0.00024304196524427385 x=Gaussian p=Grad 1 , error=0.00024307337413134527 x=Gaussian p=Grad all , error=0.0007702715463741245 x=Grad 0 p=One , error=0.0002635833800723579 x=Grad 0 p=Grad 0 , error=0.00038513577318706227 x=Grad 0 p=Grad 1 , error=0.00038510436430001614 x=Grad 0 p=Grad all , error=0.0001420938050470074 x=Grad 1 p=One , error=0.0002635833800723579 x=Grad 1 p=Gaussian , error=0.00029464676943609776 x=Grad 1 p=Grad all , error=0.00017199506561655118 x=Grad all p=Grad 0 , error=0.00024304196524427385 x=Grad all p=Grad 1 , error=0.00024307337413134527 x=Grad all p=Grad all , error=0.0001420938050470074 error = inf_c ||A(x+cp)-A(x)-A'(x)(cp)|| / c FAILED 22 TEST CASE(S)

>>> optest.adjoint()

... FAILED 48 TEST CASE(S)

The adjoint seems to be scaled according to: (x, A^T y) / (Ax, y) = 1.5983534838635414. Should be 1.0

3. For non-linear operators there shouldn't be test for adjoints or calculation of norm

4. The following check may fail in some cases where it shouldn't (deriv is self.operator will only work if both are referencing the same object in memory)

if self.operator.is_linear and deriv is self.operator:
            self.log('A is linear and A.derivative is A')
            return
kohr-h commented 6 years ago

Thanks for your bug report @dksim. Clearly the diagnostics module is not on par with the rest of the library.

Regarding your points:

  1. I'm not sure what's going on exactly, but the Operator.norm interface is relatively new, see #1065, so OperatorTest may not be up to date with that. In any case, can you give some more details?
  2. This seems like an issue with too small step sizes being tested. Which ray trafo backend do you use? If ASTRA, then the issue could simply be that the ray transform is computed in single precision while inputs and outputs are double precision. In that case there's nothing we can do.
  3. That's true, in particular in cases like #1324 where the adjoint check is expected to fail.
  4. I don't see anything failing here. It's just a message that applies to a trivial case.

Related issues: #551, #717

dksim commented 6 years ago

Thanks for the response @kohr-h. Upon further inspection and based on your comments, here is an updated status

  1. The ValueError is raised in some cases, depending on space, geometry and backend. The example I gave is with astra_cuda backend and results in the following error message

ValueError Traceback (most recent call last)

in () ----> 1 optest = odl.diagnostics.OperatorTest(ray_trafo) ~\Documents\odl\odl\diagnostics\operator.py in __init__(self, operator, operator_norm, verbose, tol) 51 self.verbose = False 52 if operator_norm is None: ---> 53 self.operator_norm = self.norm() 54 else: 55 self.operator_norm = float(operator_norm) ~\Documents\odl\odl\diagnostics\operator.py in norm(self) 86 operator_norm = max(power_method_opnorm(self.operator, maxiter=2, 87 xstart=x) ---> 88 for name, x in samples(self.operator.domain) 89 if name != 'Zero') 90 ~\Documents\odl\odl\diagnostics\operator.py in (.0) 87 xstart=x) 88 for name, x in samples(self.operator.domain) ---> 89 if name != 'Zero') 90 91 self.log('Norm is at least: {}'.format(operator_norm)) ~\Documents\odl\odl\operator\oputils.py in power_method_opnorm(op, xstart, maxiter, rtol, atol, callback) 248 x_norm = x.norm() 249 if x_norm == 0: --> 250 raise ValueError('reached ``x=0`` after {} iterations'.format(i)) 251 if not np.isfinite(x_norm): 252 raise ValueError('reached nonfinite ``x={}`` after {} iterations' ValueError: reached ``x=0`` after 0 iterations
  1. It is true that this issue is affected by the backend. For example, using the skimage backend the derivative test passes successfully (but the adjoint test still fails). I thought astra_cuda uses double precision for a GPU that supports it but I might be mistaken. In any case, I see a problem with numerical stability here.
  2. Nothing to add here
  3. What I meant is that the if condition might fail even for a linear operator (which has as a derivative itself) and not return the function at that point, which may be harmless but not what was intended.
kohr-h commented 6 years ago
  1. The zero element shouldn't be used for operator norm computation. In general it's hard to come up with a good starting point for the power iteration, so this was an attempt to test at least a somewhat broad range. Maybe we should just catch the exception and go on. Or we don't try all examples and just use the default initialization (random) which seems to work well in all cases I've seen so far.
    1. I'm pretty sure ASTRA uses single precision internally. This issue is out of our hands, and I don't see a particular problem, except that we might want to document somewhere that the backend uses single precision no matter what you choose in ODL. If you see a problem with stability, you may want to head over to ASTRA's issue tracker and raise the issue there.
    2. I think the current behavior makes sense. If the operator is linear there are two possible cases:
      • No override for derivative is implemented in the operator, in which case the default implementation just returns self, and the condition is True. This is the correct behavior for 99 % of the cases.
      • For some odd reason, somebody decides to write an override for derivative of a linear operator that does not return self. I have a hard time imagining what that reason might be. In this case, the condition evaluates to False and the derivative is being tested.
dksim commented 6 years ago
  1. Yes, adding some documentation or a warning would be useful. In my understanding the OperatorTest class serves to help a user check if the implementation of an operator derivative, adjoint etc. is correct. If this is not reliable (even if this is due to external libraries), then this probably causes confusion instead of fulfilling its purpose and the user should be at least warned.
  2. OK, I understand now the logic and this is probably fine. Thanks for clearing that up.
kohr-h commented 6 years ago

Yes, adding some documentation or a warning would be useful. In my understanding the OperatorTest class serves to help a user check if the implementation of an operator derivative, adjoint etc. is correct. If this is not reliable (even if this is due to external libraries), then this probably causes confusion instead of fulfilling its purpose and the user should be at least warned.

I agree to your general sentiment that failing diagnostics can be confusing if they just reflect expected behavior (from the implementation point of view).

Let me just add that the typical user who makes use of that functionality is probably an advanced user who likes to get some quick correctness test for a custom operator they implemented. In that scenario, the diagnostics typically give a much larger error initially if there's a real issue in the implementation, more like in the order of 1. After fixing the bugs, the error may go down to single precision error. Since the user likely has a good understanding of the implementation details, they can tell if a failure is more likely due to loss of precision in one of the steps, or due to another, more subtle bug.

The diagnostics are less useful as a tool for somebody who browses the library and tests some random operator from the shelf. That said, we should definitely clarify the intention of this subpackage, and explain how to interpret results.

Also note that the linearity failure you observed was in an internal method that would not have been called by the public OperatorTest.derivative method.

If you run the adjointness tests you will see similar things with the ray transform. Adjointness is much harder to get correct, and with external libraries there's just nothing you can do but hope that they fix the issue for you.