Open mo-joshuacolclough opened 1 month ago
Example of the proposed solution on a patched test:
! Calculate inner0 = x_innerproduct_x( activeX ) + x_innerproduct_x( activeY )
! Perform TL forwards
! Calculate inner1 = x_innerproduct_x( activeX ) + x_innerproduct_x( activeY )
! ...
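! Spacing of floating-point numbers at the magnitude of the inner products
MachineTol = SPACING(MAX(ABS(inner0), ABS(inner1)))
! Change caused by the TL call, in multiples of MachineTol
relative_diff = ABS(inner0 - inner1) / MachineTol
! Fail if the change is too small for the adjoint test to detect a mismatch
if (relative_diff <= overall_tolerance) then
  WRITE(log_scratch_space, *) "FAILED finicky_kernel_type: TL does not have &
    &enough influence to ensure failure is detected. ", inner0, inner1, relative_diff
  call log_event(log_scratch_space, log_level_error)
end if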
! Go on to perform AD, and AD/TL comparison <AMx, x> == <Mx, Mx>
Result for the problematic kernel:
ERROR: FAILED finicky_kernel_type: TL does not have enough influence to ensure failure is detected. 10085.7506053876 10085.7506053876 31.0000000000000
Description
During testing of the adjoint routines in the LFRic adjoint model, we found that some kernel tests passed as false positives because the TL routine did not have enough influence on the dot product result. In one example, the TL call changed the dot product by only 31 * machine tolerance, which is far below the detection range of the test (overall_tolerance = 1500 * machine tolerance).
The small influence comes from the nature of the kernel: it takes a linearisation state field, which is automatically initialised with random values in [0.0, 1.0]. Scaling this field increased the influence of the TL/AD routines on the dot product result, and the TL/AD mismatch was then detected.
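To put those multiples in absolute terms, here is a rough worked calculation. It assumes the inner products are 64-bit reals (53-bit significand), which the issue does not state explicitly but which the printed precision suggests. Since 2^13 <= 10085.75 < 2^14, the spacing at that magnitude is 2^(14-53):

```latex
\begin{align*}
\epsilon &= \operatorname{SPACING}(10085.75\ldots) = 2^{14-53} = 2^{-39} \approx 1.82 \times 10^{-12} \\
|\mathrm{inner0} - \mathrm{inner1}| &= 31\,\epsilon \approx 5.6 \times 10^{-11} \\
\mathrm{overall\_tolerance} \cdot \epsilon &= 1500\,\epsilon \approx 2.7 \times 10^{-9}
\end{align*}
```

Under that assumption the TL call moved the dot product by a relative amount of roughly 5.6e-15, so any TL/AD mismatch of comparable size sits well inside the 2.7e-9 window that the test accepts.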
Proposal
The underlying problem is how to initialise the linearisation state fields so that they work "nicely" with a given kernel. Any fix depends on knowledge of the specific kernel (how it uses the ls field), so it is hard to suggest a general one.
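Instead, this problem could be detected automatically and result in a test failure (pseudocode, using the names from the example above):
tl_diff_relative_to_machine_tol = ABS(inner0 - inner1) / MachineTol
if ( tl_diff_relative_to_machine_tol <= overall_tolerance ) panic()  ! i.e. fail the test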
Then a patch can be made for that specific kernel test to scale the inputs appropriately.
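As a minimal sketch of how such a check might be factored out, the standalone program below wraps the comparison in two small functions. The module and function names (tl_influence_check_mod, tl_diff_in_machine_tol, tl_influence_detectable) are hypothetical and are not part of LFRic or PSyclone. The use of real64 and the demo input values are also assumptions made for illustration.

```fortran
! Hypothetical, self-contained sketch; none of these names exist in LFRic.
module tl_influence_check_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: tl_diff_in_machine_tol, tl_influence_detectable

contains

  ! Change in the inner product across the TL call, expressed in units of
  ! the floating-point spacing at the larger of the two magnitudes.
  pure function tl_diff_in_machine_tol(inner0, inner1) result(diff)
    real(real64), intent(in) :: inner0, inner1
    real(real64) :: diff
    real(real64) :: machine_tol

    machine_tol = spacing(max(abs(inner0), abs(inner1)))
    diff = abs(inner0 - inner1) / machine_tol
  end function tl_diff_in_machine_tol

  ! True only if the TL call moved the inner product by more than the
  ! tolerance that the adjoint test itself applies.
  pure function tl_influence_detectable(inner0, inner1, overall_tolerance) &
      result(detectable)
    real(real64), intent(in) :: inner0, inner1, overall_tolerance
    logical :: detectable

    detectable = tl_diff_in_machine_tol(inner0, inner1) > overall_tolerance
  end function tl_influence_detectable

end module tl_influence_check_mod

program demo_tl_influence_check
  use, intrinsic :: iso_fortran_env, only: real64
  use tl_influence_check_mod, only: tl_diff_in_machine_tol, &
                                    tl_influence_detectable
  implicit none
  real(real64), parameter :: overall_tolerance = 1500.0_real64
  real(real64) :: inner0, inner1

  ! Illustrative values only, chosen to be of similar size to the report.
  inner0 = 10085.75_real64

  ! Case 1: the TL call shifts the inner product by 31 spacings.
  inner1 = inner0 + 31.0_real64 * spacing(inner0)
  print *, "weak TL influence:   ", tl_diff_in_machine_tol(inner0, inner1), &
           tl_influence_detectable(inner0, inner1, overall_tolerance)

  ! Case 2: a much larger shift, as might follow from scaling the ls field.
  inner1 = inner0 + 1.0e-3_real64
  print *, "strong TL influence: ", tl_diff_in_machine_tol(inner0, inner1), &
           tl_influence_detectable(inner0, inner1, overall_tolerance)
end program demo_tl_influence_check
```

In a real test the logical result would feed a log_event call at log_level_error, as in the patched example above, rather than a print statement. Keeping the measurement separate from the pass/fail decision would also let the measured value be reported in the error message.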
(Tagging @DrTVockerodtMO for visibility).