When opfunc() simply returns the output state variable of an nn model (i.e. when opfunc() returns my_net:forward()'s output directly), the second opfunc() call inside the for loop overwrites not only C2 but also C1, because both names reference the same underlying tensor. In that case dC_est is wrongly 0. Avoid this behaviour by unconditionally copying the contents of C1 whenever it is a Tensor/CudaTensor; the overhead is negligible since C1 holds a single scalar.
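A minimal sketch of the aliasing problem in Python/NumPy (the original code is Torch/Lua; `opfunc`, `state`, and the quadratic loss here are hypothetical stand-ins for a model whose forward() reuses its output buffer):

```python
import numpy as np

state = np.zeros(1)  # shared output buffer, like a module's self.output

def opfunc(x):
    # writes the loss into the shared buffer and returns that buffer,
    # mimicking opfunc() returning my_net:forward()'s output directly
    state[0] = (x ** 2).sum()
    return state

x = np.array([3.0])
eps = 1e-4

C1 = opfunc(x + eps)        # C1 aliases `state`
C2 = opfunc(x - eps)        # second call overwrites C1 as well
dC_wrong = (C1[0] - C2[0]) / (2 * eps)  # 0: C1 and C2 are the same buffer

C1 = opfunc(x + eps).copy() # defensive copy breaks the alias
C2 = opfunc(x - eps)
dC_right = (C1[0] - C2[0]) / (2 * eps)  # central difference, ~ 2*x = 6
```

The `.copy()` plays the role of the Tensor/CudaTensor copy described above: it snapshots C1 before the second evaluation can clobber it.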