lanl / scico

Scientific Computational Imaging COde
BSD 3-Clause "New" or "Revised" License
105 stars 17 forks source link

The curious case of the ASTRA adjoint on GPU #103

Closed Michael-T-McCann closed 2 years ago

Michael-T-McCann commented 3 years ago

The ASTRA Toolbox GPU back projector is known to not be a highly accurate adjoint to the forward operator. In the old days, we handled this with a generous tolerance on our adjoint test. After some improvements to that adjoint test, the ASTRA back projector fails very hard (relative error 0.78).

The question: is this a problem with the test, our interface to ASTRA, or ASTRA itself? If it's the latter, let's just skip these tests on GPU.

First step: get a GPU install of SCICO and run pytest scico/test/linop/test_radon_astra.py to see if the failures occur for you.

lukepfister commented 3 years ago

IIRC it’s a problem with the astra algorithm itself.

I want to say that the forward projector is always pixel based, but the GPU adjoint is ray based? Don’t quote me on that.

smajee commented 3 years ago

So if it is due to the inaccuracy of the astra projector, should we increase the tolerance of the test or just disable them for gpu?

Michael-T-McCann commented 3 years ago

To clarify: we know there is at least a problem with the ASTRA GPU backprojection algorithm itself. The question here why is the new adjoint test failing with a relative error approaching one. I didn't think the ASTRA backprojection was that inaccurate, but I don't have a good intuition for our new adjoint test.

If you are convinced there is no bug in the interface or test, I learn toward increasing the tolerance rather than disabling the test, because it might allow us to catch regressions in the interface in the future.

lukepfister commented 3 years ago

rel err of 1 is larger than i'd expect; iirc it was something like 10%?

has the astra version changed? it looks like they finally shipped v2 (was on 1.9.9 beta for a looong time)

has the jax/jaxlib version changed? might be worth seeing if something has been modified in the host_callback interface; it is marked as "experimental" and subject to breaking changes

smajee commented 2 years ago

Is the adjoint test new? Did it ever pass on GPU for astra?

lukepfister commented 2 years ago

No, it isn't new. It passed with very, very loose tolerances; like a relative error of 20% or something.

Sounds like that's no longer the case (and I have no gpu machine to test)

smajee commented 2 years ago

I did pytests on old commits to see where it broke on GPU. It seems the tests pass on commit d86b51f and fail after commit 656f59d (Improve LinearOperator adjoint tests (#72)).

That commit (656f59d) seems to be the one where a new adjoint test (adjoint_test) was introduced. The old adjoint_AAt_test seems to pass on GPU.

The new test seems to be testing the adjoint accuracy in a different way than the old test. I don't understand the tests well enough to comment on why one passes and the other fails. Any ideas?

Michael-T-McCann commented 2 years ago

Seems clear that what is needed is just a looser tolerance. The tolerance values for the new and old tests mean different things, so this is not too surprising.

On Thu, Nov 18, 2021, 14:29 Soumendu Majee @.***> wrote:

I did pytests on old commits to see where it broke on GPU. It seems the tests pass on commit d86b51f https://github.com/lanl/scico/commit/d86b51f764ba3641354ae255421d43fdd514d0e7 and fail after commit 656f59d https://github.com/lanl/scico/commit/656f59d784b46c9dcc0e4b888443aa4f836ad1f8 (Improve LinearOperator adjoint tests (#72 https://github.com/lanl/scico/pull/72)).

That commit seems to be the one where a new adjoint test (adjoint_test) was introduced. The old adjoint_AAt_test seems to pass on GPU.

I don't understand the tests well enough to comment on where why one passes and the other fails. Any ideas?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/lanl/scico/issues/103#issuecomment-973286954, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANUBO7CNQY46JLAD4H4E7JTUMVV4RANCNFSM5IFZBO5Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

smajee commented 2 years ago

That sounds like a reasonable solution to me.

It seems currently all the astra tests use the same tolerance irrespective of the test. I found that I can make the test pass by making the tolerance of adjoint_test 15 times. I have pushed the change in a (draft) pull request. Let me know your comments on the pull request.