Enzyme is fast, works on GPUs, and allows mutation. This branch will try to move Comrade to use Enzyme exclusively for all its AD needs going forward. This will involve changing a number of things:
- [x] Make Enzyme work with all geometric models and add rules when needed
- [ ] Make Comrade non-allocating again! Since Enzyme supports and encourages mutation, we will ensure that all calls to visibility are non-allocating.
- [ ] Add rules for FFT, NFFT, DFT
- [ ] Make modelimage non-allocating and work with Enzyme
- [x] Make JonesCache and the Jones matrix multiplies work with Enzyme (sparse matrix issues?)
- [ ] Make polarization non-allocating and work with Enzyme
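
To make the "non-allocating plus Enzyme" goal concrete, here is a minimal sketch of the pattern, not Comrade's actual code. The names `gauss_vis`, `visibilities!`, and `loss` are hypothetical stand-ins, and the model is a toy circular Gaussian with a real-valued visibility. The only Enzyme calls used are `autodiff`, `Reverse`, `Active`, `Duplicated`, and `Const`.

```julia
using Enzyme

# Hypothetical toy model (assumption, not Comrade's API): visibility of a
# circular Gaussian with flux f and width σ at baseline (u, v).
gauss_vis(u, v, f, σ) = f * exp(-2 * π^2 * σ^2 * (u^2 + v^2))

# Fill a preallocated buffer in place instead of returning a new array.
function visibilities!(vis, u, v, θ)
    f, σ = θ[1], θ[2]
    @inbounds for i in eachindex(vis, u, v)
        vis[i] = gauss_vis(u[i], v[i], f, σ)
    end
    return nothing
end

# Scalar loss that reuses the same buffer on every call.
function loss(θ, vis, u, v, data)
    visibilities!(vis, u, v, θ)
    s = zero(eltype(θ))
    for i in eachindex(vis, data)
        s += abs2(vis[i] - data[i])
    end
    return s
end

u = [0.1, 0.2, 0.3]
v = [0.0, 0.1, -0.2]
data = [0.9, 0.8, 0.7]

θ = [1.0, 0.5]
dθ = zero(θ)      # shadow of θ; the gradient is accumulated here
vis = zeros(3)
dvis = zero(vis)  # shadow for the mutated work buffer

# The buffer is written with values that depend on θ, so it is passed as
# Duplicated; u, v, and data are not differentiated and are passed as Const.
Enzyme.autodiff(Reverse, loss, Active,
                Duplicated(θ, dθ), Duplicated(vis, dvis),
                Const(u), Const(v), Const(data))
```

After the call, `dθ` holds the gradient of `loss` with respect to `θ`. The design point is that the work buffer `vis` is allocated once by the caller and mutated on each evaluation, which a mutation-friendly AD system can differentiate as long as the buffer has a shadow.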
This is not going to be quick, but I have done some preliminary testing on geometric models and it looks like Enzyme should work. For example, for the posterior in black_hole_image.jl we get the following results:
Current Comrade 0.7.1:
ForwardDiff
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 232.671 μs … 2.651 ms ┊ GC (min … max): 0.00% … 84.44%
Time (median): 243.484 μs ┊ GC (median): 0.00%
Time (mean ± σ): 283.548 μs ± 252.489 μs ┊ GC (mean ± σ): 12.69% ± 12.27%
█▃ ▁
███▆▃▄▁▃▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▇█▇ █
233 μs Histogram: log(frequency) by time 1.99 ms <
Memory estimate: 1.19 MiB, allocs estimate: 1301.
Zygote (so slow :-1:)
BenchmarkTools.Trial: 1874 samples with 1 evaluation.
Range (min … max): 1.832 ms … 11.902 ms ┊ GC (min … max): 0.00% … 68.71%
Time (median): 2.056 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.666 ms ± 1.966 ms ┊ GC (mean ± σ): 20.68% ± 20.09%
█▇▇▅▃▂
███████▆▄▄▄▄▁▁▄▄▄▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅▇▅▇█▅▇█████ █
1.83 ms Histogram: log(frequency) by time 9.69 ms <
Memory estimate: 4.66 MiB, allocs estimate: 31776.
Enzyme Reverse
[benchmark output missing]
Enzyme Forward
[benchmark output missing]
Note that this is only a 10-dimensional model, so we would expect Forward-mode AD systems to win here.
At the end of this pull request we should potentially see some pretty nice speed-ups. Additionally, the code will be non-allocating, which should allow us in another pull request to thread more efficiently.