ocramz / ad-delcont

Reverse-mode automatic differentiation with delimited continuations
BSD 3-Clause "New" or "Revised" License
13 stars 2 forks source link

Exponential complexity? #3

Open andriusstank opened 3 years ago

andriusstank commented 3 years ago

Simple example with fibonacci numbers:

n :: Int
n = 50

fibN :: Num a => a -> a
fibN x0 = fibs !! n
  where 
    fibs = 0 : x0 : zipWith (+) fibs (tail fibs)

fibInt :: Integer  
fibInt = fibN 1

fibAD :: (Integer, Integer)
fibAD = rad1g 0 1 fibN 1

fibInt evaluates instantly. fibAD takes a long time and throws stack overflow exception.

ocramz commented 3 years ago

Good catch!

On Tue, 20 Jul 2021 at 12:29, Andrius Stankevičius @.***> wrote:

Simple example with fibonacci numbers:

n :: Int n = 50

fibN :: Num a => a -> a fibN x0 = fibs !! n where fibs = 0 : x0 : zipWith (+) fibs (tail fibs)

fibInt :: Integer fibInt = fibN 1

fibAD :: (Integer, Integer) fibAD = rad1g 0 1 fibN 1

fibInt evaluates instantly. fibAD takes a long time and throws stack overflow exception.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ocramz/ad-delcont/issues/3, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNBDKBDDNKNBHFZGVVSA2TTYVFXRANCNFSM5AVQOBHA .

ocramz commented 3 years ago

I don't know what to make of this though. On one hand, this example is not what I'd call a typical application of AD. But you're right, there's certainly a fundamental limitation in the code somewhere. I hope someone with more expertise in debugging laziness issues can shed some light on this.

turion commented 3 years ago

Yeah, there is some potential for lazyness to build up in some places, e.g. the Num instance (+) = op2Num $ \x y -> (x + y, id, id). If you have a lot of additions you might end up with fibInt = fibN 1 50 = 12586269025 as a thunk 1+ 1 + 1 + ..., which can easily blow up memory.

turion commented 3 years ago

Just replacing it by (+) = op2Num $ \x y -> let sum = x + y in sumseq(sum, id, id) doesn't fix it yet though. But since you're building up thunks of function compositions in the other elements of the tuple, it's hard to prevent a space leak there I'd guess since that can't be easily normalised.

ocramz commented 3 years ago

Thank you @turion . It really sounds like a radical redesign is in order. In its current version, ad-delcont is mostly a straightforward port of the implementation from the Wang article, which they were able to pull off by using Scala which is eagerly-evaluated (IIRC).

andriusstank commented 3 years ago

Somewhat simper example:

forkJoin :: Fractional a => a -> a
forkJoin x = (x+x)/2

forkJoinChain :: Fractional a => a -> a
forkJoinChain x = iterate forkJoin x !! 20

forkJoinChainAD :: Float -> (Float, Float)
forkJoinChainAD x = rad1g 0 1 forkJoinChain x

I don't think lazyness is a problem here. At worst it may cause memory consumption to be proportional to the amount of computation, which shouldn't be an issue with only tens of variables.

My guess would be that this piece of code causes combinatorial explosion:

op2_ zeroa plusa plusb f ioa iob = do
  ra <- ioa
  rb <- iob

ioa and iob are computations, that are evaluated every time they are needed. Each of them has two parents, each of them in turn also has two parents, etc.

Maybe AD variable should also store the value and STRef for gradient along with ContT computation, maybe some flag to avoid repeated evaluations. Like laziness, but manual.

Anyway, I don't really grok the code, so it's all just wild guesses.

tonyday567 commented 3 years ago

Have you tried UseStrict to see if it goes away?

ocramz commented 3 years ago

@tonyday567 thanks but I just tried, -XStrict doesn't help.