taoensso / timbre

Pure Clojure/Script logging library
https://www.taoensso.com/timbre
Eclipse Public License 1.0

Don't retain the head of a sequence when profiling #56

Closed by kyptin 10 years ago

kyptin commented 10 years ago

I have a large sequence that cannot fit in memory. I also have a function that takes this sequence as an argument and reduces it to a value. So, with names changed to protect the innocent, my code looks like:

(defn process [items]
  (reduce (fn [count item] (inc count)) 0 items))

This version works fine because reduce consumes the sequence without retaining the head. (This claim is not obvious, but it is true, at least in my case.)

The problem is, if I replace defn with defnp as defined by taoensso.timbre.profiling, I now have a function that retains the head of the sequence, with expanded code equivalent to the following:

(defn process [items]
  (pspy* "process"
         (fn []
           (reduce (fn [count item] (inc count)) 0 items))))

In this version, the reduce starts consuming the sequence, but the enclosing process function retains the head. As a result, I get an OutOfMemoryError instead of an answer.

For more on head retention, see this thread and this SO question.
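As a minimal self-contained illustration of head retention (my own sketch of the general phenomenon, independent of timbre):

;; OOMs on a default heap: xs is needed again after count, so the head of
;; the seq can't be cleared while count realises all 1e8 elements.
(let [xs (range 1e8)]
  [(count xs) (first xs)])

;; Fine: count is the last use of xs, so the local can be cleared and the
;; GC can reclaim elements as count walks the seq.
(let [xs (range 1e8)]
  [(first xs) (count xs)])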

Regarding a solution, my best guess would be to change pspy so that it uses Clojure metadata instead of wrapping code in a function. I'm not sure this would be compatible with the other features of taoensso.timbre.profiling, but my guess is that there's some way to make that work.

kyptin commented 10 years ago

I'm not sure I understand why, but I did have success with the following approach, which inlines code from pspy directly into my function:

(defn process
  [items]
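  ;; assumes (require '[taoensso.timbre.profiling :as p])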
  (let [id "process"
        t0 (System/nanoTime)]
    (try
      (reduce (fn [count item] (inc count)) 0 items)
      (finally
        (let [t-elapsed (- (System/nanoTime) t0)]
          (swap! p/*pdata* #(assoc % id (conj (% id []) t-elapsed))))))))

The main difference, I think, is that there is no inner anonymous function with this approach: nothing closes over items, so the reference can be cleared once reduce has consumed it.

I hope that this approach could serve as the basis for a defnp macro that doesn't require a totally different mechanism for storing timing information.

ptaoussanis commented 10 years ago

Hi Jeff,

p/defnp shouldn't itself interfere with any lazy sequences. Best bet would be for you to get me some kind of minimal example that reproduces the behaviour so I can take a look.

What makes you suspect that something's retaining a head? Just the OOM error?

Depending on what you've got wrapped for profiling, keep in mind that profiling itself involves quite a bit of data collection and could contribute to (or by itself cause) an OOM if you're already operating near your memory limits and/or if the way things have been wrapped means that the data collection doesn't get a chance to terminate regularly.

In particular, note that anything wrapped in a p (or variant) will get times conj'ed into a vector until the calling profile scope ends. This was a performance tradeoff so that data collection is cheaper (aggregate stats are only calculated once on scope termination) - but it does mean that there's an increasing memory overhead for every wrapped p call in a loop, etc.
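For example (a sketch; :loop-example and :body are illustrative ids):

;; Each of the 1e6 wrapped p calls conj'es one elapsed time into the
;; profile data; all of them stay in memory until `profile` terminates
;; and computes the aggregate stats.
(profile :info :loop-example
  (dotimes [_ 1e6]
    (p :body (inc 0))))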

An alternative would be to gather cumulative stats on each wrapped p call, which'd limit memory use at the cost of extra per-call overhead.

Or (this occurs to me now), we could strike a balance between the two and gather the cumulative stats every 10k calls, say. I've never seen a case of this being an issue though, so we should confirm first: 1) That this is actually the cause of the OOM, and 2) That your p wrappings are actually tracking what you're expecting them to be tracking (i.e. let's rule out unintentional recursion, etc.).
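Roughly what I have in mind (a hypothetical sketch, not timbre code; record-time! and the key layout are made up for illustration):

;; Once a key's vector of raw times reaches 10k entries, fold it into
;; cumulative aggregates and reset the vector, bounding memory per key.
;; pdata is assumed to be an atom holding a map.
(defn record-time! [pdata id t-elapsed]
  (swap! pdata
    (fn [m]
      (let [times (conj (get-in m [id :times] []) t-elapsed)]
        (if (< (count times) 10000)
          (assoc-in m [id :times] times)
          (-> m
              (update-in [id :count] (fnil + 0) (count times))
              (update-in [id :total] (fnil + 0) (reduce + times))
              (assoc-in  [id :times] [])))))))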

So, yeah - try to get me a reproducible case if you can and let's go from there. Cheers! :-)

kyptin commented 10 years ago

OK, here's a REPL example that demonstrates the problem:

user=> (require '[taoensso.timbre.profiling :refer [profile defnp]])
nil
user=> (defn process [items] (reduce (fn [count item] (inc count)) 0 items))
#'user/process
user=> (defnp process2 [items] (reduce (fn [count item] (inc count)) 0 items))
#'user/process2
user=> (process (range 1e8))
100000000
user=> (profile :info :_ (process2 (range 1e8)))

OutOfMemoryError Java heap space  clojure.core/range (core.clj:2653)
user=> 

The only difference between the two examples is the profiling, so I think the profiling is causing the issue.

I thought about the vector of times. That could also cause problems, but I think it doesn't in my case. (I have about 5 functions that are each being called about 3 million times. That's a lot, but probably not too much for 3G of heap space.) I had thought about a streaming approach to computing the statistics, but I like your "chunked" approach better. I'd love to see this change, too, although it isn't as pressing for me at the moment.

Thanks for your help, and for timbre in general.

ptaoussanis commented 10 years ago

This is quite interesting! Have reduced it further:

((fn [coll]         (reduce (fn [n x] (inc n)) 0 coll))   (range 1e8)) ; fine (~6s)
((fn [coll] ((fn [] (reduce (fn [n x] (inc n)) 0 coll)))) (range 1e8)) ; OOMs

This is reproducible with (at least) Clojure 1.6.0-RC1 and 1.4.1.

So it's indeed the additional fn wrapping that's causing the problem, though it's not obvious to me why that would be the case.

UPDATE: In hindsight it's actually quite simple - the inner fn is realising the full range but the calling fn is still retaining a reference which the JVM can't release. Will push a fix shortly.
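The shape of the fix, roughly (a sketch based on the workaround above, not necessarily the exact code pushed): have pspy inline a try/finally around the body instead of wrapping it in a thunk, so nothing closes over the caller's locals and the JVM's locals clearing can release the seq's head.

;; Sketch only: assumes *pdata* is a dynamic var bound to an atom (a map
;; of id -> vector of elapsed times) by the enclosing profile scope.
(defmacro pspy [id & body]
  `(let [t0# (System/nanoTime)]
     (try
       (do ~@body)
       (finally
         (let [t# (- (System/nanoTime) t0#)]
           (swap! *pdata*
             (fn [m#] (assoc m# ~id (conj (get m# ~id []) t#)))))))))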

> Thanks for your help, and for timbre in general.

You're very welcome, thank you for saying so! Appreciate the bug reports :-)

Cheers!

ptaoussanis commented 10 years ago

Okay, just pushed v3.1.6 with the fix.

kyptin commented 10 years ago

Thanks very much—for the fix and the timeliness! This is a big help for me.

I'll file an issue for the periodic stats calculations, to keep a bound on the size of the times arrays in *pdata*, as we discussed above. As I said, I'm not currently affected by this, so as far as I'm concerned it's not urgent. But I think it is an important step in the evolution of the profiling code, in order to avoid introducing OOMs.

ptaoussanis commented 10 years ago

No problem, am happy you caught this.

> I'll file an issue for the periodic stats calculations, to keep a bound on the size of the times arrays in *pdata*, as we discussed above.

Hmm - I'm on the fence about that one. Brought it up since I suspected it may have been the cause of the error you were seeing. After ruling it out, I'm not convinced it'd be a beneficial change in practice.

The memory overhead we're looking at is ~8 bytes per unterminated p call. In my experience most profiling jobs are measuring on the order of thousands to millions of calls. One normally profiles expensive code, so more than millions of calls for one job is unlikely. Anyway let's say 10m on the upper end - so we're talking ~80MB of memory that'll get GC'd on termination.

If the chunked stats change were completely free I'd say why not - but here it'd involve a reasonable amount of code modding and extra complication. Not sure if that tradeoff makes sense.

You're welcome to open an issue (and offer a counter argument if you have one) - it's certainly worth thinking about! What are your thoughts?

kyptin commented 10 years ago

Sorry, I already filed the issue as #57. I'll reply to your comments there.