This only appears to be an issue when processing new input. The general inference cycle does not cause continuous growth
May have found the culprit. 96% of memory is allocated to the Animation thread!
Is that used in lense?
Yes it is. Probably quil has a memory issue; we observed a similar thing with Processing in the old version. You may check whether it also happens when running without the GUI.
And even in case the leak also occurs without the GUI (which I guess it won't), we can ignore it for now; finishing the design is currently more important than making sure the system can run for longer than a few hours.
It's not a slow leak, it's a fast one. Over 10 GB in about 5 minutes.
Did you try whether the leak also happens without lense?
Input is also possible by starting core and sending the string message to the sentence parser from the REPL:
`(cast! (whereis :sentence-parser) [:narsese-string-msg "<a --> b>."])`
I can continue this evening after the two exams, then I can have a look at it.
Yes, I ran it from core (without lense), but I don't know what effect the debuglogger statements have on memory usage. It still seems to leak a lot. But it can wait for now.
It just manages an atom up to a maximum size, so this can't be the issue. The body of the debuglogger can be commented out in order to have a look, but I guess the leak is in the actors (Quasar).
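For context, a rough sketch of the kind of bounded, atom-backed logger described here (illustrative only, not the actual debuglogger implementation):

```clojure
;; Illustrative only: a logger that keeps at most max-log-entries items in an atom,
;; so its memory footprint stays bounded no matter how often it is called.
(def max-log-entries 1000)
(def log-state (atom []))

(defn debuglog! [entry]
  (swap! log-state
         (fn [entries]
           (let [entries (conj entries entry)]
             (if (> (count entries) max-log-entries)
               ;; drop the oldest entries once the maximum size is exceeded
               (vec (drop (- (count entries) max-log-entries) entries))
               entries)))))
```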
But it's strange that only input causes this; you might try calling parse2 in a loop to make sure that the leak isn't in the parser.
It could very well be the parser, if new instantiations of it are being created.
ok, running this code:

```clojure
(defn parse2-test []
  (let [sentence (parse2 (format "<%s-->%s>.:|10|:"
                                 (rand-nth ["a" "b" "c" "d" "e" "f" "g" "h"])
                                 (rand-nth ["a" "b" "c" "d" "e" "f" "g" "h"])))]
    ;; do nothing
    ))

(while true (parse2-test))
```
This does NOT demonstrate the leak, so it's unlikely to be parse2.
For some reason the code won't display correctly; here is the gist link:
https://gist.github.com/TonyLo1/f4051f5d7d2e888ef89c98aee86d809e
@TonyLo1 give it a name with a .clj extension
@TonyLo1 the memory leak can be caused by more complex input examples; this test doesn't cover all cases
This is the same input as core.clj uses to generate the leak.
Ok I had some time now to isolate this issue systematically.
Results:
1.: It has nothing to do with lense, since lense was not even started in my test.
2.: It has nothing to do with the parser, since in my test a Narsese string is parsed only a single time.
3.: It has to do with `(cast! (whereis :task-creator) [:sentence-msg sentence])`
3.1.: It doesn't have to do with the message passing to task-creator itself, since when the message handling is empty the leak does not occur, although the message is sent as before.
3.2./Conclusion for now: So the issue has to do with the further consequences/processing that the new task causes in the system.
One million inputs create approximately 500 MB of additional RAM consumption, meaning roughly half a kilobyte (500 bytes) per input. ^^ This is fine for now, although not very beautiful, and it is incompatible with keeping the system running for multiple hours (depending on available RAM), for example in a rover scenario. Here is the statement I've used for testing:
`(let [sentence (parse2 "<tim --> cat>.")] (doseq [u (range 1000000)] (cast! (whereis :task-creator) [:sentence-msg sentence])))`
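For reference, a rough way to estimate the per-input heap growth around this statement (assumes parse2, cast! and whereis are in scope as above; JVM heap readings are noisy, so this only gives a ballpark figure):

```clojure
(defn used-heap-bytes []
  (System/gc)                                ; request a GC to reduce noise
  (let [rt (Runtime/getRuntime)]
    (- (.totalMemory rt) (.freeMemory rt))))

(let [n        1000000
      sentence (parse2 "<tim --> cat>.")
      before   (used-heap-bytes)]
  (doseq [u (range n)]
    (cast! (whereis :task-creator) [:sentence-msg sentence]))
  (Thread/sleep (* 60 1000))                 ; give the actors time to drain their mailboxes
  (println "approx bytes per input:"
           (double (/ (- (used-heap-bytes) before) n))))
```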
Seth said that the issue may lie in the unbounded mailbox. But I don't think this can be the issue, given that the memory stays full even though the input sequence was passed minutes ago, meaning the system had enough time to process all of it. The only case where this processing time would never be enough is if there is at least one message per input which is part of an infinitely circling message group in the system, meaning every input adds at least one message which "stays". The shortest such loop would be an actor which sends a message to itself when processing the very message type it sends. We have to check whether this is really not happening at any place, like in local inference, where, depending on the implementation, it could easily happen.
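A framework-free sketch of that shortest loop (plain Java queue plus a worker thread, not the project's Quasar actors): if a handler unconditionally re-posts the message type it consumes, every external input leaves one message circulating forever and the mailbox never drains.

```clojure
(import '[java.util.concurrent LinkedBlockingQueue])

(def mailbox (LinkedBlockingQueue.))

(defn handle [[msg-type payload]]
  (case msg-type
    :task-msg (do
                ;; ... processing would happen here ...
                ;; the leak pattern: the handler re-posts the same message type to itself
                (.put mailbox [:task-msg payload]))
    nil))

;; a single worker draining the mailbox, standing in for an actor
(future
  (while true
    (handle (.take mailbox))))

;; each external input now keeps exactly one message permanently in flight,
;; so (.size mailbox) hovers around 1000 and never returns to zero
(dotimes [_ 1000]
  (.put mailbox [:task-msg "<a --> b>."]))
```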
(Side note: setting a maximum mailbox size for the actors is not a good idea, since the system-internal messages are crucial. The only valid option is to have task-creator use a fixed-size bag buffer from which it only enters a maximum amount of tasks per time step (another tick timer is required here), potentially forgetting low-priority items in this process. The design needs to be such that the mailboxes are already bounded by this alone, given a slow enough timer tick.)
I looked into the mailbox possibility yesterday. It is possible for mailboxes to grow indefinitely if the load is not balanced across the actors correctly. However, this is not the case here as reducing the scheduler time interval would have removed this issue and it did not.
On the wider issue of using a mailbox-size policy, I agree that it is not ideal to restrict the size, but it can be managed without losing messages (it would affect timing though). However, gen-servers can't use the args available to standard actors, nor can the supervisor use mailbox-size params.
It is possible, as Seth commented, that the instrumentation support is adding overhead to each call. It may be worth turning off some of the checks.
I don't agree: if the scenario I described happens, it doesn't matter what the scheduling interval is, the message count would still grow.
On the max mailbox size: it's not an issue that we can't apply it; we should have a task-creator tick and have task-creator bag-buffer its incoming tasks anyway.
we're not talking about the same things :) And I agree that using a bag in task-creator is an option for one form of load balancing.
No need for an additional timer here, as the system-time-tick is sufficient. Each tick, n tasks are selected from the task-creator bag and posted to the dispatcher.
This doesn't resolve the leak though, just helps with load balancing.
A better design would be to have two load reducers:
so two bags which allow n selections per system-time-tick to be passed to task-creator. System throughput is then controlled by the two parameters n1 and n2 in the respective load reducers.
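A minimal sketch of that idea, assuming plain vectors as stand-ins for the priority bags and using the project's cast!/whereis for posting (bag names and take-n! are illustrative, not the actual narjure API; requires Clojure 1.9+ for swap-vals!):

```clojure
(def input-load-bag   (atom []))   ; buffers external input tasks
(def derived-load-bag (atom []))   ; buffers internally derived tasks

(defn buffer-task! [bag task]
  (swap! bag conj task))

(defn take-n!
  "Atomically remove and return up to n tasks from the bag."
  [bag n]
  (let [[old _] (swap-vals! bag #(vec (drop n %)))]
    (take n old)))

;; Called once per system-time-tick: at most n1 + n2 tasks enter the actor
;; system per tick, so downstream mailboxes stay bounded by the tick rate.
(defn load-reducer-tick! [n1 n2]
  (doseq [task (concat (take-n! input-load-bag n1)
                       (take-n! derived-load-bag n2))]
    (cast! (whereis :task-creator) [:sentence-msg task])))
```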
I agree, let's do it this way.
Ok, I'll make the change now. Do you need to commit anything?
Cool. Not currently. I will work on the termlinks, though, and I will tell you when I put it in so that we can coordinate. (And let's ignore the memory leak for now and finish the design, because one million inputs is fine for whatever demo we will show in July.)
ok, sounds good. Here is a link to the updated design (to check we are talking the same language :) )
https://creately.com/diagram/io495zy71/QECAuWUHTXIfmbr0Cz59ulOaASg%3D
memory leak seems to be resolved with load-reducer :)
Don't understand why though!
Analysis can follow in the future; closed for now.
When running with continuous input, memory use increases constantly. It has been over 25 GB on my machine, with no sign of it levelling off.
Tried various tests: disabling concept generation, disabling event generation.
No difference, just a slightly slower increase in memory usage. I had a look with the VisualVM profiler, but I'm not familiar enough with it for that to be useful.