opennars / Narjure

A Clojure implementation of the Non-Axiomatic Reasoning System proposed by Pei Wang.
GNU General Public License v2.0

Memory leak #45

Closed TonyLo1 closed 8 years ago

TonyLo1 commented 8 years ago

When running with continuous input, memory use increases constantly. It has been over 25 GB on my machine - no sign of it levelling off.

Tried various tests:

- disable concept generation
- disable event generation

No difference - just a slightly slower increase in memory usage. I had a look with the VisualVM profiler but am not familiar enough with it for that to be useful.

TonyLo1 commented 8 years ago

This only appears to be an issue when processing new input. The general inference cycle does not cause continuous growth.

TonyLo1 commented 8 years ago

May have found the culprit. 96% of memory is allocated to the Animation thread!

Is that used in lense?

patham9 commented 8 years ago

Yes, it is. Probably Quil has a memory issue; we observed a similar thing with Processing in the old version. You may check whether it also happens when running without the GUI.

patham9 commented 8 years ago

And even if the leak also occurs without the GUI (which I guess it won't), we can ignore it for now; finishing the design is more important at this point than making sure the system can run for longer than a few hours.

TonyLo1 commented 8 years ago

It's not a slow leak, it's a fast one: over 10 GB in about 5 minutes.

patham9 commented 8 years ago

Did you try whether the leak also happens without lense? Input is also possible by starting core and sending a string message to the sentence parser from the REPL:

```clojure
(cast! (whereis :sentence-parser) [:narsese-string-msg "<a --> b>."])
```

I can continue this evening after the two exams, then I can have a look at it.

TonyLo1 commented 8 years ago

Yes, I ran it from core (without lense), but I don't know what effect the debuglogger statements have on memory usage. It still seems to leak a lot, but it can wait for now.

patham9 commented 8 years ago

It just manages an atom up to a maximum size, so this can't be the issue. The body of the debuglogger can be commented out in order to have a look, but I guess the leak is in the actors (Quasar).
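
For context, a minimal sketch of what such a capped atom could look like (my illustration, not the actual Narjure debuglogger; `log!` and `max-log-entries` are hypothetical names):

```clojure
;; Illustrative sketch only - not the actual Narjure debuglogger.
;; Keep log entries in an atom, trimming to a maximum size so the
;; logger itself cannot grow without bound.
(def max-log-entries 1000)
(def debug-log (atom []))

(defn log! [entry]
  (swap! debug-log
         (fn [entries]
           (let [entries (conj entries entry)]
             (if (> (count entries) max-log-entries)
               (vec (take-last max-log-entries entries))
               entries)))))
```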

patham9 commented 8 years ago

But it's strange that only input causes this. You might try calling parse2 in a loop to make sure the leak isn't in the parser.

0xc1c4da commented 8 years ago

It could very well be the parser if it's creating new instantiations of it.

TonyLo1 commented 8 years ago

OK, running this code:

```clojure
(defn parse2-test []
  (let [sentence (parse2 (format "<%s-->%s>.:|10|:"
                                 (rand-nth ["a" "b" "c" "d" "e" "f" "g" "h"])
                                 (rand-nth ["a" "b" "c" "d" "e" "f" "g" "h"])))]
    ;; do nothing
    ))

(while true (parse2-test))
```

This does NOT demonstrate the leak, so it's unlikely that parse2 is the cause.

TonyLo1 commented 8 years ago

For some reason the code won't display correctly; here is a gist link:

https://gist.github.com/TonyLo1/f4051f5d7d2e888ef89c98aee86d809e

rasom commented 8 years ago

@TonyLo1 give it a name with a .clj extension

rasom commented 8 years ago

@TonyLo1 the memory leak could be caused by more complex input examples; this test doesn't cover all cases.

TonyLo1 commented 8 years ago

This is the same input that core.clj uses to generate the leak.

patham9 commented 8 years ago

OK, I had some time now to isolate this issue systematically.

Results:

1. It has nothing to do with lense, since lense was not even started in my test.
2. It has nothing to do with the parser, since in my test a Narsese string is parsed only once.
3. It has to do with `(cast! (whereis :task-creator) [:sentence-msg sentence])`.
   1. It doesn't have to do with the message passing to task-creator itself, since when the message handling is empty the leak does not occur, although the message is sent as before.
   2. Conclusion for now: the issue has to do with the further consequences/processing that the new task causes in the system.

One million inputs create approximately 500 MB of additional RAM consumption, meaning roughly half a kilobyte per input. ^^ This is fine for now, although not very beautiful, and incompatible with keeping the system running for multiple hours (depending on available RAM), for example in a rover scenario. Here is the statement I've used for testing:

```clojure
(let [sentence (parse2 "<tim --> cat>.")]
  (doseq [u (range 1000000)]
    (cast! (whereis :task-creator) [:sentence-msg sentence])))
```

patham9 commented 8 years ago

Seth said the issue may lie in the unbounded mailbox. But I don't think this can be the issue, given that memory stays full even though the input sequence was passed minutes ago, meaning the system had enough time to process all of it. The only case where the processing time would never be enough is if at least one message per input becomes part of an infinitely circling message group in the system, i.e. every input leaves at least one extra message in the system that never goes away. The shortest such loop would be an actor that sends a message to itself while processing the very message type it sends. We have to check that this really isn't happening anywhere, for example in local inference, where depending on the implementation it could easily happen.
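
To illustrate the shortest loop described above, here is a minimal sketch using Pulsar's basic actor API as I understand it from the Pulsar docs (`spawn`, `receive`, `!`, `@self`); it shows the failure mode and is not code from Narjure:

```clojure
(require '[co.paralleluniverse.pulsar.actors :refer [spawn receive ! self]])

;; Sketch of the failure mode: an actor that re-sends to itself the very
;; message type it handles. Each input that triggers one such message leaves
;; one message circulating in the mailbox forever, so memory grows with the
;; number of inputs and never recovers.
(def circling
  (spawn
    #(loop []
       (receive
         [:process task] (do
                           ;; ... local inference on task ...
                           (! @self [:process task]))) ; self-perpetuating
       (recur))))

;; one such message per input is enough to make the mailbox grow without bound
(! circling [:process "<a --> b>."])
```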

(Side note: setting a maximum mailbox size for the actors is not a good idea, since the system-internal messages are crucial. The only valid approach is to give task-creator a fixed-size bag buffer from which it enters only a maximum number of tasks per time step (another tick timer is required here), potentially forgetting low-priority items in the process. The design needs to be such that mailboxes are bounded by this alone, given a slow enough timer tick.)

TonyLo1 commented 8 years ago

I looked into the mailbox possibility yesterday. It is possible for mailboxes to grow indefinitely if the load is not balanced across the actors correctly. However, that is not the case here: reducing the scheduler time interval would have removed the issue, and it did not.

On the wider issue of using a mailbox-size policy, I agree that it is not ideal to restrict the size, but it can be managed without losing messages (it would affect timing, though). However, gen-servers can't use the args available to standard actors, nor can the supervisor use mailbox-size params.
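
For reference, this is the kind of per-actor argument being referred to. A sketch based on my reading of the Pulsar docs (treat the exact keyword options as an assumption, not verified against the version Narjure uses):

```clojure
(require '[co.paralleluniverse.pulsar.actors :refer [spawn receive]])

;; Assumed Pulsar options (per its docs): a plain actor spawned with a
;; bounded mailbox and an overflow policy that drops messages when full.
;; As noted above, gen-servers and the supervisor cannot currently be
;; configured with these args.
(def bounded-actor
  (spawn :mailbox-size 1000
         :overflow-policy :drop
         #(loop []
            (receive
              msg (println "handling" msg))
            (recur))))
```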

It is possible, as Seth's comments suggest, that the instrumentation support could be adding overhead to each call. It may be worth turning off some of the checks.

patham9 commented 8 years ago

I don't agree: if the scenario I described happens, it doesn't matter what the scheduling interval is; the message count would still grow.

Regarding the max mailbox size: it's not an issue that we can't apply it; we should have a task-creator tick, with task-creator bag-buffering its incoming tasks, anyway.

TonyLo1 commented 8 years ago

We're not talking about the same things :) And I agree that using a bag in task-creator is an option for one form of load balancing.

No need for an additional timer here, as the system-time-tick is sufficient. Each tick, n tasks are selected from the task-creator bag and posted to the dispatcher.

This doesn't resolve the leak though, just helps with load balancing.

TonyLo1 commented 8 years ago

A better design would be to have two load reducers:

  1. processing new input
  2. processing derived results

So there are two bags, each allowing n selections per system-time-tick to be passed to task-creator. System throughput is then controlled by the two parameters n1 and n2 in the respective load reducers; a rough sketch follows below.
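
A sketch of what one such load reducer could look like (my illustration of the design described above, not Narjure code; `make-load-reducer`, `buffer-task!`, and `select-tasks!` are hypothetical names, and a real implementation would use the proper NARS bag rather than a sorted vector):

```clojure
;; Hypothetical sketch: a bounded, priority-ordered buffer.
;; buffer-task! forgets the lowest-priority items when capacity is exceeded;
;; select-tasks! releases at most n tasks per system-time-tick.
(defn make-load-reducer [capacity]
  (atom {:capacity capacity :tasks []}))

(defn buffer-task! [reducer task]
  (swap! reducer
         (fn [{:keys [capacity tasks]}]
           {:capacity capacity
            ;; keep only the highest-priority tasks, forgetting the rest
            :tasks (vec (take capacity (sort-by :priority > (conj tasks task))))})))

(defn select-tasks! [reducer n]
  ;; assumes single-threaded access, e.g. driven by the system-time-tick
  (let [selected (vec (take n (:tasks @reducer)))]
    (swap! reducer update :tasks #(vec (drop n %)))
    selected))

;; Usage per the design above: one reducer for new input, one for derived
;; results; every system-time-tick the task-creator pulls n1/n2 tasks.
(comment
  (def input-reducer   (make-load-reducer 1000))
  (def derived-reducer (make-load-reducer 1000))
  (buffer-task! input-reducer {:priority 0.9 :task "<a --> b>."})
  (select-tasks! input-reducer 10))
```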

patham9 commented 8 years ago

I agree, let's do it this way.

TonyLo1 commented 8 years ago

Ok, I'll make the change now. Do you need to commit anything?

patham9 commented 8 years ago

Cool. Not currently. I will work on the termlinks, though, and I will tell you when I'm going to put it in so that we can coordinate. (And let's ignore the memory leak for now and finish the design, because one million inputs is fine for whatever demo we will show in July.)

TonyLo1 commented 8 years ago

OK, sounds good. Here is a link to the updated design (to check that we are talking the same language :) ):

https://creately.com/diagram/io495zy71/QECAuWUHTXIfmbr0Cz59ulOaASg%3D

TonyLo1 commented 8 years ago

The memory leak seems to be resolved with the load-reducer :)

Don't understand why though!

patham9 commented 8 years ago

Analysis can follow in the future; closed for now.