sleyzerzon / soar

Automatically exported from code.google.com/p/soar

crash with O_REJECTS_FIRST and DEBUG_MEMORY #45

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Description From Jonathan Voigt 2009-01-08 10:48:44
From Bob's email, verbatim:

I've been working on my emo2 branch, which is up-to-date with the latest
changes from the trunk. It works well most of the time, but when I try to run
my thesis system with the DEBUG_MEMORY #define turned on (which clears memory
returned to the pools), I get a crash under the following conditions:

- this message appears in the trace: "inner elaboration loop finished but not
  at quiescence."
- the system is trying to assert a preference for this rule:

sp {apply*operator*remove-command
   (state <s> ^operator.actions
              ^io.output-link <ol>)
   (<ol> ^<att> <value>)
   (<value> ^status complete)
-->
   (<ol> ^<att> <value> -)}

Stepping through the code, what appears to happen is this preference (among
others) gets put on thisAgent->newly_created_instantiations during
create_instantiation, as does justification-11 (during chunk_instantiation),
the justification associated with the above rule. Going into
assert_new_preferences, the preference for justification-11 is at the head of
the newly_created_instantiations list.

assert_new_preferences does a two-pass algorithm: first it goes through the
newly_created_instantiations list and puts any reject preferences associated
with these instantiations on an o_rejects list (the instantiations are still
on the newly_created_instantiations list). These preferences are then
processed, which possibly includes deallocating them. When a preference is
deallocated, the associated instantiation may also be deallocated.

Next, the items on the newly_created_instantiations list are processed, and
this is where we get a crash. It appears that the instance associated with
justification-11 got deallocated, and thus the head of the
newly_created_instantiations list is bad. The code apparently expects the
o_reject stuff to still be in this list, since it checks for them and ignores
them (presumably since they've already been processed).
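
A minimal sketch of the failure mode described above, in C++. This is a toy
model, not the kernel code: Pref, Inst, process_rejects and
count_stale_entries are hypothetical names, and the rule "an instantiation
whose preferences are all rejects gets freed" is an assumption taken from the
report's own guess. Freed entries are only marked with a flag, so the model
itself never touches invalid memory.

```cpp
#include <cassert>
#include <vector>

namespace two_pass_sketch {

// Hypothetical, simplified stand-ins for the kernel's structures.
struct Pref { bool is_reject; };

struct Inst {
    std::vector<Pref> prefs;
    bool freed;  // models "returned to the memory pool"
};

// Pass 1: handle reject preferences first. ASSUMPTION: an instantiation whose
// preferences are all rejects is freed once they are processed. Nothing
// removes it from the list.
inline void process_rejects(std::vector<Inst>& newly_created) {
    for (Inst& inst : newly_created) {
        bool all_rejects = !inst.prefs.empty();
        for (const Pref& p : inst.prefs) {
            all_rejects = all_rejects && p.is_reject;
        }
        if (all_rejects) inst.freed = true;
    }
}

// Pass 2 walks the same list. Count how many of the entries it would visit
// were already freed in pass 1.
inline int count_stale_entries(const std::vector<Inst>& newly_created) {
    int stale = 0;
    for (const Inst& inst : newly_created) {
        if (inst.freed) ++stale;
    }
    return stale;
}

inline int demo() {
    std::vector<Inst> list;
    list.push_back(Inst{{Pref{true}}, false});                // reject only
    list.push_back(Inst{{Pref{false}, Pref{false}}, false});  // no rejects
    assert(list.size() == 2);
    process_rejects(list);
    return count_stale_entries(list);
}

}  // namespace two_pass_sketch
```

In this model the second pass visits one entry that was freed during the
first pass, which corresponds to the bad list head in the report.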

Interestingly, if I turn off DEBUG_MEMORY, then it's the instance associated
with the rule that is at the head of the list instead of the justification.
This appears to be a side effect of the deallocation (the address changes in
free_with_pool).

Anyway, it's amazing this works at all. I'm not sure if the instance is
supposed to be deallocated or not, but maybe it makes sense, since the
instance only has one pref (to remove something), and it's already been
processed, so we don't need the instance anymore. In that case, perhaps the
right fix is to splice any instances we deallocate out of the
newly_created_instantiations list (although that seems inefficient, as we have
to find it in the list first, if it's even there). Another possibility is to
delay deallocating the preferences until after all of the instances have been
processed. This would mean doing a third pass (albeit just over the o_rejects
list).

As this last approach is the simplest, I tried it and it seems to fix the
crash. Unfortunately, it also appears to break init-soar (lots of leaked ids
reported, even in the water-jug demo, which doesn't have any justifications).

Finally, I don't know how to reproduce the crash in a simple example. It's
possible it's related to the new waterfall changes somehow. I also can't
guarantee it isn't related to my emotion code, since I can't run my system
outside of my branch. I'm not sure what next steps to take.
------- Comment #1 From Bob Marinier 2009-01-08 18:19:13 -------
I've worked on this a bit more; here is a clearer description of the issue.

When DEBUG_MEMORY and O_REJECTS_FIRST are #define'd, a crash can occur in
assert_new_preferences. What happens is that the o_reject preferences of the
instantiations in thisAgent->newly_created_instantiations are gathered,
processed, and possibly released. If any are released, then the
newly_created_instantiations list is corrupted, and things crash immediately
afterward, when we try to walk the list (to assert the non-o-reject prefs).
When DEBUG_MEMORY is off, those memory locations are still valid (even though
released to the pool), and the loop ignores them. With DEBUG_MEMORY on, the
memory is overwritten when released to the pool, and thus we get a crash.
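
To illustrate why the bug only shows up with DEBUG_MEMORY, here is a small
C++ sketch of a pool that optionally overwrites released slots. Everything in
it is hypothetical (Slot, Pool, walk_is_sane, the 0xDD fill byte); the real
pool code and its fill pattern may differ. The pool keeps ownership of its
storage, so reading a released slot is well-defined in this model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

namespace pool_sketch {

struct Slot {
    int id;
    int next;  // index of the next list entry, -1 ends the list
};

class Pool {
public:
    Pool(int capacity, bool debug_memory)
        : slots_(static_cast<std::size_t>(capacity)),
          debug_memory_(debug_memory) {}

    int capacity() const { return static_cast<int>(slots_.size()); }

    Slot& at(int i) {
        assert(i >= 0 && i < capacity());
        return slots_[static_cast<std::size_t>(i)];
    }

    // Return a slot to the pool. With debug_memory the slot is overwritten
    // with an arbitrary fill byte (0xDD is an assumption of this sketch).
    void release(int i) {
        if (debug_memory_) std::memset(&at(i), 0xDD, sizeof(Slot));
    }

private:
    std::vector<Slot> slots_;
    bool debug_memory_;
};

// Follow the next links from head; report whether every link stays in range.
inline bool walk_is_sane(Pool& pool, int head) {
    int steps = 0;
    int i = head;
    while (i != -1) {
        if (i < 0 || i >= pool.capacity() || ++steps > pool.capacity()) {
            return false;
        }
        i = pool.at(i).next;
    }
    return true;
}

inline bool walk_after_release(bool debug_memory) {
    Pool pool(2, debug_memory);
    pool.at(0) = Slot{11, 1};   // list head
    pool.at(1) = Slot{12, -1};  // tail
    pool.release(0);            // head is released but still on the list
    return walk_is_sane(pool, 0);
}

}  // namespace pool_sketch
```

With the fill disabled, the stale next link is still intact and the walk
completes; with it enabled, the link is garbage and the walk fails at the
first step.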

The code appears to expect the instantiations associated with reject prefs to
still be in the newly_created_instantiations list (after all, it tests for
them and ignores them). Certainly there isn't any code that tries to remove
them from the list.

I've tried various things to fix this:

1) Delay the deallocation of the preferences by adding a ref count.
2) Delay the deallocation of the instantiations. Instantiations don't have an
explicit ref count, but I added a flag that said "don't deallocate".
3) Splice the instantiations out of the newly_created_instantiations list upon
deallocation. (This one got the closest -- only crashed on exit).

Various implementations of these ideas either still crashed, leaked tons of
memory on init-soar, or both.
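
For reference, option 3 (splicing on deallocation) could be sketched in C++ as
below. This is an illustration only, not one of the implementations that were
tried: Agent, Inst, create and deallocate are hypothetical names, and pushing
new entries at the head of the list is an assumption. The linear search in
deallocate is the inefficiency mentioned earlier in this report.

```cpp
#include <cassert>
#include <list>
#include <memory>

namespace splice_sketch {

struct Inst { int id; };

struct Agent {
    std::list<std::unique_ptr<Inst>> owned;  // hypothetical owner of all instantiations
    std::list<Inst*> newly_created;          // stand-in for newly_created_instantiations
};

inline Inst* create(Agent& a, int id) {
    a.owned.push_back(std::unique_ptr<Inst>(new Inst{id}));
    Inst* p = a.owned.back().get();
    a.newly_created.push_front(p);  // ASSUMPTION: new entries go to the head
    return p;
}

inline void deallocate(Agent& a, Inst* inst) {
    assert(inst != nullptr);
    // Unlink first, so no later walk of the list can reach freed memory.
    // This is a linear search over the list.
    a.newly_created.remove(inst);
    // Then actually free the instantiation.
    a.owned.remove_if(
        [inst](const std::unique_ptr<Inst>& p) { return p.get() == inst; });
}

}  // namespace splice_sketch
```

After deallocate, a walk over newly_created only sees live instantiations.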

I'll post another comment with instructions on how to reproduce (setting up
the project is complex).
------- Comment #2 From Bob Marinier 2009-01-08 18:20:17 -------
I mis-wrote above -- the one that only crashed on exit but otherwise worked
was #2.
------- Comment #3 From Bob Marinier 2009-01-08 18:45:15 -------
Created an attachment (id=112) [details]
Needed to run the agent.
------- Comment #4 From Bob Marinier 2009-01-08 18:47:30 -------
Created an attachment (id=113) [details]
The soar agent
------- Comment #5 From Bob Marinier 2009-01-08 18:51:22 -------
Created an attachment (id=114) [details]
Files that go in the Soar2D dir
------- Comment #6 From Bob Marinier 2009-01-08 18:52:52 -------
Setting up the project:

To make life easy (i.e., avoid changing a bunch of project settings), set
things up like this:

c:\
  Bob\
    research\
      agents\
        RHSemotion4\ <-- unzip RHSemotion in here
        book-env\
          rl4\ <-- unzip agent in here
    soar-emo2\ <-- check out the branch here (so this contains the SoarSuite dir)
  Program Files\boost\boost_1_35_0

1) The branch is: https://winter.eecs.umich.edu/svn/soar/branches/rmarinie-emo2
2) The RHSemotion project is attached to this bug
3) The agent code is attached to this bug

Building:
4) Confirm DEBUG_MEMORY and O_REJECTS_FIRST are #define'd in kernel.h
5) Build SoarSuite in Debug SCU mode (with Java, as we'll be using Soar2D).
6) Build the RHSemotion project as a debug-lib. Make sure the .dll ends up in
SoarLibrary/bin.

Running:
7) Unzip the Soar2Dfiles in the Soar2D directory.
8) Run do-runs.py to run everything (probably can just run Soar2D directly
with the xml config file, but this is the way I always do it). It may be
useful to start with a gui and a debugger (gives you a chance to attach Visual
Studio and see what the agent is doing).
9) Run the agent. It crashes for me in dc 18. I find it very useful to attach
Visual Studio before running and set breakpoints in relevant areas with hit
count triggers (the relevant functions get called a lot, so you don't want to
step through all the other dcs). What I do to get the right hit count is set
it really high (like 1000), let it crash, and then check the hit count.
Whatever it's at when the crash occurs is what you want to set the hit count
to.
------- Comment #7 From Bob Marinier 2009-05-13 17:41:42 -------
Created an attachment (id=121) [details]
The required map
------- Comment #8 From Jonathan Voigt 2009-05-29 14:20:54 -------
Moved to 9.0.2

Original issue reported on code.google.com by voigtjr@gmail.com on 23 Jul 2009 at 5:09

GoogleCodeExporter commented 8 years ago
Needed to run the agent

Original comment by voigtjr@gmail.com on 23 Jul 2009 at 5:09

GoogleCodeExporter commented 8 years ago
The Soar agent

Original comment by voigtjr@gmail.com on 23 Jul 2009 at 5:09

GoogleCodeExporter commented 8 years ago
Files that go in the Soar2D dir

Original comment by voigtjr@gmail.com on 23 Jul 2009 at 5:10

GoogleCodeExporter commented 8 years ago
The required map

Original comment by voigtjr@gmail.com on 23 Jul 2009 at 5:10

GoogleCodeExporter commented 8 years ago

Original comment by voigtjr@gmail.com on 23 Jul 2009 at 5:29

GoogleCodeExporter commented 8 years ago

Original comment by voigtjr@gmail.com on 23 Feb 2010 at 7:44