sleyzerzon / soar

Automatically exported from code.google.com/p/soar

proposal for a simplified SML #72

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently I see several problems with SML:

1. It's complex. It maintains a synchronized version of the input and
output links, generates dozens of events that would never be used when
implementing environments, does magical things with runtime loading of
shared libraries, etc.

2. Bugs continue to be discovered, and since no one in the lab really
understands all of the code (due to 1.), they're costly to fix.

3. The core Soar kernel code is tightly coupled with it.

4. It's multi-threaded and this confuses profilers and debuggers. Combined
with 3, this makes it hard to profile Soar accurately. Debugging is usually
not so bad, but sometimes stack traces in callbacks are impossible to obtain.

5. The abstraction that it tries to achieve is not really carried through.
For example, input and output link WME objects look the same, but you can't
link them to each other. Also, if the agent creates a wme on the input
link, the SML client won't see it. These kinds of lapses from the
advertised abstraction are not understood by new users and they end up
sinking a lot of time into debugging the wrong code.

6. The Soar Java Debugger uses SML, and this seems to account for the bulk of
the required functionality. For example, most of the events that SML allows
the user to catch with callbacks would never be used when creating task
environments.

It seems to me that it should not require such a large library to
facilitate communication between Soar agents and task environments. 99% of
what is needed is just communicating WM state. So I'm proposing an
alternate interface for Soar:

1. The agent will run as a single-threaded stand-alone process, and the
environment will run in a separate process. There won't be any Kernel
objects or schedulers.

2. All communication occurs through a two-way channel with text protocols,
for example a BSD socket. There is no shared memory interface like in SML.
I doubt the amount of time spent marshalling and unmarshalling WM changes
would be significant, at least for the kinds of tasks I've seen people in
the lab work on. To avoid network overhead, Unix domain sockets or named
pipes can be used.

3. The basic cycle will go like this: Soar will read input-link changes
from the environment at the beginning of the input phase. After the output
phase, Soar will send the entire contents of the output link to the
environment. It will then read any status messages the environment sends
regarding the output link commands. That's it.

4. No WM synchronization. The environment is responsible for maintaining
its own state and translating state changes to WM changes in task-specific
ways. Soar will send the entire contents of its output link to the
environment every output phase. This point is open to debate but it's
definitely the simplest to implement. In my experience it's unlikely that
the size of the output-link will grow so big as to slow things down
significantly.

5. No XML. The message formats are concise and easy for people to read.
They're line oriented and whitespace delimited. WM changes are expressed as

(a|d|u) <id> <attr> <val> [<newval>]
...
%

where a = add, d = delete, u = update. The % marks the end of the message.
Output link contents will be sent as lines of triples

<id> <attr> <val>
...
%

Status messages (to update the conventional ^status attributes on commands)
are sent as

<id> <status>
...
%

These are the only types of messages communicated. There won't be any event
or debugger related messages like in SML.

6. Soar will reserve a single letter, for example Z, for the environment to
use in naming input link identifiers. This allows the environment to
generate identifiers without having to synchronize with Soar. Therefore,
the single message

a I2 foo Z1
a Z1 bar baz

is sufficient to create the wmes (I2 ^foo Z1) and (Z1 ^bar baz) on the
input link. Furthermore, this scheme simplifies debugging as the user knows
exactly which identifiers on the Soar side he/she created via the
environment, unlike in SML where identifier names are kind of a mystery.

7. No standard API. The messages are easy enough to generate manually. Of
course users can write their own task-specific modules to automate message
generation, using abstractions appropriate for their environments, in
whatever language they want. No maintenance, no need for SWIG.
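
The message formats in points 5 and 6 can be sketched in a few lines of
Python. This is only an illustration, not the attached prototype: the names
parse_changes, format_changes and IdAllocator are hypothetical, and the sketch
assumes that update lines always carry five fields (u <id> <attr> <val>
<newval>) while add and delete lines carry four.

```python
from itertools import count

END = "%"  # end-of-message marker from the proposal


def parse_changes(text):
    """Parse one WM change message into a list of tuples.

    Add/delete lines become (op, id, attr, val); update lines become
    (op, id, attr, val, newval). Field counts are an assumption.
    """
    changes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == END:
            return changes
        fields = line.split()
        op = fields[0]
        expected = 5 if op == "u" else 4
        if op not in ("a", "d", "u") or len(fields) != expected:
            raise ValueError("malformed change line: %r" % line)
        changes.append(tuple(fields))
    raise ValueError("message not terminated by %r" % END)


def format_changes(changes):
    """Serialize change tuples into one %-terminated message."""
    lines = [" ".join(change) for change in changes]
    return "\n".join(lines + [END]) + "\n"


class IdAllocator:
    """Hands out environment-side identifiers under a reserved letter."""

    def __init__(self, letter="Z"):
        self._letter = letter
        self._numbers = count(1)

    def new_id(self):
        return "%s%d" % (self._letter, next(self._numbers))


# Build the two-line example message from point 6.
ids = IdAllocator()
z1 = ids.new_id()
message = format_changes([("a", "I2", "foo", z1), ("a", z1, "bar", "baz")])
```

Because the environment owns every identifier under the reserved letter, it
can name new structures without a round trip to Soar.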

At the moment I've implemented a prototype of this interface on top of SML.
Eventually the code to handle this stuff will go directly into the Soar
kernel. My SML prototype is about 200 lines of Python code, and I imagine
that a final version in C won't be much bigger, maybe around 2000 lines.
I've attached the Python script if anyone wants to look it over.

Original issue reported on code.google.com by joseph...@gmail.com on 13 May 2010 at 4:35

GoogleCodeExporter commented 9 years ago
Have you looked at how JSoar does I/O? It's also a slimmed-down approach with
some additional pluses (e.g., being able to run each agent in its own thread).

This page contains links to additional info:

http://code.google.com/p/jsoar/wiki/JSoarUsersGuide

Original comment by marin...@gmail.com on 13 May 2010 at 3:10

GoogleCodeExporter commented 9 years ago
From a very brief look it appears that JSoar communication only takes place in
a shared memory space, which is not what I'm aiming for here. I'm also not
particularly fond of threads in general. But I think it would be easier to
implement what I'm proposing here on top of JSoar than CSoar. In fact it would
probably be trivial given what is there already.

Original comment by joseph...@gmail.com on 13 May 2010 at 4:47

GoogleCodeExporter commented 9 years ago
I was going to say about the same thing. JSoar's API is different and more
powerful than SML's but not a ton simpler. Implementing your protocol in JSoar
would be pretty trivial. JSoar's "quick" memory stuff is kind of similar, but
still in-process:

http://darevay.com/jsoar/0.8.0/docs/jsoar-core/api/org/jsoar/kernel/io/quick/QMemory.html
http://darevay.com/jsoar/0.8.0/docs/jsoar-core/api/org/jsoar/kernel/io/quick/SoarQMemoryAdapter.html

You basically just read and write path/value pairs, and all the callbacks and
thread safety stuff is taken care of.

Did you have any thoughts on run control? That is, does the environment have any
control over whether the agent is running? Or is that handled by something else?

Is the protocol blocking, i.e. will the agent wait for output to be handled
before proceeding?

Finally, is the copy of the output link just a list of WMEs, and the client is
responsible for traversing it? If so, then you could also just send output as a
list of changes like input. This would be tricky in SML, but this is how the
output-link code in the raw kernel and JSoar already work so it would be really
easy there.
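
To make the "list of changes" idea concrete, here is a minimal sketch of how
two successive snapshots of the output link could be turned into add/delete
lines. The function name diff_output_link is hypothetical, and the sketch
assumes each snapshot is a collection of (id, attr, val) triples; neither the
kernel nor JSoar is claimed to work this way internally.

```python
def diff_output_link(previous, current):
    """Return change tuples that turn the previous snapshot into the current one.

    Deletions are listed before additions; each group is sorted so the
    result is deterministic.
    """
    prev, cur = set(previous), set(current)
    removed = sorted(prev - cur)
    added = sorted(cur - prev)
    return [("d",) + wme for wme in removed] + [("a",) + wme for wme in added]


# Example: the command's direction changes from north to south.
before = [("O1", "move", "M1"), ("M1", "direction", "north")]
after = [("O1", "move", "M1"), ("M1", "direction", "south")]
changes = diff_output_link(before, after)
```

Here changes contains one delete and one add for the direction WME, and the
unchanged (O1 ^move M1) WME is not resent.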

Are strings always quoted to differentiate from ids? How are newlines escaped?

I think it's a cool idea.

Original comment by dave...@gmail.com on 13 May 2010 at 5:09

GoogleCodeExporter commented 9 years ago
I was thinking that all run control and debugging related functionality should
be handled separately from the environment communication protocol. The simplest
way, and the way I was thinking of doing it, was just to let the user enter
commands into the stdin of the Soar process and have any output go to stdout.
This way I/O handling in the Soar kernel will be very straightforward and we
can get rid of all the XML calls. If we want to get fancier with a graphical
debugger, we can just write a program that communicates with Soar's stdio and
presents the output differently to the user. We'd have to change the output
format of Soar a little to be more machine friendly but still clear to humans.
For example it shouldn't try to arbitrarily wrap long lines like it does now.
If there needs to be coordination between running the agent and running the
environment, the user can always spawn a third process that controls the
execution of both the Soar and environment processes via stdin of both
processes.

The protocol as I've implemented it in Python is blocking. If we want
asynchronous execution of the environment and Soar, we can always write a
program that acts as a message buffer that sits between Soar and the
environment.
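
For illustration, one blocking cycle seen from the environment side might look
roughly like the sketch below. It is not the actual Python prototype: the
function names are hypothetical, and the reader and writer are assumed to be
text-mode file objects for the two directions of the channel (for a connected
socket, sock.makefile("r") and sock.makefile("w") would do). In-memory streams
stand in for the channel here.

```python
import io

END = "%"  # end-of-message marker from the proposal


def read_message(reader):
    """Block until one complete message arrives; return its lines."""
    lines = []
    while True:
        line = reader.readline()
        if not line:  # empty string means the peer closed the channel
            raise EOFError("channel closed before end of message")
        line = line.strip()
        if line == END:
            return lines
        if line:
            lines.append(line)


def write_message(writer, lines):
    """Send the given lines followed by the end marker."""
    for line in lines:
        writer.write(line + "\n")
    writer.write(END + "\n")
    writer.flush()


def environment_step(reader, writer, input_changes, handle_output):
    """Run one cycle: send input changes, wait for output, reply with statuses."""
    write_message(writer, input_changes)
    output = [tuple(line.split()) for line in read_message(reader)]
    statuses = handle_output(output)  # list of (command id, status) pairs
    write_message(writer, ["%s %s" % pair for pair in statuses])
    return output


# Demo with in-memory streams standing in for the two channel directions.
from_soar = io.StringIO("O1 move M1\nM1 direction north\n%\n")
to_soar = io.StringIO()
seen = environment_step(
    from_soar, to_soar, ["a I2 foo Z1"], lambda wmes: [("M1", "complete")]
)
```

The call to read_message is where the environment waits for Soar, so neither
side runs ahead of the other unless a buffering process is placed in between.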

Sending changes for the output link is definitely a possibility. I had thought
that this might present a synchronization issue but now I don't think this
would be a problem. The key is to not allow the environment to create arbitrary
wmes on the output link as SML does. Since status attributes are so
conventional, I've currently compromised by allowing the environment to send
string status values for every command, but make it solely up to the Soar side
what to do with these. In other words the environment shouldn't expect that the
statuses will be reflected in the next state of the output link in any way.

As for string quoting and all that, they're details that need to be worked out.
I'm thinking of going with single quotes and backslash escapes that are
conventional in Unix. So newlines would be represented as \n.
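
Since these details are still open, the following is only one possible reading
of "single quotes and backslash escapes". The quote and unquote helpers are
hypothetical, and the set of escapes (backslash, single quote, newline) is an
assumption.

```python
_ESCAPE = {"\\": "\\\\", "'": "\\'", "\n": "\\n"}
_UNESCAPE = {"\\": "\\", "'": "'", "n": "\n"}


def quote(value):
    """Wrap a string in single quotes, escaping backslash, quote and newline."""
    return "'" + "".join(_ESCAPE.get(ch, ch) for ch in value) + "'"


def unquote(token):
    """Reverse quote(); raise ValueError on malformed input."""
    if len(token) < 2 or not (token[0] == token[-1] == "'"):
        raise ValueError("not a quoted string: %r" % token)
    out = []
    chars = iter(token[1:-1])
    for ch in chars:
        if ch == "'":
            raise ValueError("unescaped quote in %r" % token)
        if ch == "\\":
            # Look up the character following the backslash.
            ch = _UNESCAPE.get(next(chars, None))
            if ch is None:
                raise ValueError("bad escape in %r" % token)
        out.append(ch)
    return "".join(out)


quoted = quote("two\nlines")
```

With this scheme a quoted value never contains a literal newline, so messages
stay line oriented. Quoted strings may still contain spaces, so a receiver
could no longer split lines on whitespace alone.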

Original comment by joseph...@gmail.com on 13 May 2010 at 5:43

GoogleCodeExporter commented 9 years ago
It seems like your approach is doable, but I'll take a slightly more
conservative approach. Add your I/O system to the base kernel, but leave SML
(and all the XML and other stuff) in place. Then when you need a debugger or
whatever, SML is still there to provide that. I can't say whether SML would be
amenable to additional I/O stuff going on behind its back, but I think this
would give the best of both worlds. If I added this to JSoar, I'd leave the
existing I/O and run control stuff alone and just hang this socket based code
off the agent as another I/O provider.

Does the protocol allow multi-valued attributes? It's not really necessary,
especially if the idea is to keep things simple.

Finally, randomly, since it's a synchronous protocol, I wonder if using HTTP
might make sense. In most scripting languages setting up an HTTP server is
usually one line of code. If anything, it's a simple protocol that almost
everyone understands.

Original comment by dave...@gmail.com on 14 May 2010 at 1:04

GoogleCodeExporter commented 9 years ago
> Does the protocol allow multi-valued attributes?

Speaking of multi-valued attributes, do any of these input schemes allow
other-than-string attributes? Apologies for asking instead of looking this up.
I think any future IO system should support all legal symbol types for
attributes.

Original comment by voigtjr@gmail.com on 14 May 2010 at 2:07

GoogleCodeExporter commented 9 years ago
Well I already have the SML implementation that lets me run and debug
environment programs in a separate process, and I'm pretty satisfied with that.
In fact that was the main reason I made this thing in the first place. I don't
see too much incentive in rewriting the protocol to be native to the kernel
except as an experiment to rip out SML. I'm going to hold off on that until I'm
reasonably sure that the protocol can accommodate most use cases. Also, the
protocol will probably evolve, and I'd rather evolve 200 lines of Python than a
couple thousand of C.

I was looking at previous Soar workshop talks today and it was interesting to
see that SGIO, GSKI, and SML were all introduced in consecutive years. A lot of
code must have been typed in the name of Soar connectivity those 3 years. The
thing I still don't understand is why all those systems unified communication
and control. It seems most natural to me to separate the two issues. Control is
naturally more coupled to execution, and a separate concern from communication.

I didn't make any special provisions for multi-valued attributes but I don't
see why they wouldn't be supported.

I guess I don't see why HTTP would be necessary. As it stands the messages are
sent without any headers, wrappers, or handshaking protocols. Is this naive?

Original comment by joseph...@gmail.com on 14 May 2010 at 2:08

GoogleCodeExporter commented 9 years ago
The python version expects only string attributes and since it uses SML that
implementation is stuck with it. But there's nothing in principle preventing the
communication of identifier or numeric attributes.

Original comment by joseph...@gmail.com on 14 May 2010 at 2:13

GoogleCodeExporter commented 9 years ago
Yeah, when I got this message I thought "it wouldn't be Soar workshop time if
there wasn't a new I/O system implementation" :)

HTTP is a random idea. It probably doesn't make any difference.

Integrating directly into the kernel is something I wouldn't want to do either.
Java's not that fun to write either, but at least it has support for silly
stuff like sockets. If the Python interface works, stick with it.

I'd be interested to know what kinds of environments this is appropriate for.
For example, would you build Eaters or TankSoar with it? Only "batch" runs?

Original comment by dave...@gmail.com on 14 May 2010 at 3:34