svenvc / ston

STON - Smalltalk Object Notation - A lightweight text-based, human-readable data interchange format for class-based object-oriented languages like Smalltalk.
MIT License
135 stars 32 forks source link

STON is not cycle-proof #3

Closed dalehenrich closed 8 years ago

dalehenrich commented 11 years ago

I have run into some object graphs that end up in infinite loops during deserialization.

I haven't completely characterized the actual graph (it's pretty complex), but it appears that there are several points in the graph that refer back to the one of the root objects ... at the end of the initial pass there are one or more unresolvedReferences and because the graph is self referential the traversal looking to replace the unresolved references never terminates ...

As I think about there are probably two separate self-referential paths ... one reference is resolved and then the second path is followed which leads to the first self unresolved reference (now resolved) and we're off to the races ...

dalehenrich commented 11 years ago

I added a visited IdentitySet for use when resolving unresolvedReferences:

processSubObjectsOf: object
    (self visited includes: object)
        ifTrue: [ ^ self ].
    self visited add: object.
    super processSubObjectsOf: object

That seems to have done the trick for me ...

dalehenrich commented 11 years ago

Upon further review ... the proposed fix is not enough ... the actual fix is just a bit more complex, but in the process of fixing the problem, I've run into an apparent VM bug where instVarAt:put: is causing the vm to go into an infinite loop (the vm is fairly recent ... from July) of course this is the case where the STONResolved instance was being replaced by the actual object so at this point I can't say whether or not the tracking set is required or not, since the infinite loop I saw was a result of this vm bug ...

Ah well ... at this point I won't try to track these two bugs anymore ... I'll have to take a completely different tack...

noha commented 11 years ago

I started using something like visited but that didn't suceed for me. I took the opposite route then. I broke down the recursion by introducing an unprocess objects stack.

next
    | object |
    self consumeWhitespace.
    object := self parseValue.
    unresolvedReferences > 0
        ifTrue: [ 
            unprocessedStack add: object.
            [ unprocessedStack isEmpty ] whileFalse: [ 
                    self processSubObjectsOf: unprocessedStack pop]].
    ^ object
processSubObjectsOf: object
    object stonProcessSubObjects: [ :each |
        each isStonReference 
            ifTrue: [ 
                self resolveReference: each ]
            ifFalse: [
                each canContainReference ifTrue: [
                    unprocessedStack push: each].
                each ] ]

In order to minimize the unprocessedStack I introduced "canContainReference" which is false for String, Magnitude, ... Sven says this is not needed and need to review my code again. At least that did the trick. I reduce the stack depth from n to 1 and the introduced collection is limited in size. This way I can read in the stream with less overhead.

mabinogi commented 11 years ago

Is there a complete solution to this problem? I've tried to trace it through to figure out what was happening, but after a couple of evenings looking at it, all I've done is given myself a headache.

svenvc commented 11 years ago

As far as I know, or designed it, STON is/should be cycle proof. There are some unit tests like #testReferenceCycle. I intend to add some more tests in this area.

I am still waiting for a clear failure case.

mabinogi commented 11 years ago

This file is the one that loops for me. It seems to get stuck in the process of materializing @6 - the second ATLocation.

ATWorld {
    #name : '1. World',
    #locations : OrderedCollection [
        ATLocation {
            #description : 'This is the first room.\r\n\r\nYou start here.',
            #exits : OrderedCollection [
                ATExit {
                    #destination : ATLocation {
                        #description : nil,
                        #exits : OrderedCollection [
                            ATExit {
                                #destination : @3,
                                #direction : ATSouth {

                                },
                                #origin : @6
                            }
                        ],
                        #name : 'Second Room',
                        #id : 2
                    },
                    #direction : ATNorth {

                    },
                    #origin : @3
                },
                ATExit {
                    #destination : ATLocation {
                        #description : '',
                        #exits : OrderedCollection [
                            ATExit {
                                #destination : @3,
                                #direction : ATWest {

                                },
                                #origin : @12
                            },
                            ATExit {
                                #destination : ATLocation {
                                    #description : 'This is yet another room\r\n\r\nYou can have multiple paragraphs\r\n\r\nEach new line is a new paragraph',
                                    #exits : OrderedCollection [
                                        ATExit {
                                            #destination : @12,
                                            #direction : @10,
                                            #origin : @17
                                        }
                                    ],
                                    #name : 'And another',
                                    #id : 4
                                },
                                #direction : @9,
                                #origin : @12
                            }
                        ],
                        #name : 'Another Room',
                        #id : 3
                    },
                    #direction : ATEast {

                    },
                    #origin : @3
                }
            ],
            #name : 'First Room',
            #id : 1
        },
        @6,
        @12,
        @17,
        ATLocation {
            #description : nil,
            #exits : OrderedCollection [ ],
            #name : 'stuff',
            #id : 5
        }
    ],
    #description : 'This is the world that will actually be played\r\n',
    #id : 5
}

I can send you the mcz for the classes themselves if you need them.

svenvc commented 11 years ago

Yeah, it would be handy to have (part of) the Smalltalk model so that I can generate the file myself, if that isn't too much to ask. I need to be able to run this in both directions and understand the cycles as they exist in memory.

mabinogi commented 11 years ago

Ok - the model classes are here: http://stuartherring.net/mc/Caitlin/AdventureThing-Core-StuartHerring.4.mcz

svenvc commented 11 years ago

OK, I got the code in my image; how do I create the test/example world in Smalltalk then ?

I can't find it in the tests...

It is a cool and good example, thanks for sharing !

mabinogi commented 11 years ago

The original was created in a Seaside app. This doit should create the same structure, and I've verified it suffers the same problem when serialzed and deserialzed.

world := ATWorld new
    name:  'cycle test';
    addLocation: (ATLocation new name: 'room 1' );
    addLocation: (ATLocation new name: 'room 2' );
    addLocation: (ATLocation new name: 'room 3' );
    addLocation: (ATLocation new name: 'room 4' );
    addLocation: (ATLocation new name: 'room 5' );
    yourself.

(world locations at: 1)
    addExitWithReciprocal: (ATExit from: (world locations at: 1) to: (world locations at: 2) direction: ATDirection south);
    addExitWithReciprocal: (ATExit from: (world locations at: 1) to: (world locations at: 3) direction: ATDirection east).
(world locations at: 3)
    addExitWithReciprocal: (ATExit from: (world locations at: 3) to: (world locations at: 4) direction: ATDirection north).
svenvc commented 11 years ago

Thanks a lot, this gives me something to work with !

Indeed, deserializing goes into an infinite loop.

I am going to try to simplify your example, maybe with two rooms and 1 bidirectional exit between them, and then trace/step the code.

svenvc commented 11 years ago

This is a smaller example that fails:

| world |

world := ATWorld new name: 'cycle test'; addLocation: (ATLocation new name: 'room 1' ); addLocation: (ATLocation new name: 'room 2' ); yourself.

(world locations at: 1) addExitWithReciprocal: (ATExit from: (world locations at: 1) to: (world locations at: 2) direction: ATDirection south).

world

svenvc commented 11 years ago

Hi Stuart,

I think I fixed the problem that you reported. I also added a new test based on your code, STONTests>>#testRoomExitCycles.

You can take the latest code from github or from squeaksource3.

Can you please confirm ?

Thanks again for your excellent test case.

Sven

PS: Next I am going to try to integrate Norbert's ideas and after that do some performance testing/tuning.

svenvc commented 11 years ago

Hi Norbert,

I implemented your suggestions:

I even added a #optimizeForLargeStructures to use Fuel's large datastructures, especially for you ;-)

Tonight I will try to write some stress tests on large datastuctures.

Sven

mabinogi commented 11 years ago

Hi Sven,

Yes that definitely fixed it for me. Thanks for your efforts!

svenvc commented 11 years ago

Norbert,

I did some performance tuning and I managed to make a significant difference, IMHO. Please check the latest commit.

Sven

svenvc commented 11 years ago

Dale,

Regarding the issue with Text, is this important enough for you to warrant its own serializar/materializer in the main repository ?

The following two queries show some other classes that are potential problems:

Object withAllSubclasses select: [ :each |
  each isVariable and: [ each instVarNames isEmpty not ] ].

Collection withAllSubclasses select: [ :each |
  each instVarNames isEmpty not ].

The general mechanism implemented by STON is

Many more sophisticated scheme exist (Glorp, Magritte, ..) but I did not want to go there with STON. I think the current approach is simple and understandable enough that individual developers can easily tune the system for their own models by write a small amount of code.

What do you think ?

Sven

dalehenrich commented 11 years ago

Sven,

Honestly, I have mixed feelings ...

I understand that STON is an either/or proposition when it comes to named/indexed variables, but I can't help asking myself the question:

But what do I do when I want to serialize an instance with both named and indexed fields?

It would be nice if there were a way to do this, but not required. Ugliness for this special case would not be a total crime:)

Presumably if push came to shove, one could devise a scheme using the fields 'named' and 'indexed' to wrap each set then a custom reader could decode things correctly.

I would think that an example writer/reader combo would be enough for use when the odd case arises ...

Dale

----- Original Message ----- From: "Sven Van Caekenberghe" notifications@github.com To: "svenvc/ston" ston@noreply.github.com Cc: "Dale Henrichs" dhenrich@vmware.com Sent: Friday, November 30, 2012 2:59:40 AM Subject: Re: [ston] STON is not cycle-proof (#3)
Dale,
Regarding the issue with Text, is this important enough for you to
warrant its own serializar/materializer in the main repository ?
The following two queries show some other classes that are potential
problems:
Object withAllSubclasses select: [ :each
each isVariable and: [ each instVarNames isEmpty not ] ].
Collection withAllSubclasses select: [ :each
each instVarNames isEmpty not ].
The general mechanism implemented by STON is
* write ALL instance variables as a map or
* write ALL collection elements as a list
Many more sophisticated scheme exist (Glorp, Magritte, ..) but I did
not want to go there with STON. I think the current approach is
simple and understandable enough that individual developers can
easily tune the system for their own models by write a small amount
of code.
What do you think ?
Sven
Reply to this email directly or view it on GitHub
svenvc commented 8 years ago

Cycles are fixed, AFAIK Text has a custom serializer. Closing for now.