jgaskins / perpetuity

Persistence gem for Ruby objects using the Data Mapper pattern
http://jgaskins.org/blog/2012/04/20/data-mapper-vs-active-record/
MIT License
250 stars 14 forks source link

Circular references cause infinite recursion #45

Open jgaskins opened 10 years ago

jgaskins commented 10 years ago

This code:

require 'perpetuity/postgres'

Perpetuity.data_source 'postgres://localhost/perpetuity_gem_test'

class Foo
  def initialize
    @bar = Bar.new(self)
  end
end

class Bar
  def initialize foo
    @foo = foo
  end
end

Perpetuity.generate_mapper_for Foo do
  attribute :bar, type: Bar
end

Perpetuity.generate_mapper_for Bar do
  attribute :foo, type: Foo
end

Perpetuity[Foo].insert Foo.new

… causes this error: stack level too deep (SystemStackError)

The problem is that when an object is serialized, referenced objects are persisted to get their ids so we can store those references. That is, when the Foo's bar attribute is serialized, it is inserted into the database to get its id to store in the bar column, which then repeats the process the other way around, causing infinite recursion.

I have no idea how to solve this.

acook commented 10 years ago

I don't know what the serializer code looks like, but I'll assume it looks something like this (I know it'll be more complex):

def serialize object
  serial = Hash.new
  object.attributes.each do |attr|
    serial[attr.name] = if attr.has_mapper? then
      serialize attr
    elsif DBTYPES.include? attr then
      attr
    end
  end
end

From there we can do this:

def serialize object, visited = Hash.new
  return visited[object] if visited.has_key? object # return the previously serialized object
  serial = visited[object] = Hash.new

  object.attributes.each do |attr|
    serial[attr.name] = if visited.has_key? attr then # check that this attr is new
      visited[attr] # dump serialized data
    elsif attr.has_mapper? then
      serialize attr, visited
    elsif DBTYPES.include? attr then
      attr
    end
  end
end

This is a very simplified example, but hopefully the idea makes sense. Also instead of passing visited recursively it could instead be part of a state object thats initialized when the serializer is run.

The fun part is that since we're reusing the same Hash object, we can continue to make changes to the serialized hash so we can continue to add attributes to it in a nested fashion. If we need an additional layer of protection against getting that hash confused we just add a Set with object membership to make the lookups faster.

jgaskins commented 10 years ago

We'll definitely have to use some sort of data structure to track which objects we've already begun serializing. A Set uses a hash under the hood, so adding one for membership checking might not be an improvement. I'm not saying it's a bad suggestion; I may not have all the answers because I haven't tried it yet. :-)

Since hashes use Object#hash and Object#eql? under the hood, we'll need to use an object identity checker that wraps the object as the key and has hash forwarded to the object and eql? defined as the object's identity being equal to the other. This way, we can be sure we're storing all unique object identities rather than piling objects together whose values are equal.

jgaskins commented 10 years ago

I've been thinking about this a lot over the past few days and an idea hit me tonight on my way home. Maybe storing circular references isn't a good idea. If they're meant to be circular references, storing the reference both ways is unnecessarily redundant. We can take care of this when the association is loaded:

Perpetuity.generate_mapper_for Foo do
  attribute :bar, type: Bar

  def load_association! objects, attribute
    super

    if attribute == :bar
      Array(objects).each { |object| object.bar.foo = object }
    end
  end
end

This way, when you call mapper.load_association! foo, :bar, each Foo is linked to its Bar. I might bake this into the mapper in a method like def load_circular_reference! objects, attribute, foreign_attribute — except with a better name. I'm really beginning to hate the load_association! method name, too. If these methods are going to be used frequently in an app (one I'm working on now has a model tree 8 layers deep all loaded in one controller action), having to type that out every time would be tedious.

acook commented 10 years ago

Well, ideally you'd have an identity map of some sort which would short circuit this problem, but I assumed you had a reason for avoiding that.

jgaskins commented 10 years ago

Sure, the identity map is there. But if I'm freshly pulling the Foo objects from the database, the identity map won't have the Bar objects until it pulls them out.

acook commented 10 years ago

That makes sense, but then this seems like it needs to be an extension to the identity map, otherwise it'd be duplicating a lot of functionality/code don't you think?

jgaskins commented 10 years ago

It's not the identity map's job to link the two objects together, though. Is that what you mean?

acook commented 10 years ago

Not to link two objects, no. Hmm. And it shouldn't be the identity map's job to track the object traversal either.

When extracting information from the database, it should keep track of what records have been loaded so they are only loaded once as needed...

So to revise my assertion: there must need to be another separate (but related?) mechanism along with it to serve an individual "request's" transversal and detect circular references.

jgaskins commented 10 years ago

The identity map tracks what's been loaded and will return the same object if you pull it out twice. If you are finding by id, it won't even touch the database to get the object. If you're finding by other criteria, it'll check with the identity map before returning the object and return the one in there. That won't save a DB query, but it does try to maintain only one copy of an object in memory.

This way, if you call load_association! and that particular Bar object has already been loaded into that mapper's identity map, it won't hit the database again for it. The identity map is created by the mapper and passed to other mappers that it instantiates, for example during load_association! or nested serialization, so they can share the same identity map.

Reconstructing objects structures from the database is the mapper's job, whether it does the work itself or delegates to another object.

acook commented 10 years ago

I did not mean to imply that the identity map did the extraction or conversion, only that it keeps track of what had been loaded.

jgaskins commented 10 years ago

Okay, just wanted to make sure we were on the same page. :-)

acook commented 10 years ago

I believe so. :) I'm hardly an expert on DBs though, so explanations may be warranted if just to ensure our terminology has the same definitions.