hpi-swa / RSqueak

A Squeak/Smalltalk VM written in RPython.
BSD 3-Clause "New" or "Revised" License

Do we want maps? #113

Open timfel opened 8 years ago

timfel commented 8 years ago

@krono and I were thinking about where we could replace storage strategies (and maybe some shadows) with maps. This issue tracks the discussion about what we want and where we still need shadows.

Where do we want storage strategies?

The kinds of attributes here change rarely; mostly, new classes are created at runtime to replace old ones. So maybe we could get rid of class shadows and just use maps and transitions (with some special nodes, like in topaz, where object flags are FlagNodes parallel to AttributeNodes) without generating more code.
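For illustration, here is a minimal, hypothetical sketch of what maps with transitions could look like: objects that gained their attributes in the same order share the same map node, and adding an attribute walks (or extends) a shared transition tree. All class and method names below are made up; this is not code from RSqueak or topaz.

```python
class Map(object):
    """One node in a transition tree. Objects with the same attribute
    history share the same Map, so a single identity guard on the map
    fixes the layout of the whole object."""
    def __init__(self, prev=None, name=None):
        self.prev = prev          # parent node in the transition tree
        self.name = name          # attribute added at this node
        self.index = -1 if prev is None else prev.index + 1
        self._transitions = {}    # name -> successor Map (cached)

    def add_attribute(self, name):
        # Reuse an existing successor so identical histories share a map.
        if name not in self._transitions:
            self._transitions[name] = Map(self, name)
        return self._transitions[name]

    def lookup(self, name):
        # Walk back towards the root; return the storage index or -1.
        node = self
        while node.prev is not None:
            if node.name == name:
                return node.index
            node = node.prev
        return -1

EMPTY_MAP = Map()

class Obj(object):
    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def setattr(self, name, value):
        idx = self.map.lookup(name)
        if idx >= 0:
            self.storage[idx] = value
        else:
            self.map = self.map.add_attribute(name)
            self.storage.append(value)

    def getattr(self, name):
        return self.storage[self.map.lookup(name)]
```

A FlagNode in the topaz sense would be another node kind in the same tree that records an object flag instead of consuming a storage index.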

The following things are stored, and could be represented with special map nodes:

The following two things can just live as attributes (with nice accessors):

Do we need this? If so, it could also be a special map node.

Again, I'm not sure we need the shadow. Lookup could be optimized with special nodes when storing. We would search for the name (stored as a symbol, elidable), return the index, and then look up in the compiled-methods array: one guard for the object map, one for the class map, then we have the symbol index, then one pointer dereference to get the compiled-methods array, then a getarrayitem. Maybe we can leverage the fact that the compiled-methods array is never manipulated directly; rather, a new array is created and the new and old ones are swapped using become.
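The lookup scheme sketched above might look roughly like this. Everything here is hypothetical (the class names and the `elidable` placeholder are illustrative, standing in for RPython's `rpython.rlib.jit.elidable`); the key points are that the selector-to-index mapping is never mutated, so it can be elidable, and that the methods array is only ever swapped wholesale (the become step), never written in place.

```python
def elidable(func):
    # Placeholder for rpython.rlib.jit.elidable: promises the JIT that
    # the result depends only on the arguments, so it can be folded
    # away behind a guard on the map.
    return func

class ClassMap(object):
    def __init__(self, selector_to_index):
        self._selector_to_index = selector_to_index  # never mutated

    @elidable
    def index_of(self, selector):
        # Same (map, selector) pair always yields the same index.
        return self._selector_to_index.get(selector, -1)

class W_Class(object):
    def __init__(self, class_map, methods):
        self.class_map = class_map
        self.methods = methods  # replaced wholesale, never written in place

    def lookup(self, selector):
        # In a trace: guard on class_map, constant-fold index_of, then
        # one pointer dereference plus a getarrayitem.
        idx = self.class_map.index_of(selector)
        if idx < 0:
            return None
        return self.methods[idx]

    def replace_methods(self, new_map, new_methods):
        # The "become" swap: a new array and map are installed atomically.
        self.class_map = new_map
        self.methods = new_methods
```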

Only for the compiled-methods array; not required if we were to get rid of the method dict shadow.

Context part shadow / frame

These shadows double as the virtualizable frame objects. When ContextParts are loaded from the image, we could make them special W_Objects (not generic W_PointersObjects) that have a _shadow field. When they are dynamically created, only the shadow is created, and only a heap dump would call w_self(); there we can decide what we create (instead of W_PointersObject).

j4yk commented 8 years ago

Probably I am missing part of the concept, but why do we want maps at all? If I understand them correctly, as motivated in the PyPy blog, they help when the slot schema of an object can vary at runtime but in practice rarely changes. In Smalltalk there are no instVarNames in the bytecode, so a mapping from attribute names to storage indices, as shown in the blog posts, is not necessary. When a class is changed in a notable manner, aren't all subInstances of that class become:-ed to fresh instances anyway? So the slot schema of an allocated Smalltalk object never changes. Correct me if I am mistaken.

If I understood @krono correctly, we want to get rid of the dozens of guards when accessing storage. Assuming we want to retain unboxed storage for Collections that happen to only store SmallIntegers etc., how do maps help here?

timfel commented 8 years ago

My thought is that we have a different implementation type for arrays. The Array class is in the special objects table, so we would know early on while loading the image when to create a W_Array (which uses strategies) and when to use a W_PointersObject (which will use maps).

j4yk commented 8 years ago

But when the size of an object cannot change and all instance variables are accessed by index, why do you want to use maps for ordinary W_PointersObjects? Which guards can be avoided or what is more efficient with maps than without? Will the SmallIntegerInstVarAccessor (currently misspelled) allow unboxed integers in instance variables?

krono commented 8 years ago

That's the plan. What's changing here is not the number of fields but their runtime types.

timfel commented 8 years ago

@j4yk we will have direct pointer fields and direct integer fields on each object and fill them according to need. Any variables beyond what fits in the directly accessed fields will go into a generic storage array that does not use strategies (or maybe it could, but I don't think it's necessary). The reason for maps is that the mapping of small-integer and pointer fields on the object to instance-variable indices is not fixed, so maps should help us use the direct fields effectively.
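A rough sketch of that hybrid layout, under assumed slot counts (everything here, including the names and the number of direct fields, is made up for illustration): a shared layout object decides per instance-variable index whether a value lands in a direct integer slot, a direct pointer slot, or the generic overflow array.

```python
N_PTR_FIELDS = 2  # direct pointer slots per object (assumed)
N_INT_FIELDS = 2  # direct unboxed-integer slots per object (assumed)

class Layout(object):
    """Maps ivar index -> ('int'|'ptr'|'over', slot). In a real maps
    implementation this would be immutable and shared via transitions;
    here it is mutable to keep the sketch short."""
    def __init__(self):
        self.placement = {}
        self.n_ptr = self.n_int = self.n_over = 0

    def place(self, ivar_index, is_int):
        if ivar_index in self.placement:
            return self.placement[ivar_index]
        if is_int and self.n_int < N_INT_FIELDS:
            loc = ('int', self.n_int); self.n_int += 1
        elif not is_int and self.n_ptr < N_PTR_FIELDS:
            loc = ('ptr', self.n_ptr); self.n_ptr += 1
        else:
            loc = ('over', self.n_over); self.n_over += 1
        self.placement[ivar_index] = loc
        return loc

class W_Obj(object):
    def __init__(self, layout):
        self.layout = layout
        self.ptr_fields = [None] * N_PTR_FIELDS
        self.int_fields = [0] * N_INT_FIELDS
        self.overflow = []  # generic storage, no strategies

    def store(self, ivar_index, value):
        kind, slot = self.layout.place(ivar_index, isinstance(value, int))
        if kind == 'int':
            self.int_fields[slot] = value
        elif kind == 'ptr':
            self.ptr_fields[slot] = value
        else:
            while len(self.overflow) <= slot:
                self.overflow.append(None)
            self.overflow[slot] = value

    def fetch(self, ivar_index):
        kind, slot = self.layout.placement[ivar_index]
        if kind == 'int':
            return self.int_fields[slot]
        if kind == 'ptr':
            return self.ptr_fields[slot]
        return self.overflow[slot]
```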

j4yk commented 8 years ago

That makes sense, thank you both. But we will have an additional memory overhead from these fields for small objects with fewer instance variables, right?

timfel commented 8 years ago

This might still be nice to have, but we've added mutable integers in ordinary pointers objects now, which gets rid of a few allocations in a number of cases. Not as much as we'd like, but good enough, maybe.

timfel commented 7 years ago

There is a maps implementation in tim/maps. A map is simply a kind of strategy that is used for fixed-size pointers objects. The maps branch also adds inline fields for pointers objects, which are then used by the maps. This is similar to the strategy @smarr uses in SOM, but BitBlt seems to get slower with this, because there are many lazy initializers... :(

It makes some benchmarks better and others worse. The ones that get worse are mostly because of instability in the maps: there are lots of lazy initializers, and BitBltSimulator is megamorphic... so, meh, not sure whether it should still be merged and improved :(

smarr commented 7 years ago

What exactly do you mean with instability? Because the lazy initializers run too late and then cause code to be invalidated? I remember @cfbolz talking about improving over a basic object layout strategy for PyPy? Perhaps that would be applicable?

And, did you perhaps look at the Truffle Object Model paper?

There, the strategy is more aggressively geared towards accounting for such uses, and, I would assume, PyPy's strategy might also be more lenient in a sense than what I have in SOM. In SOM, I have only one map/shape/layout per class, which means lazy initialization leads to code invalidation.

However, if you take for instance the approach of Truffle's object model, each allocation site potentially has a separate tree of shapes. Of course, that has other issues like higher polymorphism, which might need to be solved separately. And, I am not sure, but I think @cfbolz was working on guards for PyPy that can be more complex than a simple comparison, which might also help avoid separate compilation for compatible shapes/maps.

timfel commented 7 years ago

@smarr Thanks I hadn't read that paper.

The issue we're having comes down to higher polymorphism, I think: especially in BitBlt, the values of 5 or 6 different fields in the argument to the primitive lead to completely different initializations of a subset of the 64 instance variables of the BitBltSimulator class. By the time we get to the inner loops that move bits to the screen, we're looking at about 20 different possible shapes. The allocation site is the same in every instance; they just go down different paths later on. I will take a look at the Truffle paper, thanks for the pointer 👍

smarr commented 7 years ago

Hm, yeah, sounds like that's also a problem with what Truffle does. That would mean you get 20 different shapes, indeed.

Are those 20 shapes truly different, or could you group them? Things like initializing first a and then b, or first b and then a, should ideally lead to the same shape.
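One way to read that suggestion (a toy sketch, not how any of the mentioned VMs actually do it): key shapes by the *set* of initialized fields rather than by insertion order, so "a then b" and "b then a" collapse into one shape. The names here are hypothetical.

```python
class ShapeRegistry(object):
    """Canonicalizes shapes by field set, ignoring initialization order."""
    def __init__(self):
        self._shapes = {}

    def shape_for(self, field_names):
        key = frozenset(field_names)
        if key not in self._shapes:
            # Assign storage indices deterministically, e.g. by sorted
            # name, so every order of initialization agrees on the layout.
            self._shapes[key] = {name: i
                                 for i, name in enumerate(sorted(key))}
        return self._shapes[key]
```

The trade-off is that order-insensitive grouping gives up the cheap "one transition edge per added field" structure of a plain transition tree, which is presumably why real shape trees only merge orders opportunistically.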

Perhaps it is time to just fix BitBltSimulator ;)