Performance with assigning to a MapModel

gotjoshua commented 6 years ago

I don't think this is bug (rather a symptom of how much work OM needs to do internally), but i want to report a real-world performance use case:

We have JSON objects coming from a REST api. One of our larger queries is about 950 Objects.

We wanted to use:

_OMap = MapModel(CiviId,ParticipationOM)

where

const CiviId = PositiveInteger.extend().as('CiviId');
class ParticipationOM extends ObjectModel({
  participant_id: [CiviId],
  id: [CiviId],

  contact_id: [CiviId],
  event_id: [CiviId],
// .... 30 more props (Primitives and BasicModels)

When new data comes in, we don't want to assign a new object into the map if the key already exists, so when getting a newOMap from the REST call we do this loop:

newOMmap.forEach((om, ID) => {
  let echSt = performance.now();

  this._OMap.has(ID)
    ? this._OMap.get(ID).update(om) // fyi: OM.update = (newOM) => Object.assign(this, newOM);
    : this._OMap.set(ID, om);

  perfTracker.push(performance.now() - echSt);
});

if _OMap is a MapModel it takes 30-45 seconds (avg 30-45ms per set op). if _OMap is a Map it takes ~13ms (avg:0.01369ms per set op). that is 2000-3500 times faster with a normal Map().

Interestingly, the average seems much lower at the beginning and for other smaller maps being added.

Is this to be expected? Is there anyway we can improve the performance of MapModels (as users)? Is there internal optimization to be expected in v4?

Some basic js questions: Does map.has() take longer and longer as the map grows in size? Does map.set() take longer and longer as the map grows in size?

For now it is quite clear we will use Map: _OMap = new Map(); // not MapModel(CiviId,ParticipationOM) The dynamic checking at exactly this point of the app is not so critical (as the objects are already type-coerced and cast before they are passed along to the central OMap cache).

sylvainpolletvillard commented 6 years ago

It's been a long time since I looked at performance bottlenecks. I'm sure there is room for optimization, we just needed a good usecase to work with.

At first sight, my guess is that Object.assign assign every new property one by one, so the whole model is validated with its assertions about 30 times for each operation.

Let's make a benchmark test to confirm that and find the bottleneck

gotjoshua commented 6 years ago

I don't know how Object.assign works internally...

Perhaps this issue could be an answer for: "What would be the difference with Object.assign"

However, I doubt it. Because it seems that the update assignments are much faster than the new set operations...

When you say:

Let's make a benchmark test How shall we proceed?

sylvainpolletvillard commented 6 years ago

Could you isolate the concerned code and models with some fake data somewhere in a JSBin or something ? I can try to make a benchmark test myself but maybe the bottleneck hides in a specific part of your code, so I would rather take your real use case.

gotjoshua commented 6 years ago

mapmodelset overtime

Over time, as the size of the map is increasing the trend is clearly increasing. Also about 2/3 of the way from left to right starting around 700 is the 30 or so that the Map.has() already and are using update - they are extremely quick.

Weird thing is, that the objects are already legit instanceof ParticipantOM... Could there be some sort of short circuit test to avoid full model validation?

I can try to isolate it into a codepen project but maybe not until later in the week... (unless i can't resist)

Thanks again...

sylvainpolletvillard commented 6 years ago

Quick bench on my side:

const Id = PositiveInteger.extend().as('Id');

class OM extends ObjectModel({
        id0: [Id],
        id1: [Id],
        id2: [Id],
        id3: [Id],
        id4: [Id],
        id5: [Id],
        id6: [Id],
        id7: [Id],
        id8: [Id],
        id9: [Id]
    }) { };

 const OMap = MapModel(Id, OM)

 for (let n of [10, 100, 1000, 10000]) {
        const data = mockOMapData(n);
        bench(() => new OMap(data), `Init MapModel with ${n} elements`)
 }

Results:

Init MapModel with 10 elements: 12.800ms Init MapModel with 100 elements: 32.900ms Init MapModel with 1000 elements: 115.500ms Init MapModel with 10000 elements: 944.800ms

After analysis, it appears 95% of CPU time is used for the autocast feature, that is, check if every map value provided is suitable for the OM model. If values provided are already instances of OM, then the numbers drop to:

Init MapModel with 10 elements: 3.600ms Init MapModel with 100 elements: 11.800ms Init MapModel with 1000 elements: 10.300ms Init MapModel with 10000 elements: 47.900ms

gotjoshua commented 6 years ago

ok, but your curve goes in the opposite direction (avg per assignment goes down with increased map size) can you also bench your mockOMapData call?

can you try a similar thing with setting entries one by one? This is what we are doing - not a map init with all values up front (new Map([[id:x,om:y]...]) but rather new Map() and then assigning single entries with in a loop...

sylvainpolletvillard commented 6 years ago

When including the mockOMapData call in the bench, it logically takes a bit longer but numbers are still much lower.

no autocast, including mockData call: Init MapModel with 10 elements: 3.200ms Init MapModel with 100 elements: 12.200ms Init MapModel with 1000 elements: 32.200ms Init MapModel with 10000 elements: 174.100ms

I'm working on optimizing the autocast feature to reduce the number of calls needed. Already got ~75% perf improvement but I lost the ambiguous model cast detection. This will take some time to get it right...

gotjoshua commented 6 years ago

a sample of our more detailed performance traces:

into MapModel(PositiveInteger,ParticipationOM)

[31781.7] 26530 Updt OMap with OMinst ? true took: 0.5ms
[31782.5] 26565 Updt OMap with OMinst ? true took: 0.5ms
[31783.3] 26653 Updt OMap with OMinst ? true took: 0.4ms
[31838.2] 24128 OMap  set with OMinst ? true took: 54.3ms
[31886.0] 24304 OMap  set with OMinst ? true took: 47.3ms
[31933.1] 24371 OMap  set with OMinst ? true took: 46.8ms

into Map()

[11097.2] 26530 Updt OMap with OMinst ? true took: 0.5ms
[11098.7] 26565 Updt OMap with OMinst ? true took: 0.4ms
[11100.7] 26653 Updt OMap with OMinst ? true took: 1ms
[11102.0] 24128 OMap  set with OMinst ? true took: 0ms
[11102.7] 24304 OMap  set with OMinst ? true took: 0ms
[11103.3] 24371 OMap  set with OMinst ? true took: 0ms

verdict update (via Object.assign) very fast (but not quite as fast as set into Map()) set into MapModel very slow (especially for bigger Maps getting slower and slower)

sylvainpolletvillard commented 6 years ago

Bench details:

For maps of 1000 elements, for each map instanciation:

1000 object models are tested
11000 basic models are tested (1 for the key and 10 for each id property of OM object model)
12000 autocasts in total
ID BasicModel has 2 assertions: positive and integer, That makes 22000 assertions tests

Around 100ms for 12000 autocasts and 22000 assertions tests seems legit compared to the last performance check I did at the time when v3 was released.

So the good news is that no perf killer has been introduced in a new version. The bad news is that I don't have a magic solution to make this 10 times faster. I have a few ideas, but I don't expect over a 50% improvement

sylvainpolletvillard commented 6 years ago

Encouraging results after optimizing autocast to no longer try to cast basic models and deep nested structures:

Init MapModel with 100 elements: 16.300 ms Init MapModel with 1000 elements: 46.233 ms Init MapModel with 10000 elements: 373.767 ms

Tests pass, but I'm going to double check if I did not break anything.

sylvainpolletvillard commented 6 years ago

I worked hard today and found new optimizations. It breaks some edge cases of autocasting with basic models, but I don't think anyone will be strongly impacted. Doing a new major version just in case

Init MapModel with 100 elements: 16.100 ms Init MapModel with 1000 elements: 28.767 ms Init MapModel with 10000 elements: 239.967 ms

😍

sylvainpolletvillard commented 6 years ago

v3.7.0 released, please rerun your tests and post your results

gotjoshua commented 6 years ago

Looks much better! Still has a basically linear perfomance hit as Map.size increases... but seems 2x-6x faster at first glance (blue is old data with 3.6.x - red is with 3.7.0): mapmodel-assign-muchbetter ( x axis is Map.size - y axis is ms per set() )

Nice work!

sylvainpolletvillard commented 6 years ago

the linear curve is normal, since there are more items to type-check. I counted every call to checkDefinition and cast methods to ensure we only called them the minimum number of times required. So I won't be able to optimize further, except for minor micro-optimizations in the code. If the current performance level is not satisfying enough, I would suggest to validate only a small part of the data. Also, remember that you can replace ObjectModel API calls by noop functions on production to remove the penalty performance

gotjoshua commented 6 years ago

For me it is totally satisfying (we anyway decided to cast the objects before hand and to use a normal Map in this performance critical spot), but I am not sure why there are more items to type check as the size of the map goes up. My loop is setting a single item at a time.... ( x axis is Map.size - y axis is ms per set() ) Do you check each and every object in the MapModel on a single addition??

sylvainpolletvillard commented 6 years ago

Since autocast has been highly optimized, you shouldn't need to manually cast the objects anymore. I checked my previous bench and the numbers are now almost the same with or without autocast.

What do you mean by "a single item at a time" ? Could you share your bench code ? I think there is a misunderstanding. What I try to say is that instanciating a map with 2000 elements should take approximately twice the time to instanciate a map with 1000 elements.

Here is my benchmark: http://objectmodel.js.org/test/bench/map.html

sylvainpolletvillard commented 6 years ago

Ah I think I understood, instead of passing 1000 elements to the MapModel constructor, you are using map.set method 1000 times in a for loop. This is indeed terrible for performance with ObjectModel. Let me explain:

Everytime a mutator method is called on a model instance, like map.set or array.splice, we need to check if the mutated object is still valid. If the mutation is invalid for the model, we need to prevent this mutation to happen and preserve the object in its original state. But at the same time, the simple (only?) way to find out if a mutation respects the model definition is to apply this mutation.

So the library works by using a testing clone: it creates a clone of the object and apply the mutation on the clone, then validate it after against the model definition. If the mutated clone satisfies the model definition, then the operation is considered valid and is applied on the original object.

Everytime you do a map.set, the map is cloned and validated entirely. It is indeed very suboptimal, but consider that your model can have some custom assertions, for exemple a max size. Some mutator methods are also more complex than a map.set and completely change the collection, for example array.reverse(). The only way to optimize this would be to specify for every existing mutator method which parts of the collection should be checked again. It may be possible, but is much more complex than current implementation. Also this may be unpredictable sometimes, I currently think of array.sort(). I will continue to think about it, but it's a whole new problem.

gotjoshua commented 6 years ago

Yes, thats it. Instantiation is earlier and we are using "map.set method 1000 times in a for loop" Thanks for the detailed explanation and for the detailed thinking that you have and are putting into it!

This is understandable and fully explains the performance numbers that I am getting. As I said earlier, we are quite convinced to simply use a normal map and cast our objects on the way in (we want to do that anyway because they come from the data source and the json needs to be type checked/coerced and cast as valid OMs).

However, I am also very interested in the basic functionality of ObjectModel, so I'm happy to continue this exploration.

Although I now more or less understand the whole clone/mutate/check cycle that you describe above, I am surprised. What I expected (before reading your explanation) was that using the Map.set() method on a MapModel would simply check the two incoming values against the definition and reject them if they don't pass. If there are no additional full model assertions, then this seems sufficient.

sylvainpolletvillard commented 6 years ago

What I expected (before reading your explanation) was that using the Map.set() method on a MapModel would simply check the two incoming values against the definition and reject them if they don't pass.

This would have been the obvious behaviour if I had to implement a custom logic for each and every mutator method of arrays, maps and sets. Which the library current doesn't do, to keep things simple and code small.

I will try to improve this in the next days. My current reasoning is that for each mutator method, I need to specify:

which method arguments to autocast and how
which elements to check definition after mutation
if checking assertions is required (probably always yes, as we can make assertions about anything)

sylvainpolletvillard commented 6 years ago

Let's open a new issue for this problem. I consider the original perf issue on MapModel instanciation closed.

https://github.com/sylvainpolletvillard/ObjectModel/issues/95

gotjoshua commented 6 years ago

Whoa! Rock on!

Well, All wrong sounds a bit extreme to me... but definitely looking forward to see what you come up with!

sylvainpolletvillard commented 6 years ago

Nevermind, I'm an idiot... my map of 10000 elements was actually generating only 100 different keys

time to get some sleep

sylvainpolletvillard / ObjectModel

Performance with assigning to a MapModel #94