Performance: Analysis and Proposals

gotjoshua commented 4 years ago

We've been using ObjectModel library as a core element in our app for the last 2 years and we are thrilled with the features. It has helped immensely to find bugs during development phase. I have also been very impressed with @sylvainpolletvillard 's responsiveness and clarity (Thanks!).

Soon we will launch and we discovered that it really doesn't make sense to go live with ObjectModel at the core of all of our objects (at least not in its current state of performance).

We have a ton of "convenience getters" that often call each other and return easy to use arrays objects and values. The performance penalties of ObjectModel seem to amplify with each "nested" getter.

I am opening this issue in hopes of starting a dialog including analysis of the current state and proposals for improvement.

My initial measurements are visible in this comment and in the codepen.

My first thoughts and questions:

Why do "extended" props (those not included in the definition of an unsealed model) need to be validated or even checked?
Could it make sense to offer a "production mode" for OM that could do a run-time type check on instantiation and then return a basic object instead of a proxy? 2b. or even a mode that could allow for full type checking in development and maximum performance in production mode.
What are the most "expensive" parts of the OM architecture?

sylvainpolletvillard commented 4 years ago

Hello @gotjoshua

I'm glad this project helped your team. In return, you have been a regular contributor to this repo by reporting many issues and providing very useful feedback, and I thank you for that.

Performance is definitely the number one issue and improvement path for ObjectModel at the moment.

From v2, to v3, we switched from getters/setters to Proxies. The immediate good was that we no longer had to scan all the properties of instances to initialize the observer pattern. But what we gained at model instanciation time gradually gets lost at every property read/write through a Proxy, because it appears that Proxies are orders of magnitude slower than regular POJO in current browsers implementations. Note that they are still a relatively recent and underused feature in JS engines, so there is also room for improvement on JS engine developers side. In particular, I think the release of Vue.js v3, which uses Proxies at its core, will bring the attention of V8 and SpiderMonkey's teams to Proxy performance in their respective engines.

Of course, browsers are not the only culprits and there is a lot of potential optimizations on ObjectModel codebase too, as you saw yesterday. While I am satisfied with the current state of testing for this lib, perf benchmarking on the other hand has been neglected, mainly due to my lack of experience with micro-optimizations. I rely on handcrafted benchmarks, pretty similar to what you do on Codepen, but these are not useful to spot optimizations at a micro level ; at best they can be used to show evidence of a perf issue. I definitely have to change my method and tools to correctly address perf issues in ObjectModel in the future.

Now let me try to answer your questions:

Usage of ObjectModel on a large project, and in comparison with static type checkers

it really doesn't make sense to go live with ObjectModel at the core of all of our objects

Definitely not, and I'm sorry if you got the wrong picture here. I tried to explain this in the FAQ, but the answer may be too clumsy:

it is not advisable to use object models in performance-critical parts of your applications

My current approach with using ObjectModel in a large project is to use it as a complement to static type checking, typically TypeScript. Both static and dynamic typing are useful, as they address different issues. The most appealing part of TypeScript is not preventing bugs, but improving the developer experience with static analysis and type inference inside the IDE, validating your code without even needing to run it. ObjectModels are useful where TypeScript is powerless: validating dynamic, unpredictable data. These spots are usually what I call the "app boundaries", or the interfaces between your app logic and the external world. I made a presentation a few months ago around this idea, you might be interested to look at it: https://slides.com/sylvainpv/strong-typing-in-javascript

Working with external interfaces (DOM, network, user inputs, storage...) often involves time ranges that are long enough for the ObjectModel cost to be negligible, and the frequency of these operations is also much lower ; for example we don't care about a 3ms JSON response validation after a 300ms network call done once. This is probably why I have not been personally impacted by perf issues with ObjectModel these last years ; not that I refute their existence or the need to fix them.

About a production mode that does no validation

This is something I had in mind since the beginning, and actually already done in some projects. ObjectModel API has been designed to be as transparent as possible, meaning that you could eventually replace Model with a x => x identity function in production, and your code should work the same, just without validation.

BUT this way of consuming ObjectModel as a pure type checker implies restricting yourself to a smaller part of its features. As I claim on the website, OM is more than just type checking, and the hidden power of dynamic validation appears when you start to rely on it to change the app logic. For example:

custom error collectors: Order.errorCollector = (err) => logRuntimeError(err) && cancelOrder()
autocasting
null safe traversal
default values
assertions used as watchers/mutations (not recommended, but already seen)

So the only way to provide a production mode would be to even implement all these features, which goes against the initial idea of this production mode, or provide a "strict" alternative version of ObjectModel that acts as a pure type checker that does not change data at all ; which is something I have considered, but am not happy with.

Indeed, I try for years to sell the idea of dynamic type-checking as something that is more than type-checking (hence the catchword) and is actually a part of the app logic that devs could rely on. So this would go in the opposite direction where I want to lead this library, I think. I'd rather focus on helping people get the most out of this library by using it only where it makes sense, and not remove it in production because it would still be considered "useful code", and part of the app codebase.

Expensive parts of OM architecture

Listing all the operations done by OM and their cost is a hard job and I will have to clarify it to be able to seriously work on improving performance. What I can answer right now is a global picture I have in mind, based on my experience developing and working with OM:

autocasting implies testing if some data needs to be casted in the first place, and had to be done before the validation step ; it currently has no cache mechanism, meaning that type-checking has to be done twice in many cases. We should probably share intermediate results between the casting and validation phases, or add a cache layer to bypass casting on already casted values.
cyclic detection, used to avoid infinite loops when validing circular references, comes at a perf cost ; we have to do additional checks at every object property access, but also any other operation since we can have circular references even inside getters or user-land Proxy objects. I don't have any ideas right now on how we can improve the current mechanism.
proxifying mutators functions on collections (arrays, maps, sets); at every mutator call on a collection, let's say an array.splice, we clone the collection, apply the mutation on the clone, validate the clone, then only if the testing subject has passed the tests, reapply the mutation on the original collection. So every mutator has to be called twice, to be able to rollback the mutation if there is something wrong with this operation regarding model validity. Again, this is costly and I'm not sure how to improve this part.

Why do "extended" props (those not included in the definition of an unsealed model) need to be validated or even checked?

that should not be the case as we iterate on definition keys and not instance keys. if you have a code example I would be curious to take a look.

Sorry for the long post, I really wanted to address most of your questions here but there is a lot to talk about :)

gotjoshua commented 4 years ago

Thanks for the long post!! It is very insightful.

that should not be the case as we iterate on definition keys and not instance keys. if you have a code example I would be curious to take a look.

Well actually the codepen from #120 is an example. The getters on the VM (num and extendedGetterRequiringNoValidation) are both not included in the VM definition.

I would think the proxy would not need to do much about these getters, in psudo code:

let key = 'propInQuestion';
if (!this.definition.hasOwnProperty(key))  return obj[key];

When I discovered #120, I was surprised that OM was "bothering" to do anything at all with a getter that is not part of the definition. Is this understandable?

I will review the other sections of your reply and answer bit by bit... Thanks again.

sylvainpolletvillard commented 4 years ago

I would think the proxy would not need to do much about these getters, in psudo code

Ah, you mean skipping the get trap entirely for props that are not part of model definition. These undeclared properties are actually not "validated", but OM needs to keep track of this get access for at least two reasons:

grant private access for methods: object methods that are not part of the object model definition should still be able to read private props, but OM needs to register that they come from a method to grant private access
custom assertions: mutating a prop out of definition should trigger the assertions check, in case these assertions rely on these undeclared properties ; example: an array model with a maximum length assertion, or an object model with an assertion on the number of properties it can contain

But you raise a good point, there is probably some parts of the get trap that can be skipped when prop is out of definition. We could save a few milliseconds here.

gotjoshua commented 4 years ago

I try for years to sell the idea of dynamic type-checking as something that is more than type-checking

I understand, and fully agree that dynamic type-checking is much more... and we'd love to go live with some of our ObjectModels in full force. However, we'd also like an easy way to disable OM for some objects when we are in production (and to have that auto detected so we don't have to be manually throwing any switches). With Meteor this is quite easy with Meteor.isProduction, and it would be great to have a simple way in our BaseOM to test that flag and respond accordingly.

I have three modes in mind:

full dynamic type-checking OM features
instantiation checking only with no object proxies
full performance mode - no checking, no proxy, no delays

...

replace Model with a x => x identity function

Could you offer an example for this? maybe even a codepen? Do you have thoughts about integrating this into the actual code somehow?

sylvainpolletvillard commented 4 years ago

we'd also like an easy way to disable OM for some objects when we are in production (and to have that auto detected so we don't have to be manually throwing any switches)

I have several questions:

how would you "auto detect" production ? I don't want to enforce using continuous integration or a specific bundler like Webpack
how do you make sure that OM is used as a pure type checker, so that this "full performance mode" would not actually break any code ? I don't see any simple way to check that autocasting is never used in a codebase for example.
you said disable OM for some objects ; on what basis would you decide which objects would have OM disabled and which won't ?

Could you offer an example for this? maybe even a codepen?

I quickly wrote this to demonstrate: https://codesandbox.io/s/vigorous-bash-rf1xt Remember that it will only work if you use ObjectModel as a pure type checker: no autocasting, no default values, no mutative assertions, no null safe traversal...

Do you have thoughts about integrating this into the actual code somehow?

As said before, the only way to provide a production mode would be to provide a "strict" alternative version of ObjectModel that acts as a pure type checker. And if I had to constraint myself to this strict version of the library, I would rather go with TypeScript honestly.

gotjoshua commented 4 years ago

for "atuo-detect" we would use Meteor.isProduction

how do you make sure that OM is used as a pure type checker, so that this "full performance mode" would not actually break any code ? I don't see any simple way to check that autocasting is never used in a codebase for example. you said disable OM for some objects ; on what basis would you decide which objects would have OM disabled and which won't ?

I think this is the responsibility of the consumers of the library.

We would do it by extending different BaseOM classes.

I'd love to be able to decide at runtime what mode a certain base class is in:

class BaseOM extends ObjectModel({
  createdDate: [Date],
  lastUpdatedDate: [Date],
}).performanceMode(Meteor.isProduction ? FULL_PERFORMANCE : NORMAL) {
  constructor ...
  getters ...
}

and for objects that we trust less coming from an external data source:

class UntrustedOM extends ObjectModel({
  createdDate: [Date],
  lastUpdatedDate: [Date],
}).performanceMode(Meteor.isProduction ? CHECK_ONCE : NORMAL) {
  constructor ...
  getters ...
}

I would still want BaseOM.extend to work, but to behave differently (return a bare object, a check-once object or a full OM proxy) depending on its performance mode.

For other ObjectModel users, who don't use meteor or webpack, they can either ignore the performance option (and pay the penalties) or establish their own tests. IMO its not your problem, if OM offers an easy API that is opt-in, they library users can find their own way with it.

Am I making any sort of sense here?

sylvainpolletvillard commented 4 years ago

I'm still very unconfortable with this... To sum up, what you are asking for is that ObjectModel provides an API that disables basically all the library for performance purpose, then engage users responsability to make sure that their code will still working properly while keeping this useless library code. It's going to be very hard to sell, honestly.

You talked about 3 different operating modes, a per-model setting and a production flag brought by your framework. This sounds very specific to your own current needs, and I can't assume it will also fit other users needs, nor that they will be able to understand the constraints specific to each mode to make sure their code is still working on production.

Moreover, as I already said before, this "full performance mode" implies using the library as a pure type checker, which greatly reduces its interest in the first place. If you are using the lib this way, it would be simpler to not calling model constructor at all based on your isProduction condition:

const om = Meteor.isProduction ? data : new BaseOM(data)

or in a class declaration:

const BaseOMModel = Meteor.isProduction ? Object : ObjectModel({
  createdDate: [Date],
  lastUpdatedDate: [Date],
})

class BaseOM extends BaseOMModel {}

so that you are directly interacting with the initial object. It makes clear what you are actually doing to remove OM perf penalty, and it is the most performant solution since you are not doing even a single call to OM.

The CHECK_ONCE mode can also be done quite simply on your side:

const om = Meteor.isProduction ? BaseOM(data) && data : BaseOM(data)

If you need something more sophisticated than this, writing your own mock for the library is totally feasible as shown in my previous post. And it will be up to you to decide what exactly is doing this mock, to cover exactly your needs at minimal costs.

gotjoshua commented 4 years ago

@sylvainpolletvillard, thanks for the reply and for the elegant syntax suggestions. I think we will try them out and if the investigation finds anything fruitful we'll share it here.

I would also very much like to engage in further optimizations and performance testing. We have a system with many collections of objects and can offer some decent "real world" performance testing opportunities. If you publish some Release Candidates we can easily test them out and give some comparisons.

gotjoshua commented 4 years ago

ah, and one detail question: in this example:

const om = Meteor.isProduction ? BaseOM(data) && data : BaseOM(data)

is there any performance advantage to the following?

const om = Meteor.isProduction ? BaseOM.test(data) && data : BaseOM(data)

sylvainpolletvillard commented 4 years ago

your second proposition will assign om to false if the data doesn't match the model. My first proposition will trigger an exception. This is the only difference

For perf optimizations, it is mainly micro-optimizations that can be measured individually. Having a real-world large project could be useful to get an idea of the function call distribution, but it may vary a lot between projects so I'm not sure if one project can help me take decisions, If there are decisions to be taken in the first place, for now I would like to focus on optimizations only.

crobinson42 commented 3 years ago

v4 https://codesandbox.io/s/youthful-johnson-t7e7c?file=/src/index.js

sylvainpolletvillard commented 3 years ago

An update on this issue and the improvements made on performance:

autocasting implies testing if some data needs to be casted in the first place, and had to be done before the validation step ; it currently has no cache mechanism, meaning that type-checking has to be done twice in many cases. We should probably share intermediate results between the casting and validation phases, or add a cache layer to bypass casting on already casted values.

Every autocasted model now bypass its second validation step

I would still want BaseOM.extend to work, but to behave differently (return a bare object, a check-once object or a full OM proxy)

I added in v4.3.0 a CHECK_ONCE mode as a second argument to object models constructors

const Address = Model({ city: String, street: { name: String, number: Number }})
const a = new Address({ city: "Paris", street: { name: "Champs-Elysees", number: 12 }}, Model.CHECK_ONCE)

this is thought to be applied per-instance, but you can find easily a pattern to apply on all instances of a model, i.e:

class Person extends Model({ name: String, age: Number }) {
    constructor({ name, age }) {
        super({ name, age }, Model.CHECK_ONCE)
    }
}

const john = new Person({ name: "John", age: 32 })

I'm not sure if it makes sense to do this for Array/Map/Set models, I can't think of a usecase where you would want to only validate initial list content and not future additions to the list.

gotjoshua commented 3 years ago

I added in v4.3.0 a CHECK_ONCE mode

nice one! thanks for your persistence! i'll check it out...

sylvainpolletvillard / ObjectModel