miragejs / discuss

Ask questions, get help, and share what you've built with Mirage
MIT License
2 stars 0 forks source link

Factory performance issues #10

Open samselikoff opened 4 years ago

samselikoff commented 4 years ago

Issues with using factories on larger data sets keeps coming up so I wanted to open an issue and consolidate the conversations so we can get down to the root of the issue.

It probably has something to do with some much older code somewhere, possibly in db.

Ryan was asking about a way to track which factories were taking longest over in https://github.com/miragejs/server/issues/91 and I hacked something together quickly that should give folks a way to do some measurement:

// mirage/config.js
import { Server } from "@miragejs/server";

let originalCreate = Server.prototype.create;
Server.prototype.create = function() {
  console.log("beforeCreate");
  let model = originalCreate.apply(this, arguments);
  console.log("afterCreate");

  return model;
};

export default function() {
  this.get("/users", ...)
}

It would be great to get some small demo projects going so we can start to fix this long-standing issue!

I want to help point anyone in the right direction who can help as every time this comes up something else is always more important.

makepanic commented 4 years ago

If anyone wants to use it with the performance measure api:

const markerA = 'Mirage create start';
const markerB = 'Mirage create end';

let originalCreate = Server.prototype.create;
Server.prototype.create = function(modelName) {
  performance.mark(markerA);
  let model = originalCreate.apply(this, arguments);
  performance.mark(markerB);
  performance.measure(`Mirage create: ${modelName}`, markerA, markerB);

  return model;
};

Open devtools performance tab and record.

Example screenshot from one of our tests. It seems like one issue is faker taking time for some fakes.

20190911-090913

In that one case it seems like faker does some things not that optimal, e.g. https://github.com/Marak/faker.js/blob/3a4bb358614c1e1f5d73f4df45c13a1a7aa013d7/lib/system.js#L118-L125 they generate the list of extensions for each call of fileExt which are ~1860 mimeTypes deeply nested and transformed into an array on every call :( (created https://github.com/Marak/faker.js/issues/822 for that)

20190911-090920

I might be able to create some PRs against faker to remove some faking performance issues.

makepanic commented 4 years ago

After removing faker from the generation it seems like most of the time is spend when collecting names (modelName, toCollectionName, ...)

20190911-090930

It seems like we could memoize the inflection as it shouldn't change for a word at runtime, right?

https://github.com/miragejs/server/blob/f4d76762651fd2b6e1df71f1725d395bf34881ef/lib/utils/inflector.js#L12

@samselikoff ould you be open to memoize camelize and maybe some name retrieval methods? I could create a PR for that.

edit: with simply memoizing camelize and dasherize:

20190911-100935

From a singular run it came down to (= not that representative):

It also seems there are various toKey or toName methods with seems to be pure functions and could be memoized, e.g. toInternalCollectionName.

Running the whole test suite in our primary app changed it from 283771ms to 259246ms. (again singular run so expect inaccuracy)

edit2: i've memoized some more pure Schema functions that do string manipulation (i.e. toModelName, toInternalCollectionName, toCollectionName) and got the following:

20190911-100927

edit3: getForeignKey and _typeIsPluralForModel can also be memoized.

samselikoff commented 4 years ago

Would definitely be open to this sort of PR! I really have very little experience here so please, I would love to hear your ideas.

I think the fastest way to keep us moving is to POC the change in a small PR so I can give feedback over anything I have input on. But definitely interested in this and willing to take your lead :)

And yes those calls should not change at runtime

makepanic commented 4 years ago

I've started working on this in https://github.com/miragejs/server/pull/130

I'd like to check this more in-depth in our app in the next couple days to see if the gains are really that great or I did something wrong. The PR is passing the tests locally and can theoretically already be merged.

makepanic commented 4 years ago

It seems liek there are some other hot-spots. Performance inspector shows me e.g. that _validateForeignKeyExistsInDatabase, _associateWithNewInverses is taking kinda long for some of our tests.

I'll try to see if there are some other easy changes one can do to improve this.

samselikoff commented 4 years ago

These are the kinds of things I suspect are more problematic (and maybe tougher to fix), I think it's very likely they're introducing cycles.

We have good test coverage in the ORM here but this code is tricky. I'd suggest trying to set up a very simple case, an author that has 2 posts, then create a new author and associate one of the posts with the second author. Then track how many times those methods are called and if they're called multiple times for the same model. I suspect they will be.

samselikoff commented 4 years ago

To add some color, Mirage's ORM stores foreign keys on both sides of a relationship (since relationships can be one-way), so

author1.commentIds = [1, 2] comment1.authorId = 1 comment2.authorId = 2

Then if you go and change

comment1.authorId = 2

Mirage has to look at author1 (comment1's old inverse), and check if it has any relationships that correspond to that record, and disassociate that record from it, and ya it can have cascading effects.

Definitely open to refactoring this code & at least getting other folks looking at it!

samselikoff commented 4 years ago

FYI: Transferred this to our Discuss repo, our new home for more open-ended conversations about Mirage!

If things become more concrete + actionable we can create a tracking issue in the main repo.