puli / issues

The Puli issue tracker.

Performance issues #192

Open mnapoli opened 8 years ago

mnapoli commented 8 years ago

I'm running Puli out of the box in production (http://externals.io/). Is there something I should do to improve performance? (e.g. something like composer dumpautoload -o, etc.)

Right now on a simple request 30% of the request time is spent in AbstractJsonRepository::find() and 24% in JsonValidator::validate() -> I guess JSON validation is something that can be spared when running in production. 30% is enormous for such a simple application (here is the Blackfire profile: https://blackfire.io/profiles/759c5623-3a1a-49d6-957d-a15263ebf764/graph). I was thinking maybe there could be a "cached" implementation of ResourceRepository? (I couldn't find one)

Anyway what I'm asking basically is:

thewilkybarkid commented 8 years ago

It's a bit hidden, but you can add:

"config": {
    "repository": {
        "optimize": true
    }
}

to your puli.json.

(I only found this by looking through Puli\Manager\Api\Config\Config.)

mnapoli commented 8 years ago

Thanks, do you know what it will do and why it's not enabled by default?

thewilkybarkid commented 8 years ago

It seems to make it use Puli\Repository\OptimizedJsonRepository rather than Puli\Repository\JsonRepository, which avoids the validation etc. (JsonRepository seems to have optional validation, but there doesn't seem to be a way to turn it off locally).

It's worked fine for me. The only noticeable difference is that adding a new resource locally etc. won't be recognised until the repo is rebuilt (e.g. on a composer install).

tgalopin commented 8 years ago

The OptimizedJsonRepository fully resolves resources and their children on writing (i.e. when you build), whereas the JsonRepository resolves them at runtime (on get). The JsonRepository is therefore useful for development but should not be used in production.

The algorithm to resolve a resource on get is quite complex, as it has to deal with multiple possible paths and find the first match. For instance, if you have two mappings:

When you ask for resource /app/config/test.yml, Puli has to check for /project/src/config/test.yml and /project/res/config/test.yml. It becomes more complex as soon as you introduce links, parents, etc. That's why it's quite slow.
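To make the runtime lookup concrete, here is a rough Python sketch of the idea. It is not Puli's actual PHP code. The `resolve` function, the `(prefix, directory)` mapping shape, and the "first mapping wins" order are all assumptions for illustration, and the two mappings are inferred from the paths mentioned above. Links and parents are left out entirely.

```python
import os
import tempfile


def resolve(mappings, puli_path):
    """Return the first existing file for a Puli-style path, or None.

    `mappings` is an ordered list of (prefix, directory) pairs (hypothetical
    shape). Every call walks the candidates and hits the filesystem.
    """
    for prefix, directory in mappings:
        base = prefix.rstrip("/")
        if puli_path == base or puli_path.startswith(base + "/"):
            relative = puli_path[len(base):].lstrip("/")
            candidate = os.path.join(directory, relative)
            if os.path.exists(candidate):
                return candidate
    return None


# Throwaway project layout standing in for /project/src and /project/res.
root = tempfile.mkdtemp()
src = os.path.join(root, "src")
res = os.path.join(root, "res")
os.makedirs(os.path.join(src, "config"))
os.makedirs(os.path.join(res, "config"))
with open(os.path.join(res, "config", "test.yml"), "w") as handle:
    handle.write("debug: true\n")

# Two mappings for the same prefix: src is checked first, then res.
mappings = [("/app", src), ("/app", res)]
found = resolve(mappings, "/app/config/test.yml")
missing = resolve(mappings, "/app/config/other.yml")
```

Here the file only exists under `res`, so the lookup has to try `src` first and fall through. That per-request filesystem probing is the cost being described.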

On the other hand, the OptimizedJsonRepository has a simple task: for every mapping, on build, it recursively finds all the children of the parent resource and maps them in a big array. On get, it's very easy and quick to find the resource.
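A similar rough Python sketch of the build-time approach, again not Puli's real implementation: `build_index` and the `(prefix, directory)` mapping shape are hypothetical, and letting the first mapping win on conflicts is an assumption.

```python
import os
import tempfile


def build_index(mappings):
    """Walk every mapped directory once and return a flat path -> file dict."""
    index = {}
    for prefix, directory in mappings:
        base = prefix.rstrip("/")
        for current, _dirs, files in os.walk(directory):
            for name in files:
                full = os.path.join(current, name)
                relative = os.path.relpath(full, directory).replace(os.sep, "/")
                # setdefault keeps the entry from the earliest mapping.
                index.setdefault(base + "/" + relative, full)
    return index


# Throwaway project layout standing in for /project/src and /project/res.
root = tempfile.mkdtemp()
src = os.path.join(root, "src")
res = os.path.join(root, "res")
os.makedirs(os.path.join(src, "config"))
os.makedirs(os.path.join(res, "config"))
with open(os.path.join(res, "config", "test.yml"), "w") as handle:
    handle.write("debug: true\n")

mappings = [("/app", src), ("/app", res)]
index = build_index(mappings)            # done once, "on build"
found = index.get("/app/config/test.yml")  # "on get": a single dict lookup

# A file added after the build is invisible until the index is rebuilt.
with open(os.path.join(src, "config", "new.yml"), "w") as handle:
    handle.write("added: later\n")
seen_before_rebuild = "/app/config/new.yml" in index
seen_after_rebuild = "/app/config/new.yml" in build_index(mappings)
```

The last few lines show the trade-off mentioned earlier in the thread: lookups are cheap, but new resources are not picked up until a rebuild.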

You can find the JsonRepository algorithm here: https://github.com/puli/repository/blob/1.0/src/JsonRepository.php#L489

And the Optimized one here: https://github.com/puli/repository/blob/1.0/src/OptimizedJsonRepository.php#L233

mnapoli commented 8 years ago

OK thanks, the way I understand it is it works like composer dumpautoload -o (i.e. using the classmap instead of going through all possible paths).

Given that, wouldn't it make sense for it to be an argument for puli build? (just like composer) Because puli.json stores configuration both for dev and prod, how are we supposed to switch that flag if puli.json is versioned?

tgalopin commented 8 years ago

That's an interesting question. I have to admit that I worked mainly on the raw components (puli/repository and webmozart/* repositories) so I'm probably not able to answer you correctly. Ping @webmozart ?

webmozart commented 8 years ago

@mnapoli The problem is that Puli needs that information not only during build, as far as I remember. The solution there would be to add another config key, config-prod, where you can add configuration for your production environment.

In any case, could you open a new issue?

mnapoli commented 8 years ago

@webmozart I'm not sure I understand: how would I tell Puli "I'm running in production" or "I'm running in development"?

webmozart commented 8 years ago

By passing --prod to the respective commands, e.g. puli build --prod. The difference would be that you could also change other configuration variables for the production environment, not just the repository type.

TBH I did not think this through very far yet. If you have a good solution for the environment problem I'm all ears.
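For what it's worth, the config-prod idea could behave roughly like the Python sketch below. This is purely an illustration of the proposal, not anything Puli implements: the `effective_config` helper, the merge rules, and the "some-other-setting" key are hypothetical, and only the "repository" / "optimize" keys come from this thread.

```python
import copy


def _merge(base, overrides):
    """Recursively overlay `overrides` onto `base` in place."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _merge(base[key], value)
        else:
            base[key] = copy.deepcopy(value)


def effective_config(puli_json, prod=False):
    """Return "config", with "config-prod" layered on top when prod is set."""
    config = copy.deepcopy(puli_json.get("config", {}))
    if prod:
        _merge(config, puli_json.get("config-prod", {}))
    return config


# Hypothetical puli.json contents, versioned once for both environments.
puli_json = {
    "config": {
        "repository": {"optimize": False, "some-other-setting": "dev-value"},
    },
    "config-prod": {
        "repository": {"optimize": True},
    },
}

dev = effective_config(puli_json)
prod = effective_config(puli_json, prod=True)
```

With this shape, `dev` keeps `optimize` off, while `prod` turns it on and leaves every key it does not override untouched.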

mnapoli commented 8 years ago

Ah I see, I guess you could want to store assets on S3 in production for example? In that case indeed it's not as simple as Composer I guess.

I don't know enough about the internals of Puli and all the features available, but as a user, if there were a way to include Puli with some dev/prod flag, that could work? Configuring different environments could work too. That way the framework/application could instantiate Puli depending on the current environment, and puli build would generate both versions of Puli (dev/prod). No effort would be required from the user, and it would work with apps/frameworks that want to support it.

How does that sound?