repman-io / repman

Repman - PHP Repository Manager: packagist proxy and host for private packages
https://repman.io
MIT License
515 stars 107 forks source link

Performance: read and parse file contents after having send the response headers #537

Open Jeroeny opened 2 years ago

Jeroeny commented 2 years ago

When running composer update (or other commands that read the registry's package data), composer will fetch the p2/<organization>/<package>.json endpoint for every package. It will first read the response headers, to determine if the contents of the response should be downloaded. The contents of the response will only be downloaded if the local composer cache is outdated (the last-modified of the server is later then that of the client's files).
If the cache is up-to-date, the rest of the response will not be downloaded (making the update command faster).

However, server-side, repman always reads and parses the package data. Performance can possibly be improved by first determining the lastModified and ensuring the headers are output before the reading and parsing of the package data.

Maybe I misinterpreted the code and this was already happening, but it didn't seem like it to me. Also this PR is not yet tested, but I'd like to get the proposal confirmed first.

Docs reference:

https://getcomposer.org/doc/05-repositories.md

Caching is done via the use of If-Modified-Since header, so make sure you return Last-Modified headers and that they are accurate.

akondas commented 2 years ago

But lastModified is set for each response, I thought it was enough :thinking:

Can you provide some speed test (before and after) so we can see how it affects performance?

Generally, I am yes for this PR, but before we start polishing it, I would like to know how much it is worth

Jeroeny commented 2 years ago

@akondas It was a bit of a challenge to setup locally, but I've got it now. I've done various tests with both parallel batches of requests and sequential. Though not always consistent, It seems the responsetime is up to ~5% less. E.g. when doing 100 requests, the average response time was 113ms vs 119ms. There could be more gain on setups with worse disk performance. Not sure how significant the difference is. But if a composer update checks 100+ packages, it may add up.

akondas commented 2 years ago

in theory you are right, but I need a little more time to think about it, because the improvment is relatively small, but good job anyway :wink:

Jeroeny commented 2 years ago

@akondas This might have more performance gains for setups with network-mounted disks. I think the change is worth to try. Could you reconsider the PR? I've updated the code to use \fwrite(), like the other StreamResponse implementations do too.

akondas commented 2 years ago

Looks ok, we can give it a try. Do you have any benchmarks that can show how much we gain with this PR?

Jeroeny commented 2 years ago

Looks ok, we can give it a try. Do you have any benchmarks that can show how much we gain with this PR?

I currently do not have a setup to do extensive benchmarks, nor the time to set it up unfortunately.

Jeroeny commented 2 years ago

@akondas I've discovered that Headers were not outputted before the response content yet. The entire response was still buffered. An additional response header and fwrite call fixed this. In our setup where files are on a mounted remote network filesystem, this gives significant performance increase. Otherwise calls to Repman can take more than a second.

It could be even more performant if lastModified was retrieved from this->packageQuery->getAllNames (a db query), so that no files would need to be read to calculate the response headers. But that would be for a separate PR.

I hope you can (re)consider the PR.

codecov[bot] commented 2 years ago

Codecov Report

Merging #537 (ab441a3) into master (50fe808) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master     #537   +/-   ##
=========================================
  Coverage     99.16%   99.16%           
- Complexity     1910     1913    +3     
=========================================
  Files           301      301           
  Lines          6072     6092   +20     
=========================================
+ Hits           6021     6041   +20     
  Misses           51       51           
Impacted Files Coverage Δ
src/Controller/RepoController.php 100.00% <100.00%> (ø)
src/Service/Organization/PackageManager.php 93.54% <100.00%> (+0.56%) :arrow_up:
...curity/PackageScanner/SensioLabsPackageScanner.php 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.