websvnphp / websvn

Fork from WebSVN
https://websvnphp.github.io/
GNU General Public License v2.0

WebSVN loads index page with lots of repos very slowly. #78

Open JimmieJimmie opened 5 years ago

JimmieJimmie commented 5 years ago

We are using WebSVN 2.5 with Apache 2.4.37 and have about 120 repositories with authorization via Active Directory. WebSVN is very slow! It takes about 50 seconds to load the index page listing all repositories (even on the server machine itself, using localhost/websvn). According to Chrome, the TTFB is about 40 seconds. What could be the reason for this long response time? The standard SVN web page loads in about 500 ms.

ams-tschoening commented 5 years ago

Hello JimmieJimmie, on Tuesday, 23 April 2019 at 17:28 you wrote:

We are using WebSVN 2.5 with Apache 2.4.37 and have about 120 Repositories and authorization via active directory.

Authorization or authentication? While that might sound like nitpicking, I wonder whether WebSVN supports authorization using Active Directory at all, and I'm fairly sure it doesn't. Instead, authentication is handled by httpd using Active Directory, while authorization should be handled by a command-line tool called "svnauthz" using paths to and within the repos.

http://doc.mawan.de/subversion/svn_1.8_releasenotes.html#svnauthz_accessof

include/authz.php

It might make sense to share your config regarding the paths to your SVN repos and your access files, if any. Without access files, no authorization should happen at all, so "svnauthz" should not be called. If I were you, I would use Process Monitor to see what happens during a request.

https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

The standard svn web page is loaded in about 500 ms.

You mean mod_dav_svn?

Kind regards,

Thorsten Schöning

-- Thorsten Schöning E-Mail: Thorsten.Schoening@AM-SoFT.de AM-SoFT IT-Systeme http://www.AM-SoFT.de/

Phone.............05151- 9468- 55 Fax...............05151- 9468- 88 Mobile............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln AG Hannover HRB 207 694 - Managing Director: Andreas Muchow

michael-o commented 5 years ago

Cannot confirm. Works lightning fast here with mod_auth_gssapi and svnauthz. Please provide more information.

JimmieJimmie commented 5 years ago

Svn and WebSvn use the same authentication via Apache and Active Directory and the same access file for the subversion repository permissions (authz). Svn works fine (mod_dav_svn), WebSvn is really slow. For the authentication in Apache the modules mod_authnz_sspi and mod_authz_svn are used.

WebSVN configuration in Apache:

    <Location /websvn>
      AuthName "WebSVN Repository Login"
      AuthType SSPI
      SSPIAuth On
      SSPIAuthoritative On
      SSPIDomain DomainName
      SSPIOmitDomain Off
      SSPIOfferBasic On
      SSPIPerRequestAuth Off
      SSPIOfferSSPI On
      SSPIBasicPreferred Off
      SSPIUsernameCase lower
      Require valid-sspi-user
    </Location>

The dialog for entering username and password appears immediately, then nothing happens for 40 seconds before the start/index page of WebSVN loads. What could be taking so long? Are there any additional WebSVN logs?

config.php.txt

michael-o commented 5 years ago

The config doesn't help. Unless you provide some debug logs we cannot help. It is likely an issue with your server, PHP module or something else. Reduce everything to a minimum and retry.

JimmieJimmie commented 5 years ago

It's a clean installation of WampServer 3.1.7 (64-bit) on a new machine (Windows 10 64-bit, clean installation). Apache serves only SVN and WebSVN; MySQL and MariaDB in WAMP are disabled.

Attached is the part of access.log that contains only a single call of "http://localhost/websvn", made directly on the server machine with AccessFile disabled. It takes about 18 seconds to process the WebSVN GET request, while calling "http://localhost/svn" takes 4 ms.

access_log_single_websvn.txt websvn_timing_chrome

ams-tschoening commented 5 years ago

Hello JimmieJimmie, on Wednesday, 24 April 2019 at 07:17 you wrote:

What can be taking so long?

As said before, start with changing your config to not use an access file and additionally use Process Monitor to see what happens during requests.

With an access file, svnauthz is called and needs to be available somewhere, so it might get looked up in your PATH and elsewhere, which might take some time. Do you even have it installed? I'm not even sure where to get it for Windows; it's e.g. not part of TortoiseSVN by default.

An additional source of waiting time might be your configured "parentPath", because WebSVN iterates over all folders in it, checking each one to guess which of them are SVN repos.

configclass.php::ParentPath::findRepository

There might, e.g., be some additional, unnecessary deep iteration for some reason. Some of my own software had similar problems in the past when used with SVN working copies, because it started iterating ".svn" dirs for no reason, which hurt performance a lot. mod_dav_svn might simply work differently, and if you additionally consider the AV software most likely running on your Windows machine, perhaps scanning things for WebSVN that it doesn't scan for mod_dav_svn, a few milliseconds of overhead here and there easily add up.
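To illustrate the difference between a shallow and a deep scan, here is a minimal sketch that looks only at the direct children of a parent path and never descends into them. The marker check (a "format" file plus a "db" directory inside the folder) is an assumed heuristic, and findReposShallow and looksLikeSvnRepo are hypothetical names; this is not WebSVN's ParentPath::findRepository code.

```php
<?php
// Hypothetical sketch: list SVN repositories directly below a parent
// directory WITHOUT recursing into them. The "format file + db dir"
// heuristic is an assumption, not WebSVN's actual detection logic.

function looksLikeSvnRepo(string $dir): bool
{
    return is_file($dir . DIRECTORY_SEPARATOR . 'format')
        && is_dir($dir . DIRECTORY_SEPARATOR . 'db');
}

function findReposShallow(string $parentPath): array
{
    $repos = [];
    // scandir() reads exactly one directory level; nothing below is visited.
    $entries = scandir($parentPath);
    if ($entries === false) {
        return $repos;
    }
    foreach ($entries as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $full = $parentPath . DIRECTORY_SEPARATOR . $name;
        if (is_dir($full) && looksLikeSvnRepo($full)) {
            $repos[] = $name;
        }
    }
    sort($repos);
    return $repos;
}
```

With a Process Monitor trace at hand, a shallow scan like this should show roughly two stat calls per candidate folder; many more file operations per repo would hint at the unnecessary deep iteration described above.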

All those things would be visible in Process Monitor; that's why I always suggest it in such cases. You simply need to debug this further, and the things I mentioned are the first I would look at. The waiting time most likely comes from network, I/O or CPU somewhere, and you need to find the bottleneck.

Are there some additional WebSvn logs?

Sadly not; there's only what you see in the web server logs or what PHP provides independently of any application. In my opinion, Process Monitor is currently your best choice; otherwise you need to add debugging statements to WebSVN.

Kind regards,

Thorsten Schöning


michael-o commented 5 years ago

20 seconds of process time seems very very long. Is this due to WAMP? I have never used this stack on Windows and hopefully never will.

ams-tschoening commented 5 years ago

Hello JimmieJimmie, on Wednesday, 24 April 2019 at 09:19 you wrote:

Attached part of the access.log file, that contains only one call of "http://localhost/websvn" directly on the server machine with AccessFile disabled. It takes about 18 seconds to process the WebSVN GET request.

Which is obviously far less than before, so Process Monitor might tell you where the difference between with/without access file comes from. And at the same time it will tell you what happens during searching for repos to show up.

You can also compare performance between using parentPath and manually adding all repos to the config. That shouldn't be too difficult even for 120 repos: just print their names on the shell and copy & paste them into your editor.
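The copy & paste step can also be scripted. This sketch turns a list of repository folder names into explicit $config->addRepository() lines for config.php, using the same file:/// URL form shown later in this thread. The helper name buildAddRepositoryLines and the example base path are hypothetical; adjust them to your server.

```php
<?php
// Hypothetical helper: generate $config->addRepository() lines for
// config.php from a list of repository folder names.

function buildAddRepositoryLines(string $basePath, array $repoNames): string
{
    $lines = [];
    // Normalise Windows backslashes so the URL looks like file:///C:/...
    $base = rtrim(str_replace('\\', '/', $basePath), '/');
    foreach ($repoNames as $name) {
        $url = 'file:///' . ltrim($base, '/') . '/' . $name;
        // var_export() yields a correctly escaped single-quoted PHP literal.
        $lines[] = sprintf('$config->addRepository(%s, %s);',
            var_export($name, true), var_export($url, true));
    }
    return implode("\n", $lines) . "\n";
}

// Usage: feed it the names printed by e.g. "dir /b" in the repository root.
echo buildAddRepositoryLines('C:\\Repositories', ['ProjectA', 'ProjectB']);
```

The output can be pasted into config.php in place of the parentPath setting for a side-by-side timing comparison.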

Kind regards,

Thorsten Schöning


JimmieJimmie commented 5 years ago

Yesterday, before opening this issue, I had already tried adding all of the repositories using $config->addRepository('NameToDisplay', 'file:///c:/Repository Path'); instead of using ParentPath, with the same slow behaviour. Now I will try Process Monitor and report back later.

michael-o commented 5 years ago

You might want to disable authc and authz and see if this makes a difference. One needs to narrow this down.

JimmieJimmie commented 5 years ago

WebSVN calls "svnauthz" for each repository. According to Process Monitor, this takes about 200 ms per repository. We have 120 repositories, so 120 * 200 ms = 24 seconds. In Process Monitor I can see that it takes about 100-120 ms from the command call by httpd.exe (cmd.exe /c svnauthz accessof -repository -path -username) until svnauthz.exe actually executes. Is there any way to speed up these calls?

michael-o commented 5 years ago

That's good news. svnauthz operates on a per-repo basis, and there is no interface to what svnauthz does other than in C; otherwise one could read the file into memory once and reuse it. I don't see any option at the moment, because I would strongly discourage reimplementing the entire logic in PHP.

I just checked the code of mod_*_svn. If I properly understand the stuff, the file is read and then all operations are performed.

Subversion also provides a listing of all repos, how long does this take? Should be way faster.

michael-o commented 5 years ago

Ideally, you would raise this question on users@svn.apache.org. @brainy any smart idea how we can solve this better from within PHP?

ams-tschoening commented 5 years ago

WebSVN calls "svnauthz" for each repository.

Where did you get the binary from? Just in case I would like to test myself.

According to the ProcessMonitor this takes about 200 ms for single repository.

This is what I get on Ubuntu 16.04 LTS for an example call:

time svnauthz accessof %f/conf/authz --repository=%f --path=/trunk --username=[...]
[...]
real    0m0.009s
user    0m0.003s
sys     0m0.000s

The same in Git Bash under Windows 10, most likely with Defender disabled:

real    0m0,055s
user    0m0,000s
sys     0m0,015s

Execution in PowerShell is of some interest, as the time jumps between 26 and ~160 ms:

PS C:\Program Files\CollabNet\Subversion Client> Measure-Command {.\svnauthz accessof "C:\[...]\authz" --repository=[...] --path=/trunk --username=[...]}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 26
Ticks             : 260325
TotalDays         : 3,01302083333333E-07
TotalHours        : 7,23125E-06
TotalMinutes      : 0,000433875
TotalSeconds      : 0,0260325
TotalMilliseconds : 26,0325

PS C:\Program Files\CollabNet\Subversion Client> Measure-Command {.\svnauthz accessof "C:\[...]\authz" --repository=[...] --path=/trunk --username=[...]}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 156
Ticks             : 1562773
TotalDays         : 1,8087650462963E-06
TotalHours        : 4,34103611111111E-05
TotalMinutes      : 0,00260462166666667
TotalSeconds      : 0,1562773
TotalMilliseconds : 156,2773

In Process Monitor I can see that it takes about 100-120 ms from the command call by httpd.exe (cmd.exe /c svnauthz accessof -repository -path -username) until svnauthz.exe actually executes. Is there any way to speed these calls up?

Might depend on what causes the delay, check AV-software, network logs of Process Monitor, Thread Creation etc., not only I/O operations. I doubt that reading the file itself into memory is slow, I guess it's either parsing it or process creation itself already. You might want to consider providing the Process Monitor logs to us as well, so that others can have a look.

It doesn't seem like the command can be invoked with multiple different repos and paths once or such, so we need to focus on improving each invocation. Besides being slow, I'm with @michael-o that WebSVN shouldn't reimplement that command (again).

JimmieJimmie commented 5 years ago

I'm using the newest version of CollabNet Subversion. Running svnauthz manually on the command line doesn't take long. It only takes long when WebSVN's PHP code runs svnauthz via the exec function: Apache/httpd needs some time (100-120 ms) from the call of the exec function until cmd.exe is invoked. I don't know why it takes so long or how I can speed this up.
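To put numbers on this from inside PHP rather than from Process Monitor, a rough timing harness like the following could compare exec() (which goes through cmd.exe on Windows) with proc_open() using bypass_shell. The PHP one-liner is only a stand-in command (assumption); replace it with your real "svnauthz accessof ..." call line. averageMillis is a hypothetical helper.

```php
<?php
// Rough timing harness: average wall-clock milliseconds per call of $run.

function averageMillis(callable $run, int $iterations = 10): float
{
    $start = hrtime(true); // monotonic clock in nanoseconds (PHP >= 7.3)
    for ($i = 0; $i < $iterations; $i++) {
        $run();
    }
    return (hrtime(true) - $start) / 1e6 / $iterations;
}

// Stand-in command; swap in the svnauthz command line to measure that.
$cmd = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg('exit(0);');

// exec() runs the command through the shell (cmd.exe on Windows).
$viaExec = averageMillis(function () use ($cmd) {
    exec($cmd, $output, $exitCode);
});

// proc_open() with bypass_shell starts the binary directly on Windows;
// on other systems the option is ignored and /bin/sh is still used.
$viaProcOpen = averageMillis(function () use ($cmd) {
    $proc = proc_open($cmd, [], $pipes, null, null, ['bypass_shell' => true]);
    if ($proc !== false) {
        proc_close($proc);
    }
});

printf("exec(): %.1f ms/call, proc_open(bypass_shell): %.1f ms/call\n",
    $viaExec, $viaProcOpen);
```

Multiplying the per-call figure by the number of repos gives a quick estimate of how much of the page load is pure process-creation overhead.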

michael-o commented 5 years ago

This could be related to PHP launching cmd.exe instead of executing the binary directly.

ams-tschoening commented 5 years ago

Apache/httpd needs some time (100-120 ms) from the call of the exec function until cmd.exe is invoked. I don't know why it takes so long or how I can speed this up.

The delay seems to be ~50 ms for me, with svnauthz finishing after an additional ~30 ms, so per repo it's about 80-100 ms in my setup. About 40 of the 50 ms seem to come from invoking cmd.exe. The differences between our setups might be related to things like AV software (I've disabled Defender) and even plain CPU speed. I'm using the following:

https://ark.intel.com/content/www/de/de/ark/products/75131/intel-core-i7-4900mq-processor-8m-cache-up-to-3-80-ghz.html

Additionally there's the following mentioned in the docs:

bypass_shell (windows only): bypass cmd.exe shell when set to TRUE

https://php.net/manual/en/function.proc-open.php

This successfully prevents the use of cmd.exe for invocations like the following:

    $resource       = proc_open($cmd, $descriptorspec, $pipes,
                            null, null, array('bypass_shell' => TRUE));

This additionally comes in handy to fix #75. In that issue I already combined calls to proc_open into one place, so we only need to add this option there.

@JimmieJimmie

I suggest editing command.php::runCommand and give the mentioned option a try.
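For reference, a self-contained sketch of what a runCommand-style helper using this option could look like. The name runCommandNoShell and its [exitCode, stdout, stderr] return shape are hypothetical, not WebSVN's actual command.php code; the bypass_shell option is the documented proc_open() option quoted above and is ignored on non-Windows systems.

```php
<?php
// Hypothetical runCommand()-style helper that avoids an intermediate
// cmd.exe on Windows via proc_open()'s 'bypass_shell' option.

function runCommandNoShell(string $cmd): array
{
    $descriptorspec = [
        0 => ['pipe', 'r'], // stdin
        1 => ['pipe', 'w'], // stdout
        2 => ['pipe', 'w'], // stderr
    ];
    $resource = proc_open($cmd, $descriptorspec, $pipes,
        null, null, ['bypass_shell' => true]);
    if ($resource === false) {
        throw new RuntimeException("Could not start: $cmd");
    }
    fclose($pipes[0]); // nothing to send on stdin
    // stdout and stderr are read one after the other: fine for small outputs
    // like svnauthz's, but a child writing a lot to stderr could block.
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    $exitCode = proc_close($resource);
    return [$exitCode, $stdout, $stderr];
}
```

A call would pass the already escaped command line, e.g. escapeshellarg of the svnauthz path followed by the accessof arguments, exactly as WebSVN builds it today.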

ams-tschoening commented 5 years ago

I've pushed a new branch ghi_75_proc_open_quotes, would be great if someone could test this. The following screenshot shows that no intermediate cmd.exe is created anymore, the Conhost.exe is necessary because svnauthz is a console application and can't be worked around further in my opinion.

Clipboard01

ams-tschoening commented 5 years ago

@JimmieJimmie, you should have a look at the following config as well:

    // By default, WebSVN displays a form to select another repository.
    // If you have a lot of repositories, this slows down the script considerably.
    // To disable it, uncomment this line.

    // $config->setShowRepositorySelectionForm(false);

It seems that the index page even executes svnauthz twice per repo in the end: once to decide whether a repo should be displayed on the index page, and once again to create the selection form, even if the form might not be displayed on the index page at all.

index.php:

    // Create listing of all configured projects (includes groups if they are used).
    foreach ($projects as $project) {
        if (!$project->hasReadAccess('/'))
            continue;
        // ...
    }

setup.php:

    foreach ($config->getRepositories() as $repository) {
        if ($repository->hasReadAccess('/')) {
            // ...
        }
    }
Two workarounds: detect that the current request is for index.php and avoid creating the form, OR cache the results of hasReadAccess per path for a short time in the repo instance. The former means that no template will ever be able to use the form, and the latter might be a security risk, even though caching for a few seconds or a minute might not be that bad. Additionally, caching might speed up other checks I haven't noticed yet and is somewhat compatible with persistent PHP executions.
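A minimal sketch of the second workaround: one object caches access results per path for a short time, so index.php and the selection form don't both start svnauthz for the same repo. The class name, method name and 60-second default are hypothetical; the $checker callable stands in for the real svnauthz invocation.

```php
<?php
// Hypothetical per-instance cache for read-access checks with a TTL.

final class CachedAccessChecker
{
    private $checker;          // callable(string $path): bool, e.g. svnauthz
    private float $ttl;        // seconds a result stays valid
    private array $cache = []; // path => [bool allowed, float expiresAt]

    public function __construct(callable $checker, float $ttlSeconds = 60.0)
    {
        $this->checker = $checker;
        $this->ttl = $ttlSeconds;
    }

    public function hasReadAccess(string $path): bool
    {
        $now = microtime(true);
        if (isset($this->cache[$path]) && $this->cache[$path][1] > $now) {
            return $this->cache[$path][0]; // cache hit: no process started
        }
        $allowed = (bool) ($this->checker)($path);
        $this->cache[$path] = [$allowed, $now + $this->ttl];
        return $allowed;
    }
}
```

Keeping the TTL short limits the window in which a revoked permission is still honoured, which is the security trade-off mentioned above.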

@michael-o You changed the implementation of multiviews; could you give me an example URL of how a call to the index page looks with it? I don't use multiviews, and without them checking for index.php is easy. I just want to know whether something similar is needed for /browse/ as well.

michael-o commented 5 years ago

@ams-tschoening Yes, sure will do this on Monday as soon as I get access to the server.

ams-tschoening commented 5 years ago

@JimmieJimmie With PR #80 merged, things should have improved for you at least a bit, because permissions on repos that were formerly checked twice are now checked only once. Please give the current master a try.

The following comment in the PR might be of interest for you as well:

"C:/Program Files/CollabNet/Subversion Client\svn" --non-interactive --config-dir /tmp/websvn log --xml --quiet --limit 1 "file:///C:/Users/[...]/Src/Alda/@"

https://github.com/websvnphp/websvn/pull/80#issuecomment-497007235

Besides that, I don't see how the current approach could easily be improved any further, as process creation is somewhat slow on Windows.

michael-o commented 5 years ago

The option I see is to extend this with the svnauthz internals and use a PHP-C binding.

ams-tschoening commented 5 years ago

How can things be improved further?

Global cache independent from requests.

A per-request cache with a short lifetime has been added, so that the same repos don't need to be checked over and over again within the same request. At least for persistent PHP environments, this could be extended to span multiple requests, so that only the first access check is actually performed. For changes to authz to take effect, a restart of the web server would then be needed, which would still be acceptable in many cases.

https://github.com/websvnphp/websvn/commit/8821af1dacc0394bdf660bdf34f187bd19f693a6#diff-5b6bdb07f82e491a5daf1f78b8afae0e1d995fa67a06f957e467a29460f404b1R100
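A hedged sketch of a cache that outlives a single request. This only helps in a long-running PHP process that serves many requests (assumption); with classic mod_php or PHP-FPM, static properties are reset after every request. As a hypothetical alternative to restarting the web server, the sketch drops all entries whenever the authz file's modification time changes; AuthzResultCache and its signature are illustrative names, not WebSVN code.

```php
<?php
// Hypothetical process-wide cache for svnauthz results.

final class AuthzResultCache
{
    private static array $results = [];   // key (authz, repo, path, user) => bool
    private static array $seenMtime = []; // authz file => last seen mtime

    public static function hasReadAccess(string $authzFile, string $repo,
        string $path, string $user, callable $check): bool
    {
        clearstatcache(true, $authzFile); // make filemtime() see fresh data
        $mtime = @filemtime($authzFile);
        if ((self::$seenMtime[$authzFile] ?? null) !== $mtime) {
            self::$results = [];          // authz changed: forget everything
            self::$seenMtime[$authzFile] = $mtime;
        }
        $key = implode("\0", [$authzFile, $repo, $path, $user]);
        if (!array_key_exists($key, self::$results)) {
            self::$results[$key] = (bool) $check($repo, $path, $user);
        }
        return self::$results[$key];
    }
}
```

Including the username in the key matters here: unlike the per-request cache, entries are shared between different users' requests.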

Concurrent processing of repos.

Creating the index page and the selection form ends up being a large loop that simply iterates over all repos, so it should be possible to process multiple repos concurrently using multiple CPUs. There seem to be some approaches available in PHP for this:

https://stackoverflow.com/questions/2101640/patterns-for-php-multi-processes
https://medium.com/async-php/multi-process-php-94a4e5a4be05

Executing the command and piping its result to a file to be read later might make sense in both cases, as the overall result needed per project seems to be some HTML. That might even be incorporated into the existing runCommand.
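One simple variant that needs no extensions: start a batch of processes with proc_open() before reading any output, so they run at the same time, then collect their results. This is batch-based rather than a proper worker pool, and runConcurrently, its return shape and the default batch size are hypothetical choices for illustration.

```php
<?php
// Hypothetical sketch: run commands (e.g. one svnauthz call per repository)
// in batches of $maxParallel processes. All processes of a batch are started
// before any output is read; the next batch starts when this one is done.

function runConcurrently(array $commands, int $maxParallel = 4): array
{
    // Discard stderr so a chatty child cannot block on a full pipe.
    $nullDevice = DIRECTORY_SEPARATOR === '\\' ? 'NUL' : '/dev/null';
    $results = [];
    foreach (array_chunk($commands, $maxParallel, true) as $batch) {
        $running = [];
        foreach ($batch as $key => $cmd) {
            $spec = [1 => ['pipe', 'w'], 2 => ['file', $nullDevice, 'w']];
            $proc = proc_open($cmd, $spec, $pipes, null, null,
                ['bypass_shell' => true]);
            if ($proc === false) {
                $results[$key] = [null, '']; // could not start
                continue;
            }
            $running[$key] = [$proc, $pipes];
        }
        // Collect stdout and exit code of every process in the batch.
        foreach ($running as $key => [$proc, $pipes]) {
            $output = stream_get_contents($pipes[1]);
            fclose($pipes[1]);
            $results[$key] = [proc_close($proc), $output];
        }
    }
    return $results; // key => [exitCode, stdout]
}
```

Keying the commands by repository name keeps the mapping from results back to repos trivial when the HTML for each project is assembled afterwards.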

Server push/multipart.

Depending on the browser, Bugzilla does a server push using a multipart document to provide textual content before searches have finished:

my $serverpush
  = $format->{'extension'} eq "html"
  && exists $ENV{'HTTP_USER_AGENT'}
  && $ENV{'HTTP_USER_AGENT'} =~ /(Mozilla.[3-9]|Opera)/
  && $ENV{'HTTP_USER_AGENT'} !~ /compatible/i
  && $ENV{'HTTP_USER_AGENT'} !~ /(?:WebKit|Trident|KHTML)/
  && !defined($cgi->param('serverpush')) || $cgi->param('serverpush');

https://github.com/bugzilla/bugzilla/blob/5.0/buglist.cgi#L112

  print $cgi->multipart_init();
  print $cgi->multipart_start(-type => 'text/html');

  # Generate and return the UI (HTML page) from the appropriate template.
  $template->process("list/server-push.html.tmpl", $vars)
    || ThrowTemplateError($template->error());

https://github.com/bugzilla/bugzilla/blob/5.0/buglist.cgi#L730
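A PHP equivalent could build the same kind of multipart/x-mixed-replace response, where each part replaces the previous one: e.g. a "please wait" page first and the real index once all permission checks are done. Only some browsers honour this content type for HTML pages, which is why the Bugzilla code filters by user agent. The helper names and boundary string are hypothetical.

```php
<?php
// Hypothetical helpers for a Bugzilla-style "server push" response.

const PUSH_BOUNDARY = 'websvnPushBoundary';

function multipartInitHeader(): string
{
    return 'Content-Type: multipart/x-mixed-replace;boundary="' . PUSH_BOUNDARY . '"';
}

// One part: delimiter line, part headers, blank line, body.
function multipartPart(string $html): string
{
    return '--' . PUSH_BOUNDARY . "\r\n"
        . "Content-Type: text/html\r\n\r\n"
        . $html . "\r\n";
}

// Closing delimiter after the last part.
function multipartEnd(): string
{
    return '--' . PUSH_BOUNDARY . "--\r\n";
}

// Usage inside a page script (not meaningful from the CLI):
// header(multipartInitHeader());
// echo multipartPart('<p>Checking repository permissions...</p>'); flush();
// ... build the real listing ...
// echo multipartPart($indexHtml); echo multipartEnd();
```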

AJAX-approach.

Another approach might be reworking index page and repo selector to load data dynamically using JavaScript. This way users would start to see things quickly and how progress is made. I guess especially the repo selection form could be filled in the background without too many users noticing at all.
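On the server side, such an approach might need little more than an endpoint that returns one page of readable repositories as JSON, so the access checks for later pages only run when JavaScript asks for them. The function name, parameters and JSON shape ("repos", "next") are hypothetical.

```php
<?php
// Hypothetical JSON endpoint helper for an AJAX-driven index page.

function repoPageAsJson(array $repoNames, callable $canRead, int $offset, int $limit): string
{
    $slice = array_slice($repoNames, $offset, $limit);
    // Access checks (e.g. svnauthz calls) run only for this page's repos.
    $visible = array_values(array_filter($slice, $canRead));
    $next = $offset + $limit < count($repoNames) ? $offset + $limit : null;
    return json_encode(['repos' => $visible, 'next' => $next]);
}

// Usage in a hypothetical repos.php:
// header('Content-Type: application/json');
// echo repoPageAsJson($names, $checkFn, (int) ($_GET['offset'] ?? 0), 20);
```

The client keeps requesting with the returned "next" offset until it is null, appending repos to the page as they arrive.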

svnauthz + PHP-C-Binding

The option I see is to extend this with the svnauthz internals and use a PHP-C binding.

https://github.com/websvnphp/websvn/issues/78#issuecomment-498199128

michael-o commented 5 years ago

The svnauthz code is not overly complex, but my PHP knowledge is way too limited to create that shim.

ams-tschoening commented 5 years ago

Would that really help, though? If I understand correctly, you suggest extending an existing extension. But besides the actual coding, the changes need to be accepted by the current maintainers and need to be available for different OS builds of PHP in various versions, etc. This sounds at least as difficult and out of our control as the actual coding.

michael-o commented 5 years ago

Maybe, but I consider a native binding to PHP as the best solution. The module is loaded once into memory.

LordAro commented 3 years ago

So I've run into this myself, but with log.php. I tend to keep a browser tab /websvn/log.php?repname=thesvnrepo&path=%2F&isdir=1&er=1&max=200&search=&all=1 open and refresh it occasionally.

However, after switching to this fork (from something based on the old svn), this page is extremely slow to load: ~50 s. I've narrowed this slowdown down to the svnauthz calls. The cache doesn't seem to help either: an immediate refresh takes almost as long to load.

I don't think it's making any particularly unnecessary svnauthz calls, just a lot of them, and they all add up. (Not to mention that this particular repo doesn't have any access controls, and anyone who has access to '/' has access to everything.)

The original accessfile stuff (removed in #37) worked "fine" (I recall having to add a patch to deal with groups being used before they were defined, but that was a small fix), and now it seems you're considering effectively readding the code anyway?

ams-tschoening commented 3 years ago

The cache doesn't seem to help with this either - an immediate refresh just takes almost as long to load.

The cache is per request/instance and only lasts a short period of time. In a persistent PHP environment, this could in theory be extended using a static global variable so that it persists across different requests.

https://github.com/websvnphp/websvn/commit/8821af1dacc0394bdf660bdf34f187bd19f693a6#diff-5b6bdb07f82e491a5daf1f78b8afae0e1d995fa67a06f957e467a29460f404b1R100