timja / jenkins-gh-issues-poc-06-18

0 stars 0 forks source link

[JENKINS-41722] Job folder loading performance scalability #3839

Open timja opened 7 years ago

timja commented 7 years ago

I have a tree-like structure job layout as below:

"Projects" Cloudbees folder contains 34 subproject folder
  subproject: "Project-A" Cloudbees folder contains 6 branch folders ( 5 multibranch job folder and 1 cloudbees folder )
    multibranch-job1 contains 295 matrix jobs (75 x 2 matrix combination)
    multibranch-job2 contains 317 matrix jobs (75 x 2 matrix combination)
    multibranch-job3 contains 189 matrix jobs (75 x 2 matrix combination)
    multibranch-job4 contains 15 matrix jobs (75 x 2 matrix combination)
    multibranch-job5 contains 5 matrix jobs (75 x 2 matrix combination)
    Cloudbees folder contains 4 matrix jobs (75 x 2 matrix combination)
  subproject: Project-B Cloudbees folder ....
    multibranch-job1 ...
    ... ...
  ... ... ...

I found somehow no matter what kind of authorization strategy I choose,
it's always taking longer time
to open "Project-A" cloudbees folder (contains 6 items)
than "Projects" folder (contains 34 items).

Below is my benchmark:

  Chrome Chrome Firefox Firefox
Authorization Method Open Projects Folder Open Project-A Folder Open Projects Folder Open Project-A Folder
Anyone can do anything 2.10 sec 8.43 sec 5.99 sec 17.05 sec
Logged-in users can do anything 3.97 sec 9.94 sec 7.97 sec 20.06 sec
Initial default Role-based configuration - admin role 8.22 sec 14.54 sec 16.47 sec 28.55 sec
Role-based configuration - admin role with 119 Project roles 18.81 sec 31.95 sec 37.7 sec 65

I'm expecting it should be faster to open "Project-A" folder because it needs to process permission and load only 6 items,
and slower to open "Projects" folder because it needs to process permissions and load for 34 items.


Originally reported by totoroliu, imported from: Job folder loading performance scalability
  • status: Reopened
  • priority: Major
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 7 years ago

totoroliu:

Does the Jenkins core somehow process more items than needed for Multibranch jobs?
eg.
Projects folder contains 34 cloudbees folder and processed only this 34 items.
Project-A folder contains 6 items,
but somehow Jenkins core processed more to include Multibranch branch jobs (those hundred of matrix jobs).

timja commented 7 years ago

totoroliu:

this issue is somewhat similar to JENKINS-18377 but not identical
because the phenomenon I see apply to not only Role-base strategy
but also to other different authorization.

timja commented 7 years ago

danielbeck:

There's no bug here. It's more expensive to compute the status of matrix jobs as each combination is considered an individual build and must be loaded.

timja commented 7 years ago

totoroliu:

so does that mean Jenkins will always compute the status recursively to the leave of the item?
Then this still doesn't explain why I can open the page quick at higher level ("Project" folder),
which would contain more leave items.

Secondly, I didn't open the matrix job at all.
I only opened "Projects" folders contains 34 folders only,
and opened "Project-A" folders contains 1 folder and 5 multibranch-job (which has folder at top level).
I didn't click to open into multibranch-job to load the matrix branch jobs.

timja commented 7 years ago

samsun387:

danielbeck it doesn't explain why it's taking longer time to open "Project-A" cloudbees folder (contains 6 items) than "Projects" folder (contains 34 items).

timja commented 7 years ago

totoroliu:

danielbeck
Could you also clarify whether Jenkins core always load page with recursive calls of process permission?

timja commented 7 years ago

totoroliu:

I did further a further benchmark testing.
After I put each of 5 of Multibranch jobs into another Cloudbees folder,
somehow the "ProjectA" load time is improved by 7 seconds.

This makes wonder could it be a bug for the multi-branch-project-plugin?
The multi-branch-project job doesn't behave like Cloudbees folder
that preventing the Jenkins core to process permission recursively.
Hence loading a page containing only 1 multi-branch-project item would recursively get to the branch jobs.

If this is true,
then this bug would still be valid for Multi-branch-project job.

Or if it's Jenkins core always process items recursively,
then Jenkins core is process extra un-needed items,
and it'd be a bug.

I'll reopen the bug for now.

timja commented 7 years ago

totoroliu:

I did further a further benchmark testing.
After I put each of 5 of Multibranch jobs into another Cloudbees folder,
somehow the "ProjectA" load time is improved by 7 seconds.

This makes wonder could it be a bug for the multi-branch-project-plugin?
The multi-branch-project job doesn't behave like Cloudbees folder
that preventing the Jenkins core to process permission recursively.
Hence loading a page containing only 1 multi-branch-project item would recursively get to the branch jobs.

If this is true,
then this bug would still be valid for Multi-branch-project job.

Or if it's Jenkins core always process items recursively,
then Jenkins core is process extra un-needed items,
and it'd be a bug.

I'll reopen the bug for now.

timja commented 7 years ago

totoroliu:

Here's the benchmark together with putting multibranch project into another Cloudbees folder:

  Chrome Chrome Chrome Firefox Firefox Firefox
Authorization Method Open Projects Folder Open Project-A Folder (with Multibranch job) Open Project-A Folder (put multibranch into subfolder) Open Projects Folder Open Project-A Folder (with Multibranch job) Open Project-A Folder (put multibranch into subfolder)
Anyone can do anything 2.10 sec 8.43 sec 1.24 5.99 sec 17.05 sec 4.34
Logged-in users can do anything 3.97 sec 9.94 sec 3.43 7.97 sec 20.06 sec 6.93
Initial default Role-based configuration - admin role 8.22 sec 14.54 sec 7.43 16.47 sec 28.55 sec 14.54
Role-based configuration - admin role with 119 Project roles 18.81 sec 31.95 sec 25.03 37.7 sec 65 50.5

From the table,
the benchmark is now making sense:
Open ProjectA with Cloudbees folder only (in theory should process only 6 items without recursive)
now is faster than open Projects folder (in theory should process only 34 items without recursive)

timja commented 7 years ago

totoroliu:

Also i forgot to mention for all of cloudbees folders,
I have removed health metrics
so the view (page) doesn't require any info from the underlying items.

timja commented 7 years ago

danielbeck:

If this is supposed to be bug, please provide references to relevant source code, for example from repeated thread dumps taking during page load.

(I'd rather not start investigating just to find out after several hours that my previous comment was right and there's no bug here.)

timja commented 7 years ago

totoroliu:

While trying to collect the data,
I added debug lines to role-strategy-plugin,
and found out that for every minute,
jenkins is repeatedly calling RoleBasedProjectNamingStrategy.getACL() itself.
even when there's no user interaction.
It's repeatedly doing the same thing everything minute after Jenkins is fully up and running.

Refer to jenkins-starup.log
This is the full log containing the first 5 minutes since Jenkins service started.
Jenkins is fully up and running at "Feb 22, 2017 6:26:26 PM".

Then after 6:26:26,
for every whole minute (6:27:00, 6:28:00, 6:29:00, etc),
it's calling getACL(), regenerate roleMap (RoleMap.newMatchingRoleMap(), etc.
I have separate those log out into separate log files as "jenkins-startup-repeat01-0627.log"

Right now I'm using a test server to collect logs.
Hence, I have only minimal amount of jobs items:
Folder@775f8b29[Build_Steps] contains 3 freestyle jobs:
FreeStyleProject@251a4167[Build_Steps/Chroot_Build]
FreeStyleProject@40694780[Build_Steps/Post_Build]
FreeStyleProject@54a632cd[Build_Steps/Transfer_Artifacts]

Folder@bbc939b[Projects] contains:

Folder@13d4d234[Projects/AC]
MatrixMultiBranchProject@78bace4f[Projects/AC/br_4-1]
MatrixProject@39f19408[Projects/AC/br_4-1/br_4-1_offline_otp]
MatrixProject@308cb740[Projects/AC/br_4-1/br_4-1_oracle]
MatrixMultiBranchProject@3dc34516[Projects/AC/br_4-2]
MatrixProject@d4632cf[Projects/AC/br_4-2/br_4-2_chained_auth]
MatrixProject@201a8451[Projects/AC/br_4-2/br_4-2_cli]
Folder@20c9d46[Projects/AC/release_trunk]
MatrixProject@4fc557c7[Projects/AC/release_trunk/trunk]

Folder@17434e36[Projects/SB]
MatrixMultiBranchProject@71c70c6a[Projects/SB/br_2-3]
MatrixProject@1d5024d4[Projects/SB/br_2-3/br_2-3_dev_fortitracer]
MatrixProject@428d3556[Projects/SB/br_2-3/br_2-3_dev_pkgtools]
MatrixMultiBranchProject@7afa044c[Projects/SB/br_2-4]
MatrixProject@657e8516[Projects/SB/br_2-4/br_2-4_AWS]
Folder@700e5985[Projects/SB/release_trunk]
MatrixProject@49691c7b[Projects/SB/release_trunk/trunk]

Why it's repeatedly doing the same thing every minute?
On production server,
I have much more job items.
could this be the performance impact factor?

timja commented 7 years ago

danielbeck:

Why it's repeatedly doing the same thing every minute?

Create an exception and log the stack trace.

timja commented 7 years ago

totoroliu:

Uploaded file jenkins-startup.tar.gz:

This tarball contains the first 5 minutes of logs (including stack trace) after Jenkins service start-up.
I put the stack trace call inside
RoleBasedAuthorizationStrategy.getACL(AbstractItem)
RoleBasedAuthorizationStrategy.getACL(roleMapName, itemName, roleType, item)

At each round minute,
RoleBasedAuthorizationStrategy.getACL(AbstractItem) was called twice at beginning,
and then RoleBasedAuthorizationStrategy.getACL(roleMapName, itemName, roleType, item)

timja commented 7 years ago

totoroliu:

I also uploaded the file jenkins-open-folder.tar.gz,
which contains the log
when I repeat the benchmark test procedures:

1. Open Project folder (jenkins_open_Projects_folder01.log) - contains 2 folders (AC and SB)
2. Open AC folder (jenkins_open_AC_folder.log)

3. Open Project folder again (jenkins_open_Projects_folder02.log)
4. Open SB folder (jenkins_open_SB_folder.log)

5. Open Project folder again (jenkins_open_Projects_folder03.log)

For each step,
I intentionally avoid to start at round minute to prevent the log been polluted by unknown calling of getACL().
But you could still find from the log that,
after it processed the permission for the required items to display in the current view page,
it continues to process the permission for all of non-required items.

Should I open another JIRA ticket for this?

timja commented 7 years ago

danielbeck:

Should I open another JIRA ticket for this?

Nobody will care.

I looked at some of the stack traces real quick – a lot of code is calling into e.g. Jenkins.getAllItems(…) which – as documented – performs permission checks to see whether the current user has permission to read.

There is still no bug here.

Please note I'm unsubscribing, as this is a waste of my time. Please don't ping me about this.

timja commented 7 years ago

samsun387:

I looked at some of the stack traces real quick – a lot of code is calling into e.g. Jenkins.getAllItems(…) which – as documented – performs permission checks to see whether the current user has permission to read.
There is still no bug here.

Still couldn't answer why it's processing permission for non-required items as Rick's finding indicated. It also doesn't explain why it's repeating the calls every minute.

timja commented 7 years ago

totoroliu:

The whole point here is it shouldn't always call getAllItems().
Back in old days when there is no Folder-plugin, every item is at root level.
It probably makes no difference if you call getAllItems() or getItems().

With folder structure,
Jenkins would need an API similar to getItems() but with the ability to get items at a specified path,
and to return only the immediate children items.

timja commented 7 years ago

totoroliu:

Refer to jenkins_open_AC_folder_Current_Thread.log file.

The Thread "Handling GET /view/All/job/Projects/job/AC/ from 172.18.25.141 : RequestHandlerThread9 (133)" was process item "Projects/AC" at beginning correctly.
Until Thread "Handling GET /view/All/job/Projects/job/AC/ from 172.18.25.141 : RequestHandlerThread9 View/index.jelly View/sidepanel.jelly AbstractFolder/tasks-bottom.jelly (133)"
starts to process item "Build_Steps", which is totally outside of current view context:

INFO: OUT: RoleBasedAuthorizationStrategy.getACL(AbstractItem project=com.cloudbees.hudson.plugins.folder.Folder@2e820a2b[Build_Steps]): start
Feb 24, 2017 6:06:21 PM com.michelin.cio.hudson.plugins.rolestrategy.RoleBasedAuthorizationStrategy getACL
INFO: OUT: RoleBasedAuthorizationStrategy.getACL(AbstractItem project=com.cloudbees.hudson.plugins.folder.Folder@2e820a2b[Build_Steps]): Thread.getAllStackTraces()=
Current Thread: Handling GET /view/All/job/Projects/job/AC/ from 172.18.25.141 : RequestHandlerThread9 View/index.jelly View/sidepanel.jelly AbstractFolder/tasks-bottom.jelly (133)
Thread Handling GET /view/All/job/Projects/job/AC/ from 172.18.25.141 : RequestHandlerThread9 View/index.jelly View/sidepanel.jelly AbstractFolder/tasks-bottom.jelly
at java.lang.Thread.dumpThreads(Native Method)
at java.lang.Thread.getAllStackTraces(Thread.java:1607)
at com.michelin.cio.hudson.plugins.rolestrategy.RoleBasedAuthorizationStrategy.printAllStackTraces(RoleBasedAuthorizationStrategy.java:94)
at com.michelin.cio.hudson.plugins.rolestrategy.RoleBasedAuthorizationStrategy.getACL(RoleBasedAuthorizationStrategy.java:176)
at hudson.model.AbstractItem.getACL(AbstractItem.java:498)
at hudson.model.AbstractItem.hasPermission(AbstractItem.java:512)
at jenkins.model.Jenkins.getItems(Jenkins.java:1719)
at jenkins.model.Jenkins.getItems(Jenkins.java:325)
at hudson.model.Items.getAllItems(Items.java:363)
at hudson.model.Items.getAllItems(Items.java:359)
at jenkins.model.Jenkins.getAllItems(Jenkins.java:1753)
at jenkins.model.Jenkins.getAllItems(Jenkins.java:1762)
at com.cloudbees.hudson.plugins.folder.relocate.StandardHandler.validDestinations(StandardHandler.java:85)
at com.cloudbees.hudson.plugins.folder.relocate.StandardHandler.applicability(StandardHandler.java:54)
at com.cloudbees.hudson.plugins.folder.relocate.DefaultRelocationUI.isAvailable(DefaultRelocationUI.java:67)
at com.cloudbees.hudson.plugins.folder.relocate.RelocationAction.getIconClassName(RelocationAction.java:99)

If the caller function can filter out the un-needed items based upon the path first,
this could save the time to process hasPermission()/getACL for the un-needed items.

timja commented 7 years ago

totoroliu:

And if the total number items (job) is critical to the overall Jenkins performance,
it'd be good if Jenkins could suggest an upper bound limit number.