kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

Metadata editor takes a long time initializing because of many subordinate processes #4505

Closed: andre-hohmann closed this issue 3 years ago

andre-hohmann commented 3 years ago

Problem

If a superordinate process with many internal links to subordinate processes is opened in the metadata editor, it can take a long time (up to several minutes) until the editor opens. This also affects newspaper processes at the year level, which can contain links to several hundred processes.

Maybe the reasons are the same as in:

Solution

It should be examined whether the editor can be opened faster.

matthias-ronge commented 3 years ago

@henning-gerhardt, can you capture log output at trace level showing what happens while you are waiting for the metadata editor to open? There are time-measuring log statements in several places to help investigate such cases, for example loading the file, initializing the ruleset, searching for images, etc.

Maybe you can also check whether the load is on the server or on the browser side. I have also found that the metadata editor initializes very slowly in the browser when the PC has few free resources, for example when an MS Teams screen-sharing session is running at the same time.

henning-gerhardt commented 3 years ago

@matthias-ronge Here is the output for three processes.

ID: 463843 Volumes: 38

Kitodo.Production log output:

[TRACE] 2021-06-23 09:25:44,747 [http-nio-127.0.0.1-8080-exec-1] org.kitodo.production.services.data.RulesetService - Reading ruleset took 30 ms
[INFO] 2021-06-23 09:25:44,759 [http-nio-127.0.0.1-8080-exec-1] org.kitodo.production.services.dataformat.MetsService - Reading 463843/meta.xml
[INFO] 2021-06-23 09:25:44,881 [http-nio-127.0.0.1-8080-exec-1] org.kitodo.production.services.dataformat.MetsService - Reading 463843/meta.xml
[TRACE] 2021-06-23 09:25:45,017 [http-nio-127.0.0.1-8080-exec-1] org.kitodo.production.services.file.FileService - Searching for media took 109 ms
[TRACE] 2021-06-23 09:27:36,168 [http-nio-127.0.0.1-8080-exec-1] org.kitodo.production.forms.dataeditor.DataEditorForm - Initializing editor beans took 111149 ms

Searching for one volume, e.g. ID 184013: 38 times. ElasticSearch access log output:

xxx.xxx.xxx.xxx - - [23/Jun/2021:09:25:46 +0200] "GET /es/kitodo/process/184013 HTTP/1.1" 200 7633 "-" "Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_292)"
...
xxx.xxx.xxx.xxx - - [23/Jun/2021:09:27:34 +0200] "GET /es/kitodo/process/184013 HTTP/1.1" 200 7633 "-" "Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_292)"

Load average on application server: 0.4 on 4 cores
Load average on ElasticSearch server: 2.1 on 4 cores
Load average on database server: 0.1 on 2 cores


ID: 463826 Volumes: 47

Kitodo.Production log output:

[TRACE] 2021-06-23 09:28:40,262 [http-nio-127.0.0.1-8080-exec-10] org.kitodo.production.services.data.RulesetService - Reading ruleset took 41 ms
[INFO] 2021-06-23 09:28:40,266 [http-nio-127.0.0.1-8080-exec-10] org.kitodo.production.services.dataformat.MetsService - Reading 463826/meta.xml
[INFO] 2021-06-23 09:28:40,291 [http-nio-127.0.0.1-8080-exec-10] org.kitodo.production.services.dataformat.MetsService - Reading 463826/meta.xml
[TRACE] 2021-06-23 09:28:40,384 [http-nio-127.0.0.1-8080-exec-10] org.kitodo.production.services.file.FileService - Searching for media took 73 ms
[TRACE] 2021-06-23 09:31:22,442 [http-nio-127.0.0.1-8080-exec-10] org.kitodo.production.forms.dataeditor.DataEditorForm - Initializing editor beans took 162058 ms

Searching for one volume, e.g. ID 168729: 47 times. ElasticSearch access log output:

xxx.xxx.xxx.xxx - - [23/Jun/2021:09:28:43 +0200] "GET /es/kitodo/process/168729 HTTP/1.1" 200 11538 "-" "Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_292)"
...
xxx.xxx.xxx.xxx - - [23/Jun/2021:09:31:22 +0200] "GET /es/kitodo/process/168729 HTTP/1.1" 200 11538 "-" "Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_292)"

Load average on application server: 0.6 on 4 cores
Load average on ElasticSearch server: 2.3 on 4 cores
Load average on database server: 0.1 on 2 cores


ID: 459945 Volumes: 128

Kitodo.Production log output:

[TRACE] 2021-06-23 09:31:57,581 [http-nio-127.0.0.1-8080-exec-3] org.kitodo.production.services.data.RulesetService - Reading ruleset took 49 ms
[INFO] 2021-06-23 09:31:57,583 [http-nio-127.0.0.1-8080-exec-3] org.kitodo.production.services.dataformat.MetsService - Reading 459945/meta.xml
[INFO] 2021-06-23 09:31:57,611 [http-nio-127.0.0.1-8080-exec-3] org.kitodo.production.services.dataformat.MetsService - Reading 459945/meta.xml
[TRACE] 2021-06-23 09:31:57,667 [http-nio-127.0.0.1-8080-exec-3] org.kitodo.production.services.file.FileService - Searching for media took 33 ms

Searching for one volume, e.g. ID 50041: over 145 times (aborted after the number of searches exceeded the number of volumes!). ElasticSearch access log output:

xxx.xxx.xxx.xxx - - [23/Jun/2021:09:15:57 +0200] "GET /es/kitodo/process/50041 HTTP/1.1" 200 99357 "-" "Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_292)"
...
xxx.xxx.xxx.xxx - - [23/Jun/2021:09:47:52 +0200] "GET /es/kitodo/process/50041 HTTP/1.1" 200 99357 "-" "Apache-HttpAsyncClient/4.1.2 (Java/1.8.0_292)"

The connection between browser and server was already closed after 5 minutes because no data had been transferred. From the Apache error log:

[Wed Jun 23 09:36:57.583975 2021] [proxy_http:error] [pid 20811:tid 140439177905920] (70007)The timeout specified has expired: [client xxx.xxx.xxx.xxx:37844] AH01102: error reading status line from remote server 127.0.0.1:8080, referer: https://xxxxx/kitodo/pages/processes.jsf?keepPagination=true
[Wed Jun 23 09:36:57.584110 2021] [proxy:error] [pid 20811:tid 140439177905920] [client xxx.xxx.xxx.xxx:37844] AH00898: Error reading from remote server returned by /kitodo/pages/processes.jsf, referer: https://xxxxxx/kitodo/pages/processes.jsf?keepPagination=true

Load average on application server: 0.5 on 4 cores
Load average on ElasticSearch server: 2.9 on 4 cores
Load average on database server: 0.1 on 2 cores


My conclusion:

The first two processes take a long time; the third was aborted because no data was transferred for 5 minutes, and even then the searches in the ElasticSearch log still continued :-(

It looks like a search for all volumes is started for every single volume. The fact that the search in the third process did not stop after all 128 volumes had been searched, but kept going, is a bug as well.
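A minimal sketch of the access pattern these logs suggest, using hypothetical method names rather than the actual Kitodo classes: if resolving each child also resolves the parent together with all of its siblings, opening a parent with N volumes fires on the order of N² index requests, which would match one volume being fetched once per sibling.

```java
// Hypothetical illustration of the suspected access pattern, not Kitodo code.
// If every child lookup also resolves the parent and all of its siblings,
// opening a parent with N children causes roughly N * N index requests.
class IndexAccessSketch {

    int indexRequests = 0;

    void openParent(int numberOfChildren) {
        for (int child = 0; child < numberOfChildren; child++) {
            loadChildFromIndex(child);                    // 1 request per child
            loadParentWithAllChildren(numberOfChildren);  // N + 1 more requests per child
        }
    }

    void loadChildFromIndex(int child) {
        indexRequests++;
    }

    void loadParentWithAllChildren(int numberOfChildren) {
        indexRequests++;                                  // the parent itself
        for (int sibling = 0; sibling < numberOfChildren; sibling++) {
            indexRequests++;                              // every sibling again
        }
    }

    public static void main(String[] args) {
        IndexAccessSketch sketch = new IndexAccessSketch();
        sketch.openParent(128);                           // the 128-volume case from above
        System.out.println(sketch.indexRequests);         // prints 16640 instead of 128
    }
}
```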

matthias-ronge commented 3 years ago

To display the type of a child process, the child was loaded from the index, the parent was loaded into the child, and all children were loaded into the parent, for each child. In addition, the ruleset and the docket were loaded from the index every time. This can simply be turned off.
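A rough sketch of what turning this off could look like, with hypothetical names and not the actual fix: resolve each child's type with a single index lookup and load shared data such as the ruleset and the docket only once, instead of once per child.

```java
// Hypothetical sketch, not the actual Kitodo fix: look each child up once and
// reuse shared data (ruleset, docket) instead of reloading it for every child.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ChildTypeResolverSketch {

    private final Map<Integer, String> typeCache = new HashMap<>();
    private Object ruleset;   // loaded once, reused for all children
    private Object docket;    // loaded once, reused for all children

    List<String> resolveChildTypes(List<Integer> childIds) {
        if (ruleset == null) {
            ruleset = loadRulesetOnce();
        }
        if (docket == null) {
            docket = loadDocketOnce();
        }
        return childIds.stream()
                .map(id -> typeCache.computeIfAbsent(id, this::loadTypeFromIndex))
                .collect(Collectors.toList());
    }

    // One small index request per child, without resolving the parent
    // or the sibling list again.
    private String loadTypeFromIndex(int childId) {
        return "Volume"; // placeholder for a single-document index lookup
    }

    private Object loadRulesetOnce() {
        return new Object(); // placeholder
    }

    private Object loadDocketOnce() {
        return new Object(); // placeholder
    }
}
```

With a small per-editor cache like this, the number of index round trips for a parent with N children drops from roughly N² to N.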