odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0
807 stars 261 forks source link

Perfomance for serial and parallel GET requests #5506

Closed Arturkad closed 3 years ago

Arturkad commented 3 years ago

We faced a poor performance creating and getting ValidValueDeffinitions via Digital Architecture OMAS. Why the parallel requests were done in 22 sec, not in 5 sec as serial requests? And what could be done to improve the total performance on the creation and requests?

Test

Created a ValidValuesSet with 50 ValidValueDeffinitions (took about 1 sec per each POST). Then there were made several requests to get the set with all the definitions. Firstly, we have sent 5 serial (consistent) requests one by one and the time measured was 5 sec per each GET. Secondly, we have sent the same requests simultaneously in a parallel. And the response was delayed till all five were ready, it took about 22 sec.

Environment

The test was performed on a testing virtual machine with 4 cores CPU and 8 Gb RAM, one Egeria platform, and one OMAG server onboard. We use Egeria OMAG Server Platform (version 2.10) Attached file with the server configuration document, it is a Json, but had to change the type to txt. configuration uad2 response.txt

These are the test results:

[10:55:48 - 10:55:49] createValidValueSet|896.4 ms [10:55:49 - 10:55:50] createValidValueDefinition|0|1.001 s [10:55:50 - 10:55:51] createValidValueDefinition|1|1.169 s [10:55:51 - 10:55:52] createValidValueDefinition|2|1.161 s [10:55:52 - 10:55:54] createValidValueDefinition|3|1.165 s [10:55:54 - 10:55:55] createValidValueDefinition|4|1.241 s [10:55:55 - 10:55:56] createValidValueDefinition|5|1.165 s [10:55:56 - 10:55:58] createValidValueDefinition|6|2.176 s [10:55:58 - 10:55:59] createValidValueDefinition|7|1.189 s [10:55:59 - 10:56:00] createValidValueDefinition|8|1.186 s [10:56:00 - 10:56:02] createValidValueDefinition|9|1.132 s [10:56:02 - 10:56:03] createValidValueDefinition|10|1.217 s [10:56:03 - 10:56:04] createValidValueDefinition|11|1.192 s [10:56:04 - 10:56:05] createValidValueDefinition|12|1.231 s [10:56:05 - 10:56:06] createValidValueDefinition|13|1.172 s [10:56:06 - 10:56:08] createValidValueDefinition|14|1.168 s [10:56:08 - 10:56:09] createValidValueDefinition|15|1.147 s [10:56:09 - 10:56:10] createValidValueDefinition|16|1.137 s [10:56:10 - 10:56:11] createValidValueDefinition|17|1.203 s [10:56:11 - 10:56:12] createValidValueDefinition|18|1.167 s [10:56:12 - 10:56:14] createValidValueDefinition|19|1.343 s [10:56:14 - 10:56:15] createValidValueDefinition|20|1.163 s [10:56:15 - 10:56:16] createValidValueDefinition|21|1.146 s [10:56:16 - 10:56:17] createValidValueDefinition|22|1.139 s [10:56:17 - 10:56:18] createValidValueDefinition|23|1.120 s [10:56:18 - 10:56:19] createValidValueDefinition|24|1.149 s [10:56:19 - 10:56:20] createValidValueDefinition|25|1.174 s [10:56:20 - 10:56:22] createValidValueDefinition|26|1.155 s [10:56:22 - 10:56:23] createValidValueDefinition|27|1.138 s [10:56:23 - 10:56:24] createValidValueDefinition|28|1.191 s [10:56:24 - 10:56:25] createValidValueDefinition|29|1.140 s [10:56:25 - 10:56:26] createValidValueDefinition|30|1.150 s [10:56:26 - 10:56:27] createValidValueDefinition|31|1.133 s [10:56:27 - 10:56:29] createValidValueDefinition|32|1.989 s [10:56:29 - 10:56:32] createValidValueDefinition|33|2.642 s [10:56:32 - 10:56:35] createValidValueDefinition|34|2.636 s [10:56:35 - 10:56:37] createValidValueDefinition|35|2.617 s [10:56:37 - 10:56:40] createValidValueDefinition|36|2.673 s [10:56:40 - 10:56:43] createValidValueDefinition|37|2.621 s [10:56:43 - 10:56:45] createValidValueDefinition|38|2.686 s [10:56:45 - 10:56:46] createValidValueDefinition|39|1.164 s [10:56:46 - 10:56:48] createValidValueDefinition|40|1.099 s [10:56:48 - 10:56:49] createValidValueDefinition|41|1.159 s [10:56:49 - 10:56:50] createValidValueDefinition|42|1.163 s [10:56:50 - 10:56:51] createValidValueDefinition|43|1.123 s [10:56:51 - 10:56:52] createValidValueDefinition|44|1.103 s [10:56:52 - 10:56:53] createValidValueDefinition|45|1.101 s [10:56:53 - 10:56:54] createValidValueDefinition|46|1.218 s [10:56:54 - 10:56:56] createValidValueDefinition|47|1.095 s [10:56:56 - 10:56:57] createValidValueDefinition|48|1.140 s [10:56:57 - 10:56:58] createValidValueDefinition|49|1.205 s

[10:56:58 - 10:57:03] getValidValueSetMembers|0|consistent|5.004 s [10:57:03 - 10:57:08] getValidValueSetMembers|1|consistent|4.945 s [10:57:08 - 10:57:13] getValidValueSetMembers|2|consistent|4.795 s [10:57:13 - 10:57:18] getValidValueSetMembers|3|consistent|5.005 s [10:57:18 - 10:57:22] getValidValueSetMembers|4|consistent|4.791 s [10:56:58 - 10:57:22] getValidValueSetMembers|full|consistent|24.54 s

[10:57:22 - 10:57:44] getValidValueSetMembers|3|parallel|21.27 s [10:57:22 - 10:57:44] getValidValueSetMembers|4|parallel|21.53 s [10:57:22 - 10:57:44] getValidValueSetMembers|1|parallel|21.57 s [10:57:22 - 10:57:45] getValidValueSetMembers|2|parallel|22.93 s [10:57:22 - 10:57:45] getValidValueSetMembers|0|parallel|23.05 s [10:57:22 - 10:57:45] getValidValueSetMembers|full|parallel|23.06 s

mandy-chessell commented 3 years ago

The repository involved is the grpah repository with an elastic search index. I think the CQL settings suggest this is using cassandra as the persistence storage.

   "localRepositoryLocalConnection": {
          "class": "Connection",
          "headerVersion": 0,
          "connectorType": {
            "class": "ConnectorType",
            "headerVersion": 0,
            "type": {
              "class": "ElementType",
              "headerVersion": 0,
              "elementOrigin": "LOCAL_COHORT",
              "elementVersion": 0,
              "elementTypeId": "954421eb-33a6-462d-a8ca-b5709a1bd0d4",
              "elementTypeName": "ConnectorType",
              "elementTypeVersion": 1,
              "elementTypeDescription": "A set of properties describing a type of connector."
            },
            "guid": "9c6b3198-ccef-4644-af13-2789646f4233",
            "qualifiedName": "OMRS Graph Repository Connector",
            "displayName": "OMRS Graph Repository Connector",
            "description": "OMRS Repository Connector that uses graph repository store.",
            "connectorProviderClassName": "org.odpi.openmetadata.adapters.repositoryservices.graphrepository.repositoryconnector.GraphOMRSRepositoryConnectorProvider"
          },
          "configurationProperties": {
            "storage.cql.keyspace": "uad_test",
            "storage.username": "uad_test_user",
            "storage.backend": "cql",
            "index.search.hostname": "172.22.100.225",
            "storage.hostname": "172.22.100.211",
            "storage.password": "***",
            "index.search.elasticsearch.http.auth.basic.password": "***",
            "index.search.elasticsearch.http.auth.type": "basic",
            "index.search.elasticsearch.http.auth.basic.username": "elastic",
            "index.search.backend": "elasticsearch"
          }
        }
planetf1 commented 3 years ago

Interesting results - useful feedback. Out of interest, what do you see if you use the 'in-memory' repository? This is clearly unsuitable for any real-world application, but I'm intrigued to see if the local graph repo is the bottleneck (though that difference is likely more visible for writes).

Additionally, and more usefully, there is work underway on a new repository using 'crux' which is looking very promising (ie see https://odpi.github.io/egeria-connector-crux/cts/2.11/2.11/ )

Others in the team maybe able to comment on this particular API/efficiency.

Arturkad commented 3 years ago

@mandy-chessell There are two stands (dev, test) and both use the same Cassandra repository. I attached a schematic of the current deployment. deployment scheme

wbittles commented 3 years ago

Hi , I'd need some further information in order to help

Under "Test" section in the issue description it states , "Secondly, we have sent the same requests simultaneously in a parallel. And the response was delayed till all five were ready, it took about 22 sec"

however when I look at the results the timestamps align exactly , the 2nd request wasn't sent until the first request returned.

10:56:58 - 10:57:03] getValidValueSetMembers|0|consistent|5.004 s [10:57:03 - 10:57:08] getValidValueSetMembers|1|consistent|4.945 s

again the below request wasn't sent until the previous request returned.

[10:57:08 - 10:57:13] getValidValueSetMembers|2|consistent|4.795 s [10:57:13 - 10:57:18] getValidValueSetMembers|3|consistent|5.005 s [10:57:18 - 10:57:22] getValidValueSetMembers|4|consistent|4.791 s [10:56:58 - 10:57:22] getValidValueSetMembers|full|consistent|24.54 s

This implies that the requests are actually being sent synchronously ?

Arturkad commented 3 years ago

@wbittles Hello! There were two sets of requests. Each set contains 5 similar GET requests. In the first set, we sent all requests synchronously (one by one), and got a 5-sec delay per each. So totally for 5 of them, we've spent 24.54 sec.

In the second set, we sent all the five requests simultaneously (in parallel), and got a delay of 21.27 sec, till the first reply came in (with the same amount of data, that used to be returned back in about 5 sec on the first go).

[10:56:58 - 10:57:03] getValidValueSetMembers|0|consistent|5.004 s [10:57:03 - 10:57:08] getValidValueSetMembers|1|consistent|4.945 s [10:57:08 - 10:57:13] getValidValueSetMembers|2|consistent|4.795 s [10:57:13 - 10:57:18] getValidValueSetMembers|3|consistent|5.005 s [10:57:18 - 10:57:22] getValidValueSetMembers|4|consistent|4.791 s [10:56:58 - 10:57:22] getValidValueSetMembers|full|consistent|24.54 s

[10:57:22 - 10:57:44] getValidValueSetMembers|3|parallel|21.27 s [10:57:22 - 10:57:44] getValidValueSetMembers|4|parallel|21.53 s [10:57:22 - 10:57:44] getValidValueSetMembers|1|parallel|21.57 s [10:57:22 - 10:57:45] getValidValueSetMembers|2|parallel|22.93 s [10:57:22 - 10:57:45] getValidValueSetMembers|0|parallel|23.05 s [10:57:22 - 10:57:45] getValidValueSetMembers|full|parallel|23.06 s

Arturkad commented 3 years ago

@planetf1 we have done the same test on a lab stand with in-memory repo. On cocoMDSx OMAG server. Faster compared to Cassandra repository in creating and reading. But still, parallel requests set is much longer compared to individual serial (consistent) requests...

[16:19:27 - 16:19:28] createValidValueSet|823.0 ms [16:19:28 - 16:19:29] createValidValueDefinition|0|1.016 s [16:19:29 - 16:19:30] createValidValueDefinition|1|976.0 ms [16:19:30 - 16:19:31] createValidValueDefinition|2|1.133 s [16:19:31 - 16:19:32] createValidValueDefinition|3|989.1 ms [16:19:32 - 16:19:33] createValidValueDefinition|4|1.051 s [16:19:33 - 16:19:34] createValidValueDefinition|5|959.3 ms [16:19:34 - 16:19:35] createValidValueDefinition|6|970.4 ms [16:19:35 - 16:19:36] createValidValueDefinition|7|1.131 s [16:19:36 - 16:19:37] createValidValueDefinition|8|933.6 ms [16:19:37 - 16:19:38] createValidValueDefinition|9|849.1 ms [16:19:38 - 16:19:39] createValidValueDefinition|10|896.8 ms [16:19:39 - 16:19:40] createValidValueDefinition|11|924.6 ms [16:19:40 - 16:19:41] createValidValueDefinition|12|959.7 ms [16:19:41 - 16:19:42] createValidValueDefinition|13|1.013 s [16:19:42 - 16:19:43] createValidValueDefinition|14|1.213 s [16:19:43 - 16:19:44] createValidValueDefinition|15|1.012 s [16:19:44 - 16:19:45] createValidValueDefinition|16|885.3 ms [16:19:45 - 16:19:46] createValidValueDefinition|17|953.5 ms [16:19:46 - 16:19:47] createValidValueDefinition|18|1.084 s [16:19:47 - 16:19:48] createValidValueDefinition|19|1.058 s [16:19:48 - 16:19:49] createValidValueDefinition|20|1.007 s [16:19:49 - 16:19:50] createValidValueDefinition|21|1.068 s [16:19:50 - 16:19:51] createValidValueDefinition|22|1.091 s [16:19:51 - 16:19:52] createValidValueDefinition|23|921.4 ms [16:19:52 - 16:19:53] createValidValueDefinition|24|876.1 ms [16:19:53 - 16:19:54] createValidValueDefinition|25|1.082 s [16:19:54 - 16:19:55] createValidValueDefinition|26|874.3 ms [16:19:55 - 16:19:56] createValidValueDefinition|27|962.0 ms [16:19:56 - 16:19:57] createValidValueDefinition|28|712.6 ms [16:19:57 - 16:19:57] createValidValueDefinition|29|876.5 ms [16:19:57 - 16:19:58] createValidValueDefinition|30|876.4 ms [16:19:58 - 16:20:00] createValidValueDefinition|31|1.181 s [16:20:00 - 16:20:01] createValidValueDefinition|32|1.029 s [16:20:01 - 16:20:02] createValidValueDefinition|33|962.2 ms [16:20:02 - 16:20:02] createValidValueDefinition|34|802.8 ms [16:20:02 - 16:20:03] createValidValueDefinition|35|871.1 ms [16:20:03 - 16:20:04] createValidValueDefinition|36|943.6 ms [16:20:04 - 16:20:05] createValidValueDefinition|37|873.3 ms [16:20:05 - 16:20:06] createValidValueDefinition|38|863.3 ms [16:20:06 - 16:20:07] createValidValueDefinition|39|894.5 ms [16:20:07 - 16:20:08] createValidValueDefinition|40|939.1 ms [16:20:08 - 16:20:08] createValidValueDefinition|41|752.4 ms [16:20:08 - 16:20:09] createValidValueDefinition|42|628.5 ms [16:20:09 - 16:20:10] createValidValueDefinition|43|741.8 ms [16:20:10 - 16:20:11] createValidValueDefinition|44|918.9 ms [16:20:11 - 16:20:12] createValidValueDefinition|45|920.1 ms [16:20:12 - 16:20:12] createValidValueDefinition|46|793.8 ms [16:20:12 - 16:20:14] createValidValueDefinition|47|1.183 s [16:20:14 - 16:20:15] createValidValueDefinition|48|868.2 ms [16:20:15 - 16:20:16] createValidValueDefinition|49|1.013 s

[16:20:16 - 16:20:22] getValidValueSetMembers|0|consistent|6.242 s [16:20:22 - 16:20:27] getValidValueSetMembers|1|consistent|5.569 s [16:20:27 - 16:20:33] getValidValueSetMembers|2|consistent|5.290 s [16:20:33 - 16:20:38] getValidValueSetMembers|3|consistent|5.199 s [16:20:38 - 16:20:44] getValidValueSetMembers|4|consistent|6.052 s [16:20:16 - 16:20:44] getValidValueSetMembers|full|consistent|28.36 s

[16:20:44 - 16:20:54] getValidValueSetMembers|1|parallel|9.661 s [16:20:44 - 16:20:54] getValidValueSetMembers|2|parallel|10.04 s [16:20:44 - 16:20:54] getValidValueSetMembers|3|parallel|10.12 s [16:20:44 - 16:20:54] getValidValueSetMembers|0|parallel|10.17 s [16:20:44 - 16:20:54] getValidValueSetMembers|4|parallel|10.32 s [16:20:44 - 16:20:54] getValidValueSetMembers|full|parallel|10.33 s

planetf1 commented 3 years ago

Interesting

planetf1 commented 3 years ago

Another question in relation to your topology....

For the timings you mention, are these responses from Egeria running on 9443 or does it include the times through the other layers including Nginx?

If others are included, have you eliminated any possibility of delays or bottlenecks in other layers?

Arturkad commented 3 years ago

@planetf1 For now, the test was run on a developer PC, that is in the same network with Egeria server. The code I should check if is ok to share, if necessary, I think we can. About the layers - there are some (connection between my PC and the server, for instance), but it doesn't explain why parallel requests do not work as they should - they are slower than serial requests. We are using Java API and there is a really low load on a server. I attached a screenshot. image

planetf1 commented 3 years ago

Thanks - it's just a case of eliminating some of the simpler possibilities. My feeling now is to do some profiling/log at logs in more depth & get a feeling of the dynamics here, so when you are able to share I'll give that a go.

Arturkad commented 3 years ago

perfomance test.zip @planetf1 Here is the source test code.

planetf1 commented 3 years ago

Thanks for the sources. I have them building and will need to setup the environment/properties - likely tomorrow.

planetf1 commented 3 years ago

I was able to run the test locally :

Observations:

planetf1 commented 3 years ago

Run again, this time from CLI :

    [21:57:45 - 21:57:45] createValidValueSet|276.6 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|0|35.48 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|1|39.95 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|2|31.20 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|3|40.75 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|4|31.27 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|5|34.63 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|6|31.19 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|7|33.28 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|8|31.69 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|9|28.96 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|10|24.85 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|11|26.16 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|12|32.63 ms
    [21:57:45 - 21:57:45] createValidValueDefinition|13|31.62 ms
    [21:57:45 - 21:57:46] createValidValueDefinition|14|42.79 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|15|32.78 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|16|31.62 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|17|29.29 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|18|29.45 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|19|25.87 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|20|25.50 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|21|22.52 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|22|25.47 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|23|30.32 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|24|39.75 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|25|41.21 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|26|40.09 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|27|36.22 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|28|32.35 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|29|36.41 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|30|34.80 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|31|29.92 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|32|30.88 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|33|31.04 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|34|30.34 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|35|31.07 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|36|32.10 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|37|25.88 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|38|26.62 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|39|37.19 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|40|32.30 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|41|35.84 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|42|36.63 ms
    [21:57:46 - 21:57:46] createValidValueDefinition|43|39.72 ms
    [21:57:46 - 21:57:47] createValidValueDefinition|44|32.59 ms
    [21:57:47 - 21:57:47] createValidValueDefinition|45|34.98 ms
    [21:57:47 - 21:57:47] createValidValueDefinition|46|28.60 ms
    [21:57:47 - 21:57:47] createValidValueDefinition|47|30.66 ms
    [21:57:47 - 21:57:47] createValidValueDefinition|48|33.77 ms
    [21:57:47 - 21:57:47] createValidValueDefinition|49|31.35 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|0|consistent|82.89 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|1|consistent|47.80 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|2|consistent|36.50 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|3|consistent|33.22 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|4|consistent|40.08 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|all|consistent|243.4 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|4|parallel|50.62 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|3|parallel|50.66 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|2|parallel|51.30 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|0|parallel|51.99 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|1|parallel|57.80 ms
    [21:57:47 - 21:57:47] getValidValueSetMembers|all|parallel|63.66 ms

The previous set may have been slower due to the intellij environment I ran it in - this was a straight './gradlew build' from the cli

planetf1 commented 3 years ago

I have also done two runs with the local graph repo:

    [07:44:29 - 07:44:30] createValidValueSet|1.274 s
    [07:44:30 - 07:44:31] createValidValueDefinition|0|1.439 s
    [07:44:31 - 07:44:33] createValidValueDefinition|1|1.369 s
    [07:44:33 - 07:44:34] createValidValueDefinition|2|1.540 s
    [07:44:34 - 07:44:36] createValidValueDefinition|3|1.407 s
    [07:44:36 - 07:44:37] createValidValueDefinition|4|1.362 s
    [07:44:37 - 07:44:38] createValidValueDefinition|5|1.352 s
    [07:44:38 - 07:44:40] createValidValueDefinition|6|1.406 s
    [07:44:40 - 07:44:41] createValidValueDefinition|7|1.426 s
    [07:44:41 - 07:44:43] createValidValueDefinition|8|1.414 s
    [07:44:43 - 07:44:44] createValidValueDefinition|9|1.399 s
    [07:44:44 - 07:44:45] createValidValueDefinition|10|1.449 s
    [07:44:45 - 07:44:47] createValidValueDefinition|11|1.256 s
    [07:44:47 - 07:44:48] createValidValueDefinition|12|1.283 s
    [07:44:48 - 07:44:49] createValidValueDefinition|13|1.298 s
    [07:44:49 - 07:44:51] createValidValueDefinition|14|1.316 s
    [07:44:51 - 07:44:52] createValidValueDefinition|15|1.291 s
    [07:44:52 - 07:44:53] createValidValueDefinition|16|1.418 s
    [07:44:53 - 07:44:55] createValidValueDefinition|17|1.359 s
    [07:44:55 - 07:44:56] createValidValueDefinition|18|1.402 s
    [07:44:56 - 07:44:57] createValidValueDefinition|19|1.255 s
    [07:44:57 - 07:44:59] createValidValueDefinition|20|1.260 s
    [07:44:59 - 07:45:00] createValidValueDefinition|21|1.280 s
    [07:45:00 - 07:45:01] createValidValueDefinition|22|1.282 s
    [07:45:01 - 07:45:02] createValidValueDefinition|23|1.288 s
    [07:45:02 - 07:45:04] createValidValueDefinition|24|1.354 s
    [07:45:04 - 07:45:05] createValidValueDefinition|25|1.304 s
    [07:45:05 - 07:45:06] createValidValueDefinition|26|1.284 s
    [07:45:06 - 07:45:08] createValidValueDefinition|27|1.338 s
    [07:45:08 - 07:45:09] createValidValueDefinition|28|1.246 s
    [07:45:09 - 07:45:10] createValidValueDefinition|29|1.347 s
    [07:45:10 - 07:45:11] createValidValueDefinition|30|1.225 s
    [07:45:11 - 07:45:13] createValidValueDefinition|31|1.287 s
    [07:45:13 - 07:45:14] createValidValueDefinition|32|1.304 s
    [07:45:14 - 07:45:15] createValidValueDefinition|33|1.230 s
    [07:45:15 - 07:45:17] createValidValueDefinition|34|1.301 s
    [07:45:17 - 07:45:18] createValidValueDefinition|35|1.478 s
    [07:45:18 - 07:45:19] createValidValueDefinition|36|1.216 s
    [07:45:19 - 07:45:21] createValidValueDefinition|37|1.240 s
    [07:45:21 - 07:45:22] createValidValueDefinition|38|1.318 s
    [07:45:22 - 07:45:23] createValidValueDefinition|39|1.264 s
    [07:45:23 - 07:45:24] createValidValueDefinition|40|1.292 s
    [07:45:24 - 07:45:26] createValidValueDefinition|41|1.212 s
    [07:45:26 - 07:45:27] createValidValueDefinition|42|1.294 s
    [07:45:27 - 07:45:28] createValidValueDefinition|43|1.445 s
    [07:45:28 - 07:45:30] createValidValueDefinition|44|1.212 s
    [07:45:30 - 07:45:31] createValidValueDefinition|45|1.373 s
    [07:45:31 - 07:45:32] createValidValueDefinition|46|1.256 s
    [07:45:32 - 07:45:33] createValidValueDefinition|47|1.304 s
    [07:45:33 - 07:45:35] createValidValueDefinition|48|1.262 s
    [07:45:35 - 07:45:36] createValidValueDefinition|49|1.253 s
    [07:45:36 - 07:45:37] getValidValueSetMembers|0|consistent|568.7 ms
    [07:45:37 - 07:45:37] getValidValueSetMembers|1|consistent|452.9 ms
    [07:45:37 - 07:45:37] getValidValueSetMembers|2|consistent|452.2 ms
    [07:45:37 - 07:45:38] getValidValueSetMembers|3|consistent|529.4 ms
    [07:45:38 - 07:45:38] getValidValueSetMembers|4|consistent|447.2 ms
    [07:45:36 - 07:45:38] getValidValueSetMembers|all|consistent|2.453 s
    [07:45:38 - 07:45:40] getValidValueSetMembers|1|parallel|1.727 s
    [07:45:38 - 07:45:40] getValidValueSetMembers|4|parallel|1.734 s
    [07:45:38 - 07:45:40] getValidValueSetMembers|2|parallel|1.739 s
    [07:45:38 - 07:45:40] getValidValueSetMembers|3|parallel|1.849 s
    [07:45:38 - 07:45:40] getValidValueSetMembers|0|parallel|1.859 s
    [07:45:38 - 07:45:40] getValidValueSetMembers|all|parallel|1.862 s
    [07:47:05 - 07:47:05] createValidValueSet|463.3 ms
    [07:47:05 - 07:47:07] createValidValueDefinition|0|1.158 s
    [07:47:07 - 07:47:08] createValidValueDefinition|1|1.270 s
    [07:47:08 - 07:47:09] createValidValueDefinition|2|1.396 s
    [07:47:09 - 07:47:11] createValidValueDefinition|3|1.303 s
    [07:47:11 - 07:47:12] createValidValueDefinition|4|1.379 s
    [07:47:12 - 07:47:13] createValidValueDefinition|5|1.490 s
    [07:47:13 - 07:47:15] createValidValueDefinition|6|1.451 s
    [07:47:15 - 07:47:16] createValidValueDefinition|7|1.341 s
    [07:47:16 - 07:47:18] createValidValueDefinition|8|1.318 s
    [07:47:18 - 07:47:19] createValidValueDefinition|9|1.242 s
    [07:47:19 - 07:47:20] createValidValueDefinition|10|1.258 s
    [07:47:20 - 07:47:21] createValidValueDefinition|11|1.366 s
    [07:47:21 - 07:47:23] createValidValueDefinition|12|1.465 s
    [07:47:23 - 07:47:24] createValidValueDefinition|13|1.386 s
    [07:47:24 - 07:47:26] createValidValueDefinition|14|1.334 s
    [07:47:26 - 07:47:27] createValidValueDefinition|15|1.350 s
    [07:47:27 - 07:47:28] createValidValueDefinition|16|1.230 s
    [07:47:28 - 07:47:29] createValidValueDefinition|17|1.241 s
    [07:47:29 - 07:47:31] createValidValueDefinition|18|1.299 s
    [07:47:31 - 07:47:32] createValidValueDefinition|19|1.447 s
    [07:47:32 - 07:47:34] createValidValueDefinition|20|1.517 s
    [07:47:34 - 07:47:35] createValidValueDefinition|21|1.280 s
    [07:47:35 - 07:47:36] createValidValueDefinition|22|1.206 s
    [07:47:36 - 07:47:37] createValidValueDefinition|23|1.243 s
    [07:47:37 - 07:47:39] createValidValueDefinition|24|1.184 s
    [07:47:39 - 07:47:40] createValidValueDefinition|25|1.284 s
    [07:47:40 - 07:47:41] createValidValueDefinition|26|1.230 s
    [07:47:41 - 07:47:43] createValidValueDefinition|27|1.494 s
    [07:47:43 - 07:47:44] createValidValueDefinition|28|1.481 s
    [07:47:44 - 07:47:45] createValidValueDefinition|29|1.309 s
    [07:47:45 - 07:47:47] createValidValueDefinition|30|1.209 s
    [07:47:47 - 07:47:48] createValidValueDefinition|31|1.193 s
    [07:47:48 - 07:47:49] createValidValueDefinition|32|1.162 s
    [07:47:49 - 07:47:50] createValidValueDefinition|33|1.159 s
    [07:47:50 - 07:47:52] createValidValueDefinition|34|1.406 s
    [07:47:52 - 07:47:53] createValidValueDefinition|35|1.448 s
    [07:47:53 - 07:47:54] createValidValueDefinition|36|1.451 s
    [07:47:54 - 07:47:56] createValidValueDefinition|37|1.403 s
    [07:47:56 - 07:47:57] createValidValueDefinition|38|1.230 s
    [07:47:57 - 07:47:58] createValidValueDefinition|39|1.273 s
    [07:47:58 - 07:48:00] createValidValueDefinition|40|1.270 s
    [07:48:00 - 07:48:01] createValidValueDefinition|41|1.371 s
    [07:48:01 - 07:48:02] createValidValueDefinition|42|1.317 s
    [07:48:02 - 07:48:04] createValidValueDefinition|43|1.284 s
    [07:48:04 - 07:48:05] createValidValueDefinition|44|1.392 s
    [07:48:05 - 07:48:06] createValidValueDefinition|45|1.428 s
    [07:48:06 - 07:48:08] createValidValueDefinition|46|1.315 s
    [07:48:08 - 07:48:09] createValidValueDefinition|47|1.371 s
    [07:48:09 - 07:48:10] createValidValueDefinition|48|1.265 s
    [07:48:10 - 07:48:12] createValidValueDefinition|49|1.282 s
    [07:48:12 - 07:48:12] getValidValueSetMembers|0|consistent|510.5 ms
    [07:48:12 - 07:48:13] getValidValueSetMembers|1|consistent|419.3 ms
    [07:48:13 - 07:48:13] getValidValueSetMembers|2|consistent|431.0 ms
    [07:48:13 - 07:48:13] getValidValueSetMembers|3|consistent|491.4 ms
    [07:48:13 - 07:48:14] getValidValueSetMembers|4|consistent|383.0 ms
    [07:48:12 - 07:48:14] getValidValueSetMembers|all|consistent|2.242 s
    [07:48:14 - 07:48:15] getValidValueSetMembers|4|parallel|1.404 s
    [07:48:14 - 07:48:15] getValidValueSetMembers|3|parallel|1.412 s
    [07:48:14 - 07:48:15] getValidValueSetMembers|0|parallel|1.421 s
    [07:48:14 - 07:48:15] getValidValueSetMembers|2|parallel|1.509 s
    [07:48:14 - 07:48:15] getValidValueSetMembers|1|parallel|1.511 s
    [07:48:14 - 07:48:15] getValidValueSetMembers|all|parallel|1.516 s

I'm intrigued as to what may be different in your configuration.

planetf1 commented 3 years ago

Here's a result using Crux - all local to my mac, configured using lucene/rocks etc as per the last example at https://odpi.github.io/egeria-connector-crux/getting-started/setup/

    [08:15:57 - 08:15:58] createValidValueSet|642.8 ms
    [08:15:58 - 08:15:59] createValidValueDefinition|0|699.7 ms
    [08:15:59 - 08:15:59] createValidValueDefinition|1|583.0 ms
    [08:15:59 - 08:16:00] createValidValueDefinition|2|599.3 ms
    [08:16:00 - 08:16:00] createValidValueDefinition|3|480.3 ms
    [08:16:00 - 08:16:01] createValidValueDefinition|4|398.4 ms
    [08:16:01 - 08:16:01] createValidValueDefinition|5|432.1 ms
    [08:16:01 - 08:16:02] createValidValueDefinition|6|406.8 ms
    [08:16:02 - 08:16:02] createValidValueDefinition|7|387.6 ms
    [08:16:02 - 08:16:03] createValidValueDefinition|8|651.6 ms
    [08:16:03 - 08:16:03] createValidValueDefinition|9|437.1 ms
    [08:16:03 - 08:16:04] createValidValueDefinition|10|449.1 ms
    [08:16:04 - 08:16:04] createValidValueDefinition|11|389.3 ms
    [08:16:04 - 08:16:05] createValidValueDefinition|12|552.9 ms
    [08:16:05 - 08:16:05] createValidValueDefinition|13|357.0 ms
    [08:16:05 - 08:16:05] createValidValueDefinition|14|421.6 ms
    [08:16:05 - 08:16:06] createValidValueDefinition|15|384.7 ms
    [08:16:06 - 08:16:06] createValidValueDefinition|16|666.5 ms
    [08:16:06 - 08:16:07] createValidValueDefinition|17|559.0 ms
    [08:16:07 - 08:16:08] createValidValueDefinition|18|686.3 ms
    [08:16:08 - 08:16:08] createValidValueDefinition|19|703.3 ms
    [08:16:08 - 08:16:09] createValidValueDefinition|20|436.1 ms
    [08:16:09 - 08:16:09] createValidValueDefinition|21|518.4 ms
    [08:16:09 - 08:16:10] createValidValueDefinition|22|706.6 ms
    [08:16:10 - 08:16:10] createValidValueDefinition|23|506.5 ms
    [08:16:10 - 08:16:11] createValidValueDefinition|24|392.6 ms
    [08:16:11 - 08:16:11] createValidValueDefinition|25|357.3 ms
    [08:16:11 - 08:16:12] createValidValueDefinition|26|407.1 ms
    [08:16:12 - 08:16:12] createValidValueDefinition|27|352.7 ms
    [08:16:12 - 08:16:12] createValidValueDefinition|28|435.4 ms
    [08:16:12 - 08:16:13] createValidValueDefinition|29|330.7 ms
    [08:16:13 - 08:16:13] createValidValueDefinition|30|328.4 ms
    [08:16:13 - 08:16:13] createValidValueDefinition|31|330.4 ms
    [08:16:13 - 08:16:14] createValidValueDefinition|32|298.8 ms
    [08:16:14 - 08:16:14] createValidValueDefinition|33|283.3 ms
    [08:16:14 - 08:16:14] createValidValueDefinition|34|307.1 ms
    [08:16:14 - 08:16:15] createValidValueDefinition|35|288.0 ms
    [08:16:15 - 08:16:15] createValidValueDefinition|36|304.4 ms
    [08:16:15 - 08:16:15] createValidValueDefinition|37|285.0 ms
    [08:16:15 - 08:16:15] createValidValueDefinition|38|295.3 ms
    [08:16:15 - 08:16:16] createValidValueDefinition|39|285.1 ms
    [08:16:16 - 08:16:16] createValidValueDefinition|40|320.1 ms
    [08:16:16 - 08:16:16] createValidValueDefinition|41|256.5 ms
    [08:16:16 - 08:16:17] createValidValueDefinition|42|268.9 ms
    [08:16:17 - 08:16:17] createValidValueDefinition|43|245.1 ms
    [08:16:17 - 08:16:17] createValidValueDefinition|44|274.2 ms
    [08:16:17 - 08:16:17] createValidValueDefinition|45|299.5 ms
    [08:16:17 - 08:16:18] createValidValueDefinition|46|260.7 ms
    [08:16:18 - 08:16:18] createValidValueDefinition|47|290.7 ms
    [08:16:18 - 08:16:18] createValidValueDefinition|48|273.4 ms
    [08:16:18 - 08:16:19] createValidValueDefinition|49|238.8 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|0|consistent|193.9 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|1|consistent|113.7 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|2|consistent|113.7 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|3|consistent|137.4 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|4|consistent|127.4 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|all|consistent|688.6 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|1|parallel|253.3 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|2|parallel|261.9 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|0|parallel|284.2 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|3|parallel|290.6 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|4|parallel|291.4 ms
    [08:16:19 - 08:16:19] getValidValueSetMembers|all|parallel|297.2 ms

This performs a lot better than the graph DB (as we expect), but again note that the parallel results are only marginally higher than the series ones. Note that this is still a fairly simple crux config.

Inventoroz commented 3 years ago

Pay attention to concurrent requests. In any test, the execution time for each such request is almost the same. I can say with confidence that if you increase the number of parallel requests, the time of each of them will increase, and again there will not be a strong difference in time between them. This suggests that the threads are blocking each other and each of them somehow waits for the other thread to finish something, so the output turns out that they have almost the same execution time. This is a very strange behavior. Make 50 parallel requests and see what happens.

Arturkad commented 3 years ago

@planetf1 Nigel, you've got much better performance than we do! Thank you for sharing! Looks like we configured something in a bad way. Could you please provide a config file for the server? I think this one could help? /open-metadata/admin-services/users/admin/servers/uad1/instance/configuration

We will try to set up this new Crux and hope to get the same results as you have - so would be easier to find out what was done whong.

And still, even in your last test, the parallel request was done twice longer than individual consistent (serial). As if parallel requests block each other somehow.

planetf1 commented 3 years ago

I posted my simple inmem server config doc to https://gist.github.com/68b337a83739fec29606ea4f4d5c428c (note that all OMASs are enabled)

I did this by using postman to create a very basic (minimal) configuration. Here's my postman log:

https://gist.github.com/1709f869e9ff9f956549b268670bc176

planetf1 commented 3 years ago

@inventoroz @arturkad Here's the inmem test with 1000 requests https://gist.github.com/20471041ed7a8d2ef45fe866dc1313e3

Inventoroz commented 3 years ago

@planetf1 The test variant "inmem" is not suitable, because there, in principle, operations go quickly and therefore the threads do not wait so long for something, run exactly the same test with Janus graph

wbittles commented 3 years ago

It's worth mentioning that as Egeria is a spring boot application, it is configured and build to use an embedded web server component which typically is Apache Tomcat. this means that it's possible to enable the performance logging metrics of Tomcat which wil laid in identifying potential bottlenecks. This component is configurable and for instance using the application.properties file it is possible to tweak the default performance characteristics.

If you do re run the test with 50 simultaneous requests it would be worth checking how many tcp/ip connections witll be supporting this number of requests.

planetf1 commented 3 years ago

I think there's 2 areas to persue here a) Why is the @Arturkad test running a lot slower - is this a difference with Operating System, configuration etc?

b) As @wbittles has pointed out, I think capturing more info in terms of what the different layers see should be valuable. Such as tomcat, and also method calls in the server. There are limited resources especially server threads that would leak to loss of parallelism, for example embedded tomcat threads - though this should default to a generous value of 200.

planetf1 commented 3 years ago

Looking back at the test driving code, a ForkJoinPool will be used by the futures, and by default this sets a parallelism equal to the number of processors - so on my mbp that's probably 4 or 8. This could be a factor affecting the result. As above, traces from what we see on egeria would help.

This can be overriden by specifying the executor when setting up the future or via something like '-Djava.util.concurrent.ForkJoinPool.common.parallelism=100' just for the test

I noticed this as I only saw a few http threads on the server handling the inbound requests ... because we were only having handling a few outstanding at any point

planetf1 commented 3 years ago

One thing you can try is to rerun the test, and on the server set LOGGING_LEVEL_ROOT=debug (you can do this more precisely...)

Then a grep through the log of something like cat /tmp/chassis.log | grep "ValidValuesRESTServices" | grep "call invoked by" | more | cut -c 1-210 can help you understand how the rest calls done internally within the omas are being handled.

With 500 requests we start out with the entry/exit being pretty much serial. By midway through the test there is significant out of order work (as we'd expect) and the final requests towards the end are serial again. This doesn't seem unexpected in any way and I think suggests to me that the challenge is getting sufficient workload in (possibly by ensuring the test driver is sufficient ie in terms of it's parallelism) is the bigger challenge here, rather than there being any issues in the code? Just my current thought......

planetf1 commented 3 years ago

Here's a little extract:

2021-07-29 15:36:44.393 DEBUG 70590 --- [nio-9443-exec-7] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:36:44.398 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 59:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:36:45.037 DEBUG 70590 --- [nio-9443-exec-7] o.o.o.a.d.s.ValidValuesRESTServices      : 59:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with re
2021-07-29 15:36:45.089 DEBUG 70590 --- [nio-9443-exec-4] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:36:45.091 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 60:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:36:45.698 DEBUG 70590 --- [nio-9443-exec-4] o.o.o.a.d.s.ValidValuesRESTServices      : 60:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with re
2021-07-29 15:36:45.724 DEBUG 70590 --- [nio-9443-exec-7] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:36:45.728 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 61:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:36:46.286 DEBUG 70590 --- [nio-9443-exec-7] o.o.o.a.d.s.ValidValuesRESTServices      : 61:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with re
2021-07-29 15:36:46.311 DEBUG 70590 --- [nio-9443-exec-4] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:36:46.315 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 62:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:36:46.933 DEBUG 70590 --- [nio-9443-exec-4] o.o.o.a.d.s.ValidValuesRESTServices      : 62:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with re
2021-07-29 15:36:46.957 DEBUG 70590 --- [nio-9443-exec-7] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1

(we need to correct the thread id in the log) This is mostly seen throughout. The first log entry is from tomcat processing the rest call, very soon after we process our requests, straight away after that we see the same tomcat thread handle the next request

Again, the only sign of blocking here could be around tomcat or above, including client driver

And to show what happens mid-run:

2021-07-29 15:37:59.572 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 223:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:38:00.388 DEBUG 70590 --- [nio-9443-exec-6] o.o.o.a.d.s.ValidValuesRESTServices      : 219:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with r
2021-07-29 15:38:00.404 DEBUG 70590 --- [nio-9443-exec-6] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:38:00.408 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 224:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:38:00.856 DEBUG 70590 --- [nio-9443-exec-1] o.o.o.a.d.s.ValidValuesRESTServices      : 217:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with r
2021-07-29 15:38:00.872 DEBUG 70590 --- [nio-9443-exec-1] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:38:00.873 DEBUG 70590 --- [nio-9443-exec-2] o.o.o.a.d.s.ValidValuesRESTServices      : 220:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with r
2021-07-29 15:38:00.877 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 225:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:38:00.887 DEBUG 70590 --- [nio-9443-exec-2] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:38:00.890 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 226:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:38:01.081 DEBUG 70590 --- [nio-9443-exec-7] o.o.o.a.d.s.ValidValuesRESTServices      : 218:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with r
2021-07-29 15:38:01.097 DEBUG 70590 --- [nio-9443-exec-7] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:38:01.101 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 227:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:38:02.064 DEBUG 70590 --- [nio-9443-exec-4] o.o.o.a.d.s.ValidValuesRESTServices      : 222:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with r
2021-07-29 15:38:02.080 DEBUG 70590 --- [nio-9443-exec-4] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:38:02.083 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 228:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:38:02.093 DEBUG 70590 --- [io-9443-exec-10] o.o.o.a.d.s.ValidValuesRESTServices      : 221:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with r
2021-07-29 15:38:02.107 DEBUG 70590 --- [io-9443-exec-10] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1
2021-07-29 15:38:02.110 DEBUG 70590 --- [ValueSetMembers] o.o.o.a.d.s.ValidValuesRESTServices      : 229:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId
2021-07-29 15:38:02.122 DEBUG 70590 --- [nio-9443-exec-8] o.o.o.a.d.s.ValidValuesRESTServices      : 223:Digital Architecture OMAS:myserver:getValidValueSetMembers call invoked by myserverUserId returned with r
2021-07-29 15:38:02.136 DEBUG 70590 --- [nio-9443-exec-8] o.a.coyote.http11.Http11InputBuffer      : Received [GET /servers/myserver/open-metadata/access-services/digital-architecture/users/myserverUserId/valid-values/sets/9a9c831f-9b66-4e91-93dd-67b4c118c532/members?startFrom=0&pageSize=0 HTTP/1.1

Where overlap is seen.

planetf1 commented 3 years ago

For that test we only ever needed/used 9 tomcat threads - in other tests I've seen higher (ie 10) suggesting this was as many as was needed in this test run.

I think I've completed the investigations I can think of at this time, open to comments.

Arturkad commented 3 years ago

We see that the root of our poor performance, for now, is probably in some configuration issues on our side, because in-memory tests work fine. We are trying to solve them, I will report on the results in case if it could help someone in the future. Nigel, thank you for the lots of information!

planetf1 commented 3 years ago

Glad it helped @Arturkad . I'll assign the issue to you for now so that you can add any conclusions you wish or close as you see fit.

MaximVetlugin commented 3 years ago

@planetf1 , good day

I have also done two runs with the local graph repo:

Can you send me your Egeria server config with "local graph repo" settings? We want to compare it with ours settings (Cassandra & Elastic). Thank you

planetf1 commented 3 years ago

Hi. I don't keep any configs typically, but generate them each time through REST calls - then the config gets scrubbed. This is done automatically via scripts in our FVT tests, and some of our k8s helm charts so that we maintain consistency

For ad-hoc testing (such as this) I use postman -- based on the standard collections checked into GitHub.

I only used the minimum of calls to get a config suitable for testing - so for example the org name etc was not set, and you may well wish to set them. I doubt they would make any difference to performance. So what I had was basically defaults.

For example I posted above my 'in memory' postman log at https://gist.github.com/planetf1/1709f869e9ff9f956549b268670bc176

The only difference would have been changing the call to

POST https://localhost:9443/open-metadata/admin-services/users/me/servers/myserver/local-repository/mode/in-memory-repository

with 'local-graph-repository' instead.

MaximVetlugin commented 3 years ago

@planetf1,

With this option "local-graph-repository" we are using additional settings (configurationProperties), to provide Janus Graph targets for key-value & index.

"localRepositoryLocalConnection": { "class": "Connection", "headerVersion": 0, "connectorType": { "class": "ConnectorType", "headerVersion": 0, "type": { "class": "ElementType", "headerVersion": 0, "elementOrigin": "LOCAL_COHORT", "elementVersion": 0, "elementTypeId": "954421eb-33a6-462d-a8ca-b5709a1bd0d4", "elementTypeName": "ConnectorType", "elementTypeVersion": 1, "elementTypeDescription": "A set of properties describing a type of connector." }, "guid": "9c6b3198-ccef-4644-af13-2789646f4233", "qualifiedName": "OMRS Graph Repository Connector", "displayName": "OMRS Graph Repository Connector", "description": "OMRS Repository Connector that uses graph repository store.", "connectorProviderClassName": "org.odpi.openmetadata.adapters.repositoryservices.graphrepository.repositoryconnector.GraphOMRSRepositoryConnectorProvider" }, "configurationProperties": { "storage.cql.keyspace": "uad_test", "storage.username": "uad_test_user", "storage.backend": "cql", "index.search.hostname": "172.22.100.225", "storage.hostname": "172.22.100.211", "storage.password": "", "index.search.elasticsearch.http.auth.basic.password": "", "index.search.elasticsearch.http.auth.type": "basic", "index.search.elasticsearch.http.auth.basic.username": "elastic", "index.search.backend": "elasticsearch" } }

Do you use something like that?

planetf1 commented 3 years ago

We've only tested JanusGraph using a single berkeleydb - locally

We have discussed how to document/test/support a more flexible/full JanusGraph deployment, but we would need to make changes to how transactions are managed (both when using a transactional back-end, and when using an eventually complete backend like Cassandra). We're certain there are significant problems currently.

At this point in time the most scalable repository we have is the crux support.

That being said if you need a more complete Janus setup then we can share what we know so far...

planetf1 commented 3 years ago

Were you able to run tests with the local repo? Did you get performance similar to me? How about with the graph repo with a local store? As above, we haven't tested remote JanusGraph, and think there are issues with transactions. Of course if you want to work through that it would be helpful, if not I just wanted to make sure you're aware.

planetf1 commented 3 years ago

@Arturkad did you gather any further info on this issue? When I tried the base performance looked ok. I'd be very keen to help you are any issues you do amaze, plus as mentioned we do know there are limitations in the current graph repo which is really only suitable for BerkeleyDB.

planetf1 commented 3 years ago

I'm going to close this as we don't have anything actionable at this point. If you do gather more info please do reopen/open another issue.