kit-data-manager / wap-server


Performance issues with POST to large containers #1

Open ThomasJejkal opened 3 years ago

ThomasJejkal commented 3 years ago

After evaluating the POST performance of the Web Annotation Protocol server, the performance issues presented in the final report could be reproduced. Furthermore, the reason for the relationship between performance decrease and container size could be identified; it is caused by the following Jena code:

https://github.com/apache/jena/blob/a7ba51f67e7af819178fea9a06a6dad0415877c3/jena-core/src/main/java/org/apache/jena/rdf/model/impl/ContainerImpl.java#L181

The size() method is used in org.apache.jena.rdf.model.impl.SeqImpl to determine the current size of the container before adding a new element. Because size() iterates through all elements, POST operations slow down steadily as the container grows, as shown in the following table:

| Number of elements in a container | Time to add one new element [ms] | Time to add next 10K elements [hh:mm:ss] (approx.) |
|---|---|---|
| 10.000 | 40 | 00:06:40 |
| 20.000 | 56 | 00:09:20 |
| 30.000 | 69 | 00:11:30 |
| 40.000 | 83 | 00:13:50 |
| 50.000 | 107 | 00:17:50 |
| 60.000 | 117 | 00:19:30 |
| ... | ... | ... |
| 140.000 | 253 | 00:39:10 |
| ... | ... | ... |
| 500.000 | 768 (est.) | 02:09:40 |
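To illustrate why the cost per POST grows with the container, here is a minimal sketch of the access pattern: scan all members to find the size, then append at the next index. `ScanningSeq` and `visits_for` are hypothetical names for a toy model. This is not Jena code and makes no claim about Jena's internals beyond the scan described in this issue.

```python
class ScanningSeq:
    """Toy stand-in for a sequence whose size is found by scanning.

    Hypothetical class, not Jena code: it only mimics the pattern
    "determine the size by iterating, then append at size + 1".
    """

    def __init__(self):
        self._members = {}   # maps 1-based ordinal index -> value
        self.visited = 0     # total number of members touched by size()

    def size(self):
        # Walk over every stored member to find the highest index: O(n).
        highest = 0
        for index in self._members:
            self.visited += 1
            if index > highest:
                highest = index
        return highest

    def add(self, value):
        # Every append pays for a full scan before it can pick its index.
        self._members[self.size() + 1] = value


def visits_for(n):
    """Return how many members were visited in total while adding n values."""
    seq = ScanningSeq()
    for i in range(n):
        seq.add(i)
    return seq.visited
```

In this model, adding n values visits n(n-1)/2 members in total. Doubling the container size therefore roughly quadruples the total work, while the cost of a single append grows linearly, as the measurements above suggest.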

Of course, these values depend on the local hardware, but one should expect the time to post a single annotation to increase by at least approx. 15 ms for every 10.000 elements already in the container.
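As a rough worked example, the two helpers below redo the arithmetic behind these figures. The function names and the strictly linear model are assumptions made for illustration. The defaults (40 ms at 10.000 elements, 15 ms per further 10.000) are read off the measurements in this issue and will differ on other hardware.

```python
def batch_duration(ms_per_post, posts=10_000):
    """Format the time needed for `posts` POSTs at a fixed per-POST cost."""
    total_seconds = int(ms_per_post * posts) // 1000
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def estimated_ms_per_post(container_size, base_ms=40, step_ms=15, step=10_000):
    """Linear rule of thumb (an assumption, not a measurement):
    base_ms at `step` elements plus step_ms for every further `step` elements.
    """
    return base_ms + step_ms * (container_size - step) / step
```

For instance, `batch_duration(40)` gives `"00:06:40"`, and `estimated_ms_per_post(60_000)` gives `115.0`, close to the measured 117 ms.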

Currently, there seem to be two solutions:

1) Be aware of the described behaviour and prefer small containers or containers of containers.
2) Change the implementation of the Jena repository to store sequence information elsewhere, e.g. in a relational database.
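Both options can be sketched with toy models. `PagedSeq` and `CounterSeq` are hypothetical classes, not part of Jena or the wap-server. The first bounds the scan by splitting one large container into fixed-size pages. The second keeps the next sequence index outside the scanned data, with a plain integer field standing in for a value stored elsewhere, e.g. in a database row.

```python
class PagedSeq:
    """Container of containers: an append only scans the last, bounded page."""

    def __init__(self, page_size=1000):
        self.page_size = page_size
        self.pages = [[]]
        self.visited = 0   # total number of members touched while counting

    def _page_len(self, page):
        # Deliberately count by scanning, to mimic the slow size() behaviour.
        count = 0
        for _ in page:
            self.visited += 1
            count += 1
        return count

    def add(self, value):
        # Start a new page once the current one is full.
        if self._page_len(self.pages[-1]) >= self.page_size:
            self.pages.append([])
        self.pages[-1].append(value)


class CounterSeq:
    """Sequence position kept outside the scanned data."""

    def __init__(self):
        self.members = {}
        self.next_index = 1   # stand-in for an externally stored counter

    def add(self, value):
        # No scan needed: the next index is looked up directly, O(1).
        self.members[self.next_index] = value
        self.next_index += 1
```

In `PagedSeq`, no single append visits more than `page_size` members, so the per-POST cost stays bounded regardless of the total number of elements. In `CounterSeq` the cost is constant, at the price of keeping the counter consistent with the stored data.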