After evaluating the POST performance of the Web Annotation Protocol server, the performance issues presented in the final report could be reproduced. Furthermore, the reason for the relationship between performance decrease and container size could be identified: it is caused by the Jena code in ContainerImpl.java that is linked below.
The size() method is used in org.apache.jena.rdf.model.impl.SeqImpl to determine the current size of the container before a new element is added. Because it iterates through all elements, POST operations slow down steadily as the container grows, as shown in the following table:
| Number of elements in a container | Time to add one new element [ms] | Time to add next 10,000 elements [hh:mm:ss] (approx.) |
|---|---|---|
| 10,000 | 40 | 00:06:40 |
| 20,000 | 56 | 00:09:20 |
| 30,000 | 69 | 00:11:30 |
| 40,000 | 83 | 00:13:50 |
| 50,000 | 107 | 00:17:50 |
| 60,000 | 117 | 00:19:30 |
| ... | ... | ... |
| 140,000 | 253 | 00:39:10 |
| ... | ... | ... |
| 500,000 | 768 (est.) | 02:09:40 |
Of course, these values depend on the local hardware, but one should expect the time to post a single annotation to grow by roughly 15 ms for every additional 10,000 elements in the container.
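The relation between the per-element time and the time for the next 10,000 elements can be sketched as a simple multiplication. This is only an illustration under the simplifying assumption that the per-element time stays constant while the next 10,000 elements are added; the class and method names are hypothetical and not part of the server or of Jena. With the measured values for 10,000 to 60,000 elements it yields the same durations as listed in the table.

```java
import java.time.Duration;
import java.util.Locale;

public class PostTimeEstimate {

    // Hypothetical helper: total time for `count` POSTs if each one takes `msPerAdd` ms.
    // Assumes the per-element time does not change during these POSTs.
    public static Duration timeForAdds(long msPerAdd, long count) {
        return Duration.ofMillis(msPerAdd * count);
    }

    // Formats a duration as hh:mm:ss, dropping fractions of a second.
    public static String format(Duration d) {
        long s = d.getSeconds();
        return String.format(Locale.ROOT, "%02d:%02d:%02d", s / 3600, (s % 3600) / 60, s % 60);
    }

    public static void main(String[] args) {
        // Measured per-element times [ms] at 10,000 ... 60,000 elements (from the table).
        long[] measuredMs = {40, 56, 69, 83, 107, 117};
        for (int i = 0; i < measuredMs.length; i++) {
            int elements = (i + 1) * 10_000;
            Duration next10k = timeForAdds(measuredMs[i], 10_000);
            System.out.println(elements + " elements: " + measuredMs[i]
                    + " ms per POST, next 10000 POSTs take about " + format(next10k));
        }
    }
}
```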
Currently, there seem to be two solutions:
1) Be aware of the described behaviour and prefer small containers or containers of containers.
2) Change the implementation of the Jena repository to store sequence information elsewhere, e.g. in a relational database.
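The idea behind the second option can be sketched independently of Jena: if the next free position of each container is kept in a separate store, appending no longer requires counting the existing elements. The following is a minimal sketch under that assumption. An in-memory map stands in for the external store (e.g. a table in a relational database), and the class name, method names and example container IRI are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an external store that tracks container sizes.
public class SequenceIndexStore {

    // Maps a container IRI to the number of elements already assigned to it.
    private final ConcurrentHashMap<String, AtomicLong> sizes = new ConcurrentHashMap<>();

    // Returns the 1-based position for the next element of the container.
    // The cost does not depend on how many elements the container already has.
    public long nextIndex(String containerIri) {
        return sizes.computeIfAbsent(containerIri, k -> new AtomicLong()).incrementAndGet();
    }

    // Returns the tracked size of the container without iterating its elements.
    public long size(String containerIri) {
        AtomicLong n = sizes.get(containerIri);
        return n == null ? 0 : n.get();
    }

    public static void main(String[] args) {
        SequenceIndexStore store = new SequenceIndexStore();
        String container = "http://example.org/annotations/"; // example IRI, assumption
        for (int i = 0; i < 3; i++) {
            System.out.println("next position: " + store.nextIndex(container));
        }
        System.out.println("tracked size: " + store.size(container));
    }
}
```

The returned position would then be used to build the rdf:_n membership property for the new element. A persistent implementation would additionally have to keep the stored counter consistent with the RDF data, for example by updating both in one transaction.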
Referenced Jena code: https://github.com/apache/jena/blob/a7ba51f67e7af819178fea9a06a6dad0415877c3/jena-core/src/main/java/org/apache/jena/rdf/model/impl/ContainerImpl.java#L181