# UPSERT has bad Performance

The UPSERT operation has bad performance due to unnecessary workload caused by this line:
https://github.com/restorecommerce/resource-base-interface/blob/d4085e679080ac1e259c04509e38cc1110e1e851/src/core/ResourcesAPI.ts#L405
## What is Happening

Upserting around 1000 documents in a batch, e.g. the Products from restorecommerce/data, takes a significant amount of time. This is because the ResourceAPI checks the entire database for each document's counterpart. Of course, databases use efficient search algorithms, but a single lookup is still no better than `O(log2(m))`. Since ArangoDB does not and cannot know that `id` is unique but treats it like a common field, it keeps searching the entire collection each time and always ends up with the worst case of `O(n*log2(m))`, where `n` is the number of input documents and `m` the number of documents in the DB. Since UPSERT checks for counterparts and then calls UPDATE, which does the same again, we double the workload to `O(2n*log2(m))`. With `n=1000` and `m=1000` we end up with a complexity of `2*1000*log2(1000) ≈ 20000` lookup steps, i.e. about x20 per document.
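As a minimal sketch of the problematic pattern (setup, collection name, and document shape are hypothetical here, not the actual ResourcesAPI code), this is roughly what one unbounded counterpart lookup per input document looks like with arangojs:

```ts
import { Database, aql } from "arangojs";

// Hypothetical setup; URL, database, and collection names are placeholders.
const db = new Database({ url: "http://localhost:8529", databaseName: "restorecommerce" });
const products = db.collection("products");

// One counterpart lookup per input document. Without a LIMIT, ArangoDB
// cannot stop after the first hit: `id` is an ordinary field, so the
// search runs through the whole collection looking for further matches.
async function findCounterpart(id: string) {
  const cursor = await db.query(aql`
    FOR doc IN ${products}
      FILTER doc.id == ${id}
      RETURN doc
  `);
  return cursor.all();
}
```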
## How to Solve it

Set a `limit: 2` in both queries, so that the DB can terminate the search early and does not run into the worst case every time. We use `limit: 2` rather than `limit: 1` so that we can still ensure there is no further document with the same ID, which is considered an invalid state and should throw an error.
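A sketch of the fix, reusing the hypothetical setup from above (the real change belongs in the ResourcesAPI line linked at the top): `LIMIT 2` lets ArangoDB stop after at most two matches, and a second match signals a duplicate ID.

```ts
// LIMIT 2: stop searching after at most two matches. Zero matches means
// insert, one match is the counterpart to update, two matches means the
// ID is duplicated, which is an invalid state.
async function findCounterpartLimited(id: string) {
  const cursor = await db.query(aql`
    FOR doc IN ${products}
      FILTER doc.id == ${id}
      LIMIT 2
      RETURN doc
  `);
  const docs = await cursor.all();
  if (docs.length > 1) {
    throw new Error(`duplicate documents for id ${id}`);
  }
  return docs[0]; // undefined => insert path, document => update path
}
```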
## Further Improvement

Batch the query instead of processing each document individually in a for-loop. For an example of how to batch a query over many IDs (also using a limit), see here:
https://github.com/restorecommerce/ordering-srv/blob/217211bf1d8d21fe0198e44a5a71572da5538017/src/service.ts#L900
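A hedged sketch of what such a batched lookup could look like (the shape is assumed; see the linked ordering-srv code for the real implementation — the limit of `ids.length + 1` is one way to keep duplicate detection while still bounding the search):

```ts
// Resolve all counterparts in a single query instead of one query per
// document. The LIMIT bounds the search at inputs + 1, so a duplicated
// ID still shows up as "more results than inputs".
async function findCounterparts(ids: string[]) {
  const cursor = await db.query(aql`
    FOR doc IN ${products}
      FILTER doc.id IN ${ids}
      LIMIT ${ids.length + 1}
      RETURN doc
  `);
  const docs = await cursor.all();
  if (docs.length > ids.length) {
    throw new Error("duplicate IDs detected in collection");
  }
  // Map from id to existing document; missing ids take the insert path.
  return new Map<string, any>(docs.map((doc) => [doc.id, doc]));
}
```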