restorecommerce / resource-base-interface

Basic Resource Interface defining CRUD Operations.
MIT License
0 stars 0 forks source link

UPSERT has bad Performance #225

Open bugerry87 opened 3 months ago

bugerry87 commented 3 months ago

UPSERT has bad Performance

UPSERT-Operation have bad performance due to unrequited workload caused by this line:

https://github.com/restorecommerce/resource-base-interface/blob/d4085e679080ac1e259c04509e38cc1110e1e851/src/core/ResourcesAPI.ts#L405

What is Happening

To upsert like 1000 documents in ab batch i.e. Products from restorecommerce/data takes a significant amount of time. This is because the ResourceAPI is checking the entire database for each document's counterpart. Of course, Databases are using efficient search algorithms, but not much better than O(log2(n)). Since ArangoDB does not and cannot know that id is unique but treat it like a common field, it will scan the entire collection each time and always end up with the worst case of O(n*log2(m)). Where n is the number of inputs and m the number of documents in DB. Since UPSERT is checking for counterparts and calls UPDATE which is doing the same again, we double up the workload to O(2n*log(m)). With n=1000 and m=1000 we end up with a complexity of 2*1000*log(1000)=~20000. So x20 per document.

How to Solve it

Set a limit: 2 in both queries, such that DB can terminate the search early and does not run into worst case every time. We use limit: 2 such that we can ensue that there is no further document with the same ID which is considered to be an invalid state and should throw an error.

Further Improvement

Try to batch the query but do not process each document individually in a for-loop. An example how to batch a query of many IDs see here (also using limit):

https://github.com/restorecommerce/ordering-srv/blob/217211bf1d8d21fe0198e44a5a71572da5538017/src/service.ts#L900

bugerry87 commented 3 months ago

PS: another way is to query for doc._key which is hashed and known as unique and therewith at best performance that you can get from ArangoDB