saleor / saleor

Saleor Core: the high performance, composable, headless commerce API.
https://saleor.io
BSD 3-Clause "New" or "Revised" License

Add manage.py commands to create and restore instance snapshot #11160

Open cmiacz opened 1 year ago

cmiacz commented 1 year ago

What I'm trying to achieve

As a user, I should be able to create a snapshot of the Saleor database and media files that can be restored on another Saleor instance.

Describe a proposed solution

Extend Django's loaddata and dumpdata commands. Dump each Saleor app separately to limit the size of a single JSON dump and keep memory usage down. Output a zip archive that includes a folder with media files copied from the instance storage, skipping thumbnails. Add an optional metadata.json file with the Saleor version and any other data that may be useful during restore.

Example snapshot structure

snapshot.zip
|__ metadata.json
|__ media/
     |__ products/
     ...
|__ data/
     |__ account.json
     |__ channel.json
     ...
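A minimal sketch of the packaging step, using only the standard library's zipfile. It assumes the per-app JSON dumps have already been written to a local directory and the media storage has been copied locally. build_snapshot is a hypothetical helper, and THUMBNAIL_DIR is an assumption about where thumbnails live in the storage layout, not something taken from Saleor.

```python
import json
import zipfile
from pathlib import Path

# Hypothetical: top-level media folder holding generated thumbnails.
# Adjust to the instance's actual storage layout.
THUMBNAIL_DIR = "thumbnails"


def build_snapshot(data_dir: Path, media_dir: Path, out_path: Path, metadata: dict) -> None:
    """Pack per-app JSON dumps and media files into a snapshot zip.

    data_dir  -- directory holding one <app_label>.json dump per Saleor app
    media_dir -- local copy of the instance's media storage
    """
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Optional metadata (e.g. Saleor version) consumed by the restore step.
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
        # One JSON file per app keeps each individual dump small.
        for dump in sorted(data_dir.glob("*.json")):
            zf.write(dump, f"data/{dump.name}")
        # Copy media files, skipping everything under the thumbnail folder.
        for path in sorted(media_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(media_dir)
            if rel.parts[0] == THUMBNAIL_DIR:
                continue
            zf.write(path, f"media/{rel.as_posix()}")
```

The per-app dumps themselves could come from Django's stock command, e.g. python manage.py dumpdata account --output account.json, run once per app label.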
cmiacz commented 1 year ago

Please note that Saleor uses an SQL sequence here: https://github.com/saleor/saleor/blob/d90be220d6b687d08153934a51354011a3cb5ca1/saleor/order/models.py#L93

It is not updated by Django's loaddata command, so it will have to be updated manually.
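One way to do the manual update, sketched for PostgreSQL: once all dumps are loaded, move the sequence past the highest order number that was restored. Django's loaddata resets the primary-key sequences of the models it loads, but a custom sequence like the one linked above falls outside that. reset_sequence_sql is a hypothetical helper, and the names used below (order_order_number_seq, order_order, number) are assumptions to verify against the linked models.py and its migrations.

```python
def reset_sequence_sql(sequence: str, table: str, column: str) -> str:
    """Build PostgreSQL SQL that moves `sequence` past MAX(table.column).

    setval(..., false) means the next nextval() returns exactly the value set:
    MAX(column) + 1, or 1 when the table is empty.
    """

    def quote_ident(name: str) -> str:
        # Double-quote identifiers so reserved or mixed-case names stay safe.
        return '"' + name.replace('"', '""') + '"'

    # setval() takes the sequence as a regclass, passed as a string literal.
    seq_literal = "'" + sequence.replace("'", "''") + "'"
    return (
        f"SELECT setval({seq_literal}, "
        f"COALESCE(MAX({quote_ident(column)}), 0) + 1, false) "
        f"FROM {quote_ident(table)};"
    )
```

For example, reset_sequence_sql("order_order_number_seq", "order_order", "number") returns SELECT setval('order_order_number_seq', COALESCE(MAX("number"), 0) + 1, false) FROM "order_order"; which could be executed through django.db.connection.cursor() as the last step of the restore.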

cmiacz commented 1 year ago

As we are about to drop thumbnails, please make sure to skip the thumbnail model when dumping the data.
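Django's dumpdata can already leave a model out with --exclude (-e), which accepts app_label or app_label.ModelName, so something like dumpdata --exclude thumbnail.Thumbnail may be enough. If the custom command post-processes serialized records instead (for example while streaming them), a filter like the sketch below does the same job. The label "thumbnail.thumbnail" is an assumption about Saleor's app and model names, and skip_excluded is a hypothetical helper.

```python
from typing import FrozenSet, Iterable, Iterator

# Assumed label of Saleor's thumbnail model in Django's serialized format
# ("app_label.modelname", lowercase); verify against the codebase.
EXCLUDED_MODELS: FrozenSet[str] = frozenset({"thumbnail.thumbnail"})


def skip_excluded(
    records: Iterable[dict], excluded: FrozenSet[str] = EXCLUDED_MODELS
) -> Iterator[dict]:
    """Yield serialized records whose model label is not in `excluded`."""
    for record in records:
        if record["model"].lower() not in excluded:
            yield record
```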

cmiacz commented 1 year ago

Live example: I tried to restore a 1.4 GB JSON dump with 932318 objects (see objects.txt).

Django's loaddata command was unable to handle it due to memory consumption. I split the dump into smaller parts, one per model, with up to 10k objects in each batch. With this approach loaddata succeeded, but I had to load the dumps in a specific order to pass the DB integrity checks: for example, orders had to be loaded before payments, because the payment model has a foreign key to order. There are also circular dependencies (e.g. product, variants and the default variant), so in that case two model dumps had to be merged. In the final solution it would be good to disable DB integrity checks during partial restores and run them once at the end.
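The manual ordering and merging above can be computed instead of found by trial and error. A sketch, assuming a map from each model label to the labels it references through foreign keys (in Django this could be built from model._meta.get_fields(), looking at fields with many_to_one or one_to_one set and their related_model). It uses Tarjan's strongly connected components algorithm: every group is emitted only after the groups it depends on, so the output is already a dependencies-first load order, and models in a cycle end up in one group whose dumps can be merged. load_order, batches and the model labels in the example are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Set


def load_order(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group models into strongly connected components, dependencies first.

    deps maps a model label to the labels it references via foreign keys.
    Models in a cycle (e.g. product <-> default variant) share one group.
    """
    nodes = sorted(set(deps) | {d for ds in deps.values() for d in ds})
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    groups: List[List[str]] = []

    def visit(node: str) -> None:
        # Recursive DFS; fine for the few hundred models of a Django project.
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in sorted(deps.get(node, ())):
            if dep not in index:
                visit(dep)
                low[node] = min(low[node], low[dep])
            elif dep in on_stack:
                low[node] = min(low[node], index[dep])
        if low[node] == index[node]:
            # node is the root of a component: pop all of its members.
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == node:
                    break
            groups.append(sorted(group))

    for node in nodes:
        if node not in index:
            visit(node)
    # Tarjan emits a component only after everything reachable from it,
    # i.e. after all of its foreign-key targets.
    return groups


def batches(records: Iterable[dict], size: int = 10_000) -> Iterator[List[dict]]:
    """Split one group's records into lists of at most `size` objects."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With payment depending on order, and product and product variant referencing each other, order is placed before payment and the two product models come out as a single group. Self-referencing models (a foreign key to the same model) simply form a group of one.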

The other issue is that Django's loaddata is still slow. In the final solution we may consider implementing custom logic with JSON stream processing and bulk creates instead.

https://pypi.org/project/json-stream/
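To illustrate the stream-processing idea without an extra dependency, here is a sketch that swaps in the standard library's json.JSONDecoder.raw_decode in place of the json-stream package linked above. It reads a top-level JSON array in fixed-size chunks and yields one element at a time, so memory is bounded by the chunk size plus the largest single object rather than the whole 1.4 GB file. iter_json_array and chunked are hypothetical helpers; the parser is lenient about separators and is not a validating JSON parser.

```python
import json
from itertools import islice
from typing import IO, Any, Iterable, Iterator, List

_WS = " \t\r\n"


def iter_json_array(fp: IO[str], chunk_size: int = 1 << 16) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array, reading fp in chunks.

    Separators are handled leniently (repeated commas are skipped).
    """
    decoder = json.JSONDecoder()
    buf, pos, eof, started = "", 0, False, False
    while True:
        skip = _WS + "," if started else _WS
        while pos < len(buf) and buf[pos] in skip:
            pos += 1
        if pos < len(buf):
            if not started:
                if buf[pos] != "[":
                    raise ValueError("expected a top-level JSON array")
                started, pos = True, pos + 1
                continue
            if buf[pos] == "]":
                return
            try:
                value, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                pass  # element not complete in the buffer yet; read more below
            else:
                # Accept the value only once the following ',' or ']' is
                # buffered, so a number split across chunks ("12" of "123")
                # is never yielded early.
                nxt = end
                while nxt < len(buf) and buf[nxt] in _WS:
                    nxt += 1
                if nxt < len(buf) and buf[nxt] in ",]":
                    yield value
                    pos = end
                    continue
        if eof:
            raise ValueError("truncated or malformed JSON array")
        chunk = fp.read(chunk_size)
        buf, pos = buf[pos:] + chunk, 0  # drop text that was already consumed
        eof = not chunk


def chunked(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

Each batch of records could then be turned into model instances and written with Model.objects.bulk_create(objs), which needs far fewer INSERT queries than saving objects one by one. Note that bulk_create does not call save() or send pre_save/post_save signals, so any logic that relies on them would have to be handled separately.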

cmiacz commented 1 year ago

Another thing worth mentioning is that out of the 932318 objects in the live example, ~84% (782263) belonged to the core app (event payloads, deliveries and delivery attempts). @maarcingebala Is it necessary to keep these objects when moving a Saleor DB between instances?
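A breakdown like the one above can be computed directly from a dump, which would also help decide which apps to pass to dumpdata --exclude. A minimal sketch, assuming records in Django's serialized format where "model" is "app_label.modelname"; share_by_app is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def share_by_app(records: Iterable[dict]) -> Dict[str, float]:
    """Return each app label's share of serialized objects, in percent.

    Apps are listed from most to fewest objects.
    """
    counts = Counter(record["model"].split(".", 1)[0] for record in records)
    total = sum(counts.values())
    return {app: 100 * n / total for app, n in counts.most_common()}
```

For the live example, 782263 of 932318 objects works out to about 83.9% for the core app.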