mathesar-foundation / mathesar

Web application providing an intuitive user experience to databases.
https://mathesar.org/
GNU General Public License v3.0
2.32k stars 320 forks source link

Efficient data loader #3448

Closed Anish9901 closed 6 months ago

Anish9901 commented 6 months ago

Fixes #3423

This PR reduces the time required to load the "Movie Collection" dataset for local as well as remote DBs.

Technical details

Problem: We previously relied on a single SQL dump of Movie Collection schema to load the entire dataset into the DB. This was fine when the DB was present along with the django service but was really slow in the case of a remote DB which ultimately resulted in server timeout with a 502 response.

Solution: This PR solves the aforementioned problem by breaking the large SQL dump into parts and extracting all the data to be loaded in multiple .csv.

We first execute movie_collection_tables.sql then we load all the data in the tables using SQL COPY instead of INSERT and then finally we execute movie_collection_fks.sql to setup PKs and FKs.

Performance (GCP): DB Before After Improvement
Local 25s 9s 64%
Remote 186s ~(3.1mins) 12s 93.5%

Checklist

- [x] My pull request has a descriptive title (not a vague title like `Update index.md`). - [x] My pull request targets the `develop` branch of the repository - [x] My commit messages follow [best practices][best_practices]. - [x] My code follows the established code style of the repository. - [ ] I added tests for the changes I made (if applicable). - [ ] I added or updated documentation (if applicable). - [x] I tried running the project locally and verified that there are no visible errors. [best_practices]:https://gist.github.com/robertpainsi/b632364184e70900af4ab688decf6f53 ## Developer Certificate of Origin
Developer Certificate of Origin ``` Developer Certificate of Origin Version 1.1 Copyright (C) 2004, 2006 The Linux Foundation and its contributors. 1 Letterman Drive Suite D4700 San Francisco, CA, 94129 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. ```