[Docs] Data Loading Page

The YugabyteDB data loading experience is very different from the experience you have with single-server relational databases. For instance, application developers who use Postgres, MySQL or SQL Server often load data with many single-row INSERTs, which is considered an anti-pattern in YugabyteDB. They get away with this in Postgres and MySQL because loading is fast there, and because many sample datasets downloaded from the Internet use multiple INSERTs instead of COPY. These developers are puzzled when the same load with multiple INSERTs takes 2x, 3x, or even 10x longer in YugabyteDB.

We need to educate and guide developers on how to approach the initial data loading in YugabyteDB. Today, this is the step where the developer experience is poor.

Data Loading page

Create a separate Data Loading page and place it right after or near the Installation and Getting Started sections. The logic is that once the developer deploys YugabyteDB and succeeds with a getting-started-level app or tutorial, she will move on to sample data that represents a subset of her company's application data, or to a dataset downloaded from the Internet. Therefore, the Data Loading page has to be visible in the navigation tree.

Data Loading page content

The content needs to explain various data loading optimization techniques and why some approaches don't work as expected (for instance, the multiple-INSERTs case).

Some techniques to mention:

Loading with INSERTs
- Replace multiple single-row INSERTs with a single multi-row INSERT, or a few of them
- Execute the INSERT statement(s) in a single transaction - use a BEGIN; INSERT ...; COMMIT; structure
- Disable triggers (but don't forget to re-enable them afterwards) - ALTER TABLE <table_name> DISABLE TRIGGER ALL, then ALTER TABLE <table_name> ENABLE TRIGGER ALL
- More techniques are discussed here: https://docs.google.com/document/d/1jCLiHDEHiYpgVObILDC_2Ormr-Kx36YhkqHXUCVGO1Q/edit#heading=h.28rbkno9vzy9
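The three INSERT techniques above could be illustrated on the page with a small sketch like the one below. It only builds the statements and does not connect to a database. The function name `build_insert_load_plan`, the default batch size of 1000 and the `%s` placeholder style (used by drivers such as psycopg2) are illustrative assumptions, not an existing YugabyteDB API. The ALTER TABLE statements are kept outside the BEGIN/COMMIT block as a conservative choice, so the data transaction contains only INSERTs. In PostgreSQL semantics, DISABLE TRIGGER ALL also disables the internal triggers that enforce foreign keys and requires superuser privileges.

```python
from typing import List, Sequence, Tuple

# A (sql, params) pair; params is empty for statements without placeholders.
Statement = Tuple[str, list]


def build_insert_load_plan(
    table: str,
    columns: Sequence[str],
    rows: Sequence[tuple],
    batch_size: int = 1000,  # hypothetical default, tune for your data
) -> List[Statement]:
    """Return statements that load `rows` with a few multi-row INSERTs
    inside one transaction, with triggers disabled for the duration."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    col_list = ", ".join(columns)
    # One "(%s, %s, ...)" group per row; %s is the psycopg2-style placeholder.
    row_group = "(" + ", ".join(["%s"] * len(columns)) + ")"

    plan: List[Statement] = [
        (f"ALTER TABLE {table} DISABLE TRIGGER ALL", []),
        ("BEGIN", []),
    ]
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        values = ", ".join([row_group] * len(chunk))
        # Flatten the chunk's rows into a single parameter list.
        params = [value for row in chunk for value in row]
        plan.append((f"INSERT INTO {table} ({col_list}) VALUES {values}", params))
    plan += [
        ("COMMIT", []),
        # Don't forget to turn the triggers back on.
        (f"ALTER TABLE {table} ENABLE TRIGGER ALL", []),
    ]
    return plan


if __name__ == "__main__":
    sample = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
    for sql, params in build_insert_load_plan("demo", ["id", "name"], sample, batch_size=2):
        print(sql, params)
```

With psycopg2, each pair could be run with `cursor.execute(sql, params)` on a connection with `autocommit = True`. That way the explicit BEGIN and COMMIT control the transaction instead of the driver's implicit one.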

It's important to explain why each step is necessary in YugabyteDB, how we're different, and why this is OK. Keep in mind that performance is poor even on a single-node YugabyteDB cluster running locally when compared with a local Postgres instance.

Loading with the COPY command

Suggest this as an alternative (preferred?) to loading with INSERTs above. Provide step-by-step instructions. You can use Franck's blog for reference: https://dev.to/yugabyte/copy-progression-in-yugabytedb-4ghb
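A COPY-based load could be sketched as follows. The rows are serialized to CSV in memory and streamed to the server with a single COPY ... FROM STDIN command instead of one statement per row. The helper name `build_copy_payload` is hypothetical, and the sketch assumes the table and column names are trusted, since they are interpolated directly into the SQL text.

```python
import csv
import io
from typing import Sequence, Tuple


def build_copy_payload(
    table: str, columns: Sequence[str], rows: Sequence[tuple]
) -> Tuple[str, io.StringIO]:
    """Return a COPY ... FROM STDIN statement and an in-memory CSV buffer
    holding `rows`, ready to be streamed to the server in one command."""
    buf = io.StringIO()
    # Unix newlines. csv.writer writes None as an empty unquoted field, which
    # COPY in CSV format reads as NULL by default (so does an empty string).
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    buf.seek(0)  # rewind so the driver reads from the beginning
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    return sql, buf


if __name__ == "__main__":
    sql, buf = build_copy_payload("demo", ["id", "name"], [(1, "a"), (2, None)])
    print(sql)
    print(buf.getvalue(), end="")
```

With psycopg2, the pair could be sent with `cursor.copy_expert(sql, buf)`. From the command line, the `\copy` meta-command that ysqlsh inherits from psql reads a client-side CSV file in the same way.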

YugabyteDB on-prem vs. Yugabyte Cloud - fighting network latency

We may be dealing either with a developer who runs YugabyteDB (or Yugabyte Platform) in her own environment or with a developer who has started with Yugabyte Cloud.

For the on-prem scenarios, suggest loading the data from a location close to the YugabyteDB deployment, so that each network round trip between the loader and the database is short.

For Yugabyte Cloud, it's more complicated: the developer needs to create an instance in the same cloud region where the database is running and do the loading from that instance. For now, at least, we can provide some basic steps.
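The latency argument could be made concrete on the page with a back-of-the-envelope estimate. The sketch below assumes one network round trip per statement and ignores server-side work and commit overhead, so it gives only a lower bound on network wait time. The round-trip times used (0.5 ms within a region, 50 ms from a laptop to a remote region) are hypothetical values chosen for illustration, not measurements.

```python
def network_wait_seconds(rows: int, rows_per_statement: int, rtt_ms: float) -> float:
    """Lower bound on time spent waiting for round trips: one round trip per
    statement, ignoring server-side work and transaction commit overhead."""
    if rows_per_statement < 1:
        raise ValueError("rows_per_statement must be positive")
    statements = -(-rows // rows_per_statement)  # ceiling division
    return statements * rtt_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical round-trip times, for illustration only.
    for label, rtt_ms in [("same region", 0.5), ("laptop to remote region", 50.0)]:
        for batch in (1, 1000):
            secs = network_wait_seconds(100_000, batch, rtt_ms)
            print(f"{label:>24}  rows/stmt={batch:<5} ~{secs:,.1f} s")
```

Under these assumptions, 100,000 single-row INSERTs spend about 5,000 s (over 80 minutes) waiting on a 50 ms link, compared with 50 s at 0.5 ms. Sending 1,000 rows per statement cuts the 50 ms case to about 5 s. This is why both batching and loading from a nearby machine matter.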

yugabyte / yugabyte-db