yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[Docs] Data Loading Page #11768

Open dmagda opened 2 years ago

dmagda commented 2 years ago

The YugabyteDB data loading experience is very different from what developers are used to with single-server relational databases. For instance, application developers who use Postgres, MySQL, or SQL Server often load data with many individual INSERT statements, which is considered an anti-pattern in YugabyteDB. Developers do this with Postgres and MySQL simply because loading is fast there, and because many sample datasets downloaded from the Internet ship as a series of INSERTs instead of COPY. Such developers will be puzzled to see that loading with multiple INSERTs takes 2x, 3x, or even 10x longer with YugabyteDB.

We need to educate and guide developers on how to approach the initial data loading in YugabyteDB. This is the step where the developer experience is poor.

Data Loading page

Create a separate Data Loading page. Place it right after or near the Installation and Getting Started sections. The logic here is that once a developer deploys YugabyteDB and succeeds with a getting-started-level app/tutorial, she will move on to loading some sample data: either a subset of her company's application data or a data set downloaded from the Internet. Thus, the Data Loading page has to be visible in the navigation tree.

Data Loading page content

The content needs to explain various data loading optimization techniques and explain why some approaches don't work as expected (for instance the multiple INSERTs case).

Some techniques to mention:

  1. Loading with INSERTs

It's important to explain why each step is necessary in YugabyteDB, how we're different, and why this is OK. Remember that performance is poor even with a single-node YugabyteDB cluster running locally when compared to a local Postgres instance.

  2. Loading with the COPY command

Suggest this as an alternative (preferred?) option to #1 above. Provide steps and instructions. You can use Franck's blog for reference: https://dev.to/yugabyte/copy-progression-in-yugabytedb-4ghb
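To make the two techniques above concrete for the docs page, here is a minimal Python sketch. It is an illustration under stated assumptions, not a recommended implementation: the helper names (`build_multirow_insert`, `chunked`, `rows_to_csv_buffer`, `copy_rows`) are hypothetical, the `%s` placeholders assume a psycopg2-style driver, and `copy_rows` assumes a cursor that exposes psycopg2's `copy_expert(sql, file)` method. The first helper packs many rows into one multi-row INSERT so that each round trip carries a batch instead of a single row; the second streams the same rows through COPY.

```python
import csv
import io


def build_multirow_insert(table, columns, rows):
    """Return (sql, params) for ONE INSERT statement carrying every row in `rows`.

    `table` and `columns` are interpolated as-is, so they must be trusted
    identifiers; the row values are passed separately as parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_placeholder] * len(rows))
    )
    # Flatten the rows into one parameter list, in the same order as the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


def chunked(rows, size):
    """Yield consecutive slices of `rows`, each with at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def rows_to_csv_buffer(rows):
    """Serialize rows to an in-memory CSV file, rewound and ready to be read."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    buffer.seek(0)
    return buffer


def copy_rows(cursor, table, columns, rows):
    """Load rows with COPY ... FROM STDIN instead of INSERT statements.

    Assumes `cursor` behaves like a psycopg2 cursor (has `copy_expert`).
    """
    sql = "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(table, ", ".join(columns))
    cursor.copy_expert(sql, rows_to_csv_buffer(rows))
```

With a real connection, the INSERT path would be roughly `for batch in chunked(rows, 1000): cursor.execute(*build_multirow_insert("users", ["id", "name"], batch))`, where the batch size of 1000 is an arbitrary starting point to tune, not a measured recommendation.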

YugabyteDB on-prem vs. Yugabyte Cloud - fighting networking latency

We may be dealing with a developer who uses YugabyteDB (or Platform) in her own environment, or with a developer who has started with Yugabyte Cloud.

For the on-prem scenarios, suggest loading the data from a location close to the YugabyteDB deployment.

For Yugabyte Cloud it's more complicated, because the developer needs to create an instance in the same cloud region where the database is running and do the loading from that instance. For now, at least, we can provide some basic steps.
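A back-of-the-envelope calculation may help the page explain why the loader's location matters. The sketch below is a simplified model, not a benchmark: it assumes statements are sent one after another on a single connection, counts only network round-trip time, and ignores server-side processing entirely. The function name and the latency figures are illustrative assumptions, not measurements.

```python
import math


def estimated_network_seconds(total_rows, rows_per_round_trip, round_trip_ms):
    """Estimate time spent purely waiting on the network during a load.

    Simplified model: one statement per round trip, executed sequentially.
    """
    round_trips = math.ceil(total_rows / rows_per_round_trip)
    return round_trips * round_trip_ms / 1000.0


# Illustrative numbers only: 1,000,000 rows over an assumed 20 ms round trip.
single_row = estimated_network_seconds(1_000_000, 1, 20)     # one INSERT per row
batched = estimated_network_seconds(1_000_000, 1000, 20)     # 1000 rows per statement
print(single_row, batched)  # 20000.0 20.0
```

Under this model, both levers named in this issue show up directly: loading from an instance close to the database lowers `round_trip_ms`, and batching or COPY raises `rows_per_round_trip`.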

rthallamko3 commented 10 months ago

Adding @ymahajan, as we would like to steer folks towards using Voyager.