Master issue to track improvements to make it easier and faster to get large amounts of data into YugabyteDB.
## Phase 1

| Status | Feature | GitHub Issue | Comments |
| --- | --- | --- | --- |
| ✅ | Faster non-transactional writes during bulk load | #7809 | Allow faster writes during the COPY command via the session variable `yb_force_non_transactional_writes`. |
| ✅ | Disable transactional writes during bulk data loading for indexes | #11266 | Add the `yb_disable_transactional_writes` session variable to improve bulk-load latency for index tables, e.g. when the COPY command goes through the insert write path (not delete or update). |
| ✅ | Implement async flush for the COPY command | #11628 | Currently, we synchronously wait for a flush response every time we flush. Making this asynchronous reduces the time spent waiting and improves COPY performance. |
| ✅ | Speed up YSQL inserts by skipping lookup of keys being inserted | #11269 | During bulk load (for example, inserts via the COPY command), skip the lookup of the key being inserted to speed up inserts. This is similar to the upsert mode supported for YCQL. |
| ✅ | Optimize memory allocation/deallocation in bulk insert/COPY using protobuf arenas | #11720 | When running a bulk insert / COPY command, about 15 percent of CPU time in the PostgreSQL backend is spent on memory allocation/deallocation. |
| ✅ | Performance improvement by eliminating serialization to the WAL format | #11409 | When writing data to the RocksDB layer, the additional step of serializing to the WAL format is unnecessary and leads to wasted work. |
| ✅ | Tuning parameters for faster COPY performance | #12293 | Tune parameters for faster COPY performance. |
| ✅ | Pack columns in the DocDB storage format for better performance | #3520 | Packing columns into a single RocksDB entry per row, instead of one entry per column as we do currently, improves YSQL performance. |
| ⬜️ | Parallelize the COPY command | #11453 | Distribute the COPY operation internally across multiple workers. |
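The completed items above are mostly surfaced to users as session variables and COPY options. A minimal sketch of how a bulk-load session might combine them (variable names are the ones referenced in the table; exact names, defaults, and availability may differ by YugabyteDB version, and `/path/to/data.csv` is a placeholder):

```sql
-- Skip the distributed-transaction write path for this session (#7809 / #11266).
SET yb_disable_transactional_writes = true;

-- Load in batches so a failure does not roll back the entire file;
-- ROWS_PER_TRANSACTION is a YSQL extension to COPY.
COPY my_table FROM '/path/to/data.csv'
  WITH (FORMAT csv, ROWS_PER_TRANSACTION 10000);

-- Restore normal transactional semantics afterwards.
SET yb_disable_transactional_writes = false;
```

Note that disabling transactional writes trades atomicity for speed, so it is only appropriate for initial loads into tables that are not concurrently serving traffic.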
## Phase 2

| Status | Feature | GitHub Issue | Comments |
| --- | --- | --- | --- |
| ⬜️ | Streaming ingest to YugabyteDB without using JDBC | | Around 1 billion records are inserted through the streaming interface every day; transferring this volume over the JDBC interface would be inefficient. One option is to implement the Spark RDD write interface. |
Jira Link: DB-4641