Closed philiure closed 2 years ago
How are you pushing the dataset? Memory is not necessarily used by Sonic, your ingest program can also use memory. I didn't have any problems when pushing 25M documents (~2 GB) using node-sonic-channel.
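For anyone hitting the same wall, here is a minimal ingestion sketch. It assumes Sonic's documented text protocol over TCP (default ingest port 1491); the host, password, and collection/bucket names are placeholders, and real clients such as node-sonic-channel also wait for the server's acknowledgement between commands, which this sketch omits:

```python
# Minimal sketch of pushing documents over Sonic's ingest channel.
# Assumes Sonic's text protocol (START/PUSH over TCP); host, password,
# and collection/bucket names below are placeholders.
import socket

def push_command(collection: str, bucket: str, object_id: str, text: str) -> str:
    """Build a PUSH line; quotes/backslashes in the text must be escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", " ")
    return f'PUSH {collection} {bucket} {object_id} "{escaped}"'

def ingest(docs, host="localhost", port=1491, password="SecretPassword"):
    """docs: iterable of (object_id, text) pairs, pushed on one connection."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(f"START ingest {password}\r\n".encode())
        for object_id, text in docs:
            line = push_command("messages", "default", object_id, text)
            conn.sendall((line + "\r\n").encode())
        conn.sendall(b"QUIT\r\n")
```

Reusing a single connection for the whole batch avoids a per-document handshake, which matters at tens of millions of documents.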
Found the problem! It was the server's log output in the terminal that crashed it. Running the server with sonic > /dev/null fixed it and improved my push speed by 50%!
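A small demonstration of why the redirection helps: none of the "log" lines below ever reach the terminal, so scrollback can't grow. For Sonic itself the equivalent invocation would be something like ./sonic -c config.cfg > /dev/null 2>&1 (binary and config paths are placeholders):

```shell
# Stand-in for a chatty server process writing log lines to stdout.
noisy() { seq 1 3 | sed 's/^/log line /'; }

# With redirection, nothing is captured or displayed.
silenced=$(noisy > /dev/null 2>&1)
echo "captured ${#silenced} characters"   # prints: captured 0 characters
```

Redirecting stderr as well (2>&1) matters if the server writes its logs there rather than to stdout.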
The push seems to have slowed down significantly: at the start it was 2.5 ms/doc, and now it is around 50 ms/doc. How much time did it take you to push 25M documents, @Sly-Little-Fox?
~4 hours with the default config (with loglevel "error" though; "debug" can actually slow things down). I used tmpfs for storage and one core (Ampere A1, Oracle Cloud). I also observed that using a lot of threads for ingesting sometimes makes Sonic slow down significantly, even though it doesn't even use half of my threads (8 threads, 4 cores). I don't know why it gets stuck (it's not I/O, since storage is tmpfs).
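As a quick back-of-the-envelope check on the numbers in this thread, 4 hours for 25M documents works out to well under 1 ms/doc, while a sustained 50 ms/doc would take roughly two weeks:

```python
# Throughput sanity check for the figures quoted above.
DOCS = 25_000_000

# ~4 hours for 25M documents:
ms_per_doc = 4 * 3600 * 1000 / DOCS
print(f"{ms_per_doc:.3f} ms/doc")   # ≈ 0.576 ms/doc

# At a sustained 50 ms/doc, the same push would take:
hours = DOCS * 50 / 1000 / 3600
print(f"{hours:.0f} hours")          # ≈ 347 hours (~14.5 days)
```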
Note that write operations are not lock-free, so if something becomes a bottleneck (e.g. SSD I/O, or the CPU core running the RocksDB threads; try increasing parallelism?), then things will start slowing down, as Sonic Channel and the other threads depend on these main DB threads during write operations.
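For reference, the RocksDB parallelism mentioned above is set in Sonic's config.cfg. A sketch of the relevant section (key names follow Sonic's sample config; the values here are illustrative, not recommendations):

```toml
[store.kv.database]
flush_after = 900
compress = true
# Number of RocksDB background threads; raising this can help when
# compactions/flushes bottleneck a single core during heavy ingestion.
parallelism = 2
max_compactions = 1
max_flushes = 1
```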
Alright, I'll see if I can improve it. Thanks a lot!
I am pushing a dataset of 12M documents to Sonic, but the terminal crashes due to memory issues at 2% of the push. I am running a Rust server in release mode and wonder why the terminal keeps crashing. Any advice on solving these memory issues is welcome. The size of the dictionary object containing the documents is 0.67 GB.
Activity Monitor indicates 70+ GB of memory use for the terminal upon crashing.