oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
251 stars 39 forks source link

slow test suite and schema operations on Mac #53

Open davepacheco opened 3 years ago

davepacheco commented 3 years ago

On my 2020 Macbook Pro running Catalina (10.15.7), using the official CockroachDB v20.2.5 build, CockroachDB schema updates are incredibly slow. I've started omicron_dev db-run in the background (which itself took tens of seconds to populate the database with an empty schema). Then, running either db-wipe or db-populate takes many seconds each (except wiping an empty database):

$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m7.205s
user    0m0.199s
sys 0m0.067s
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.361s
user    0m0.203s
sys 0m0.067s
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.300s
user    0m0.201s
sys 0m0.066s
$ time cargo run --bin=omicron_dev db-populate --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real    0m19.214s
user    0m0.201s
sys 0m0.068s
$ time cargo run --bin=omicron_dev db-wipe --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.23s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m6.726s
user    0m0.207s
sys 0m0.085s
$ time cargo run --bin=omicron_dev db-populate --database-url=postgresql://root@127.0.0.1:64502?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:64502?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real    0m18.989s
user    0m0.196s
sys 0m0.068s
$

cargo test does a bunch of similar operations, so it winds up taking 2m40s altogether.

By contrast, on illumos (omnios-r151034-0d278a0cc5) running in AWS (t2.2xlarge, 8 cores, 32 GiB DRAM, one 200 GiB EBS volume), this is almost 20x faster:

$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.305s
user    0m0.118s
sys     0m0.080s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.198s
user    0m0.116s
sys     0m0.077s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real    0m0.585s
user    0m0.117s
sys     0m0.078s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.301s
user    0m0.117s
sys     0m0.079s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:54148?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:54148?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real    0m0.592s
user    0m0.119s
sys     0m0.080s

and the test suite takes only 8s.

Similarly, on illumos (helios-1.0.20336) running under VMware Fusion 11.5.7 on the same Mac, these operations are fast:

$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.739s
user    0m0.208s
sys 0m0.384s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real    0m1.011s
user    0m0.200s
sys 0m0.374s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.663s
user    0m0.193s
sys 0m0.368s
$ time cargo run --bin=omicron_dev -- db-populate --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/omicron_dev db-populate '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: populating database
omicron_dev: populated database

real    0m1.020s
user    0m0.194s
sys 0m0.368s
$ time cargo run --bin=omicron_dev -- db-wipe --database-url=postgresql://root@127.0.0.1:51212?sslmode=disable
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/omicron_dev db-wipe '--database-url=postgresql://root@127.0.0.1:51212?sslmode=disable'`
omicron_dev: wiping any existing database
omicron_dev: wiped

real    0m0.672s
user    0m0.197s
sys 0m0.367s

and cargo test takes about 17s on this system.

Summarizing:

Hardware Virtualization OS, FS CockroachDB Time to populate Test suite time
2020 Macbook (8 HW threads, 32 GiB DRAM, local NVME SSD) None MacOS Catalina 10.15.7, APFS v20.2.5 (official build) 18s 2m40s
2020 Macbook (same as above) VMWare Fusion 11.5.7 (VM has 4 HW threads, 8 GiB memory) illumos (helios-1.0.20336) v20.2.5 (illumos OSS build (unofficial)) 1s 17s
AWS EC2 t2.2xlarge (8 HW threads, 32 GiB DRAM, 200 GiB EBS (network) volume) AWS EC2 illumos (omnios-r151034-0d278a0cc5) v20.2.5 (illumos OSS build (unofficial)) <1s 8s

This is all on commit a0ce0e46e2cd117698d05a40e418de0890460fb4.

This looks like a CockroachDB issue at this point. I can reproduce the slowness using this statement from the schema file:

CREATE UNIQUE INDEX ON omicron.public.Project (
    name
) WHERE
    time_deleted IS NOT NULL;

This takes several hundred milliseconds on an empty table.

Here are some avenues to explore:

All that said: I'm not planning to dig into this right now. I just wanted to record this in case we decide to dig deeper in the future.

david-crespo commented 3 years ago

Was just about to make an issue for this. Glad all this detail already exists. I'm on an M1 MBP seeing the same thing. It's not unusable, but when I run a test locally that needs Cockroach, it takes about 20 seconds. When I run two such tests, it takes 40 seconds, so it's not shared overhead.

david-crespo commented 1 year ago

Haven't tested it too scientifically, but I think I'm seeing a huge improvement with the new M1 Mac build in 22.2.

cargo run --bin=omicron-dev db-run  
~/oxide/omicron $ time cargo run --bin=omicron-dev db-wipe --database-url="postgresql://root@127.0.0.1:32221?sslmode=disable"    
warning: skipping duplicate package `header-check` found at `/Users/david/.cargo/git/checkouts/propolis-12517f89d3d9f483/b596e72/crates/viona-api/header-check`
    Finished dev [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/omicron-dev db-wipe '--database-url=postgresql://root@127.0.0.1:32221?sslmode=disable'`
omicron-dev: wiping any existing database
omicron-dev: wiped
cargo run --bin=omicron-dev db-wipe   0.25s user 0.12s system 41% cpu 0.918 total

~/oxide/omicron $ time cargo run --bin=omicron-dev db-populate --database-url="postgresql://root@127.0.0.1:32221?sslmode=disable"
warning: skipping duplicate package `header-check` found at `/Users/david/.cargo/git/checkouts/propolis-12517f89d3d9f483/b596e72/crates/viona-api/header-check`
    Finished dev [unoptimized + debuginfo] target(s) in 0.32s
     Running `target/debug/omicron-dev db-populate '--database-url=postgresql://root@127.0.0.1:32221?sslmode=disable'`
omicron-dev: populating database
omicron-dev: populated database
cargo run --bin=omicron-dev db-populate   0.25s user 0.13s system 20% cpu 1.860 total