Closed: Radivarig closed this issue 4 years ago.
Same for LargeZipfUTest.
Both tests log this before as well: [INFO] Bulk store to database with max UUID=1
OK, so first: these two tests do run significantly longer than any others (maybe I should scale them back). On my system, the relative timings are:
Start 49: BasicSaveUTest
49/135 Test #49: BasicSaveUTest ................... Passed 3.04 sec
Start 50: ValueSaveUTest
50/135 Test #50: ValueSaveUTest ................... Passed 4.59 sec
Start 51: PersistUTest
51/135 Test #51: PersistUTest ..................... Passed 1.48 sec
Start 52: DeleteUTest
52/135 Test #52: DeleteUTest ...................... Passed 3.40 sec
Start 53: FetchUTest
53/135 Test #53: FetchUTest ....................... Passed 2.49 sec
Start 54: MultiPersistUTest
54/135 Test #54: MultiPersistUTest ................ Passed 1.69 sec
Start 55: MultiUserUTest
55/135 Test #55: MultiUserUTest ................... Passed 4.79 sec
Start 56: LargeFlatUTest
56/135 Test #56: LargeFlatUTest ................... Passed 225.69 sec
Start 57: LargeZipfUTest
57/135 Test #57: LargeZipfUTest ................... Passed 285.84 sec
Note that they take 50x to 100x longer than the other tests.
While these are running, postgres devotes 1 to 6 threads, each running at 30% to 70% CPU; the unit test itself varies from 1% to 60% CPU, but is usually low. So this is very bottlenecked on postgres. (I added these tests not because I think postgres is in danger; rather, I was working on a distributed-hash-table version, which is flakey as hell and can't deal with this kind of data. It occurred to me to be safety-minded and backport the tests to the atomspace.)
Getting back to postgres: its actual performance is highly susceptible to tuning. There are a number of parameters one can play with, and opencog/persist/sql/README.md reviews the key ones. Also, SSDs can often make a huge difference (I don't know if/how they'd make a difference for these tests).
I did not look to see if my log filled with these messages, but they are not a surprise: the unit test must wait for postgres to finish. So perhaps this is "not a bug" .... maybe you just have to be patient ...
Also, you can manually reduce the size of these tests: edit them, grep for NCOPIES, and make it smaller. (It's NWORDS and NPAIRS in the other.) I don't want to change the defaults, at least not yet, unless everyone complains. CircleCI seems to deal with it. And my box is 5+ years old, so these tests are faster for most people ....
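For example, such an edit is a one-liner with sed. The file name and the value 100 below are hypothetical stand-ins, not the actual test source; the real constants live in the unit-test files themselves:

```shell
# Stand-in file; the real constant lives in the LargeFlatUTest source.
f=/tmp/ncopies_demo.h
echo '#define NCOPIES 100' > "$f"   # hypothetical current value

# Halve the workload by shrinking the constant:
sed -i 's/NCOPIES [0-9]*/NCOPIES 50/' "$f"
cat "$f"   # -> #define NCOPIES 50
```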
I should not have assumed it was indefinite, although my patience was 300 seconds.
These are the execution times:
...
Start 48: AtomSpaceUtilsUTest
48/127 Test #48: AtomSpaceUtilsUTest .............. Passed 0.03 sec
Start 49: BasicSaveUTest
49/127 Test #49: BasicSaveUTest ................... Passed 1.89 sec
Start 50: ValueSaveUTest
50/127 Test #50: ValueSaveUTest ................... Passed 2.42 sec
Start 51: PersistUTest
51/127 Test #51: PersistUTest ..................... Passed 0.78 sec
Start 52: DeleteUTest
52/127 Test #52: DeleteUTest ...................... Passed 1.81 sec
Start 53: FetchUTest
53/127 Test #53: FetchUTest ....................... Passed 1.04 sec
Start 54: MultiPersistUTest
54/127 Test #54: MultiPersistUTest ................ Passed 1.22 sec
Start 55: MultiUserUTest
55/127 Test #55: MultiUserUTest ................... Passed 3.33 sec
Start 56: LargeFlatUTest
56/127 Test #56: LargeFlatUTest ................... Passed 1127.55 sec
Start 57: LargeZipfUTest
57/127 Test #57: LargeZipfUTest ................... Passed 1385.08 sec
Start 58: BasicSCMUTest
58/127 Test #58: BasicSCMUTest .................... Passed 2.65 sec
...
I'm running on these specs:
λ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz
Stepping: 9
CPU MHz: 898.058
CPU max MHz: 4100.0000
CPU min MHz: 800.0000
BogoMIPS: 6192.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Parallel checking is disabled, as mentioned in issue #34.
Well, that is on the long side. Again, postgres is not usually CPU-bound; having a fast CPU does little -- it's I/O-bound, writing to disk (and hitting other I/O bottlenecks: cache-to-RAM, user-kernel, RAM-to-DMA (PCIe), PCIe-to-SATA). The CircleCI unit tests run in 164 seconds; they have a CPU comparable to, or slower than, yours.
Cut-n-paste from the tuning README:
The Postgres default configuration can be (and should be) tweaked for performance. The performance will be disastrously slow if the database is not tuned. The primary performance bottleneck is the default of synchronous commits during writing. On spinning disk drives, this can lead to delays of tens of milliseconds to write handfuls of atoms, as that is the seek time (latency) for spinning disk media. Thus, synchronous commits should be disabled.
Solid state disks are a lot faster; it's not clear if this would still be a bottleneck.
Disabling synchronous commits may cause the latest database updates to be lost, if power goes out, or the system unexpectedly reboots. This kind of loss is usually not a problem for most opencog apps, ... all of the current apps are gathering statistics from generated data, and so this kind of loss is almost surely inconsequential.
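For reference, here's a minimal sketch of the edit itself, done on a scratch copy of the file; the path and the commented-out default shown below are assumptions, and your postgresql.conf may differ:

```shell
# Practice on a scratch copy; the real file is wherever your distro
# keeps postgresql.conf (e.g. /etc/postgresql/9.5/main/).
conf=/tmp/postgresql.conf.demo
printf '#synchronous_commit = on\n' > "$conf"

# Uncomment the line if needed, and turn synchronous commits off:
sed -i 's/^#\{0,1\}synchronous_commit = .*/synchronous_commit = off/' "$conf"
grep '^synchronous_commit' "$conf"   # -> synchronous_commit = off
```

Remember that the real server only sees the change after a config reload or restart.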
Edit postgresql.conf (a typical location is /etc/postgresql/9.5/main/postgresql.conf) and make the changes below. The first two changes are recommended by http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
shared_buffers = default was 32MB, change to 25% of installed RAM
work_mem = default was 1MB change to 32MB
effective_cache_size = default was 128MB, change to 50%-75% of installed RAM
synchronous_commit = default on change to off
wal_buffers = default 64kB change to 2MB or even 32MB
checkpoint_segments = 64 (each one takes up 16MB disk space)
max_connections = 130 (each opencog instance needs 32)
max_worker_processes = 32 (one per CPU core)
ssl = off
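Several of these settings are given as a fraction of installed RAM; on Linux, a short shell sketch can turn that into concrete numbers for your machine (it reads MemTotal from /proc/meminfo, so it won't work on non-Linux systems):

```shell
# Total RAM in kilobytes, as reported by the kernel:
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

# 25% of RAM for shared_buffers; 75% (the top of the suggested
# 50%-75% range) for effective_cache_size.
echo "shared_buffers = $((total_kb / 4 / 1024))MB"
echo "effective_cache_size = $((total_kb * 3 / 4 / 1024))MB"
```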
For SSD drives, the following should help:
seq_page_cost = 0.1
random_page_cost = 0.15
effective_io_concurrency = 20
For spinning media, synchronous_commit=off is 120x faster than synchronous_commit=local (about 400 atoms/sec vs. 3 atoms/sec).
For write-mostly databases, such as in the language-learning project, you will get better results with checkpoint_segments = 100.
If you have postgres 9.5 or newer, there is no checkpoint_segments setting. Instead, do this:
checkpoint_timeout = 1h
max_wal_size = 8GB
checkpoint_completion_target = 0.9
This will avoid the "checkpoints are occurring too frequently" warning message.
Restarting the server might lead to errors stating that max shared mem usage has been exceeded. This can be fixed by telling the kernel to use 6.4 gigabytes (for example):
vi /etc/sysctl.conf
kernel.shmmax = 6440100100
save file contents, then:
sysctl -p /etc/sysctl.conf
All new, all improved README file:
The Postgres default configuration can be (and should be) tweaked for performance. The performance will be disastrously slow if the database is not tuned. The primary performance bottleneck is the default of synchronous commits during writing. On spinning disk drives, this can lead to delays of tens of milliseconds to write handfuls of atoms, as that is the seek time (latency) for spinning disk media. Thus, synchronous commits should be disabled.
The difference between synchronous_commit=off and =on can be as much as a factor of 100x for spinning disks, and a factor of 5x for SSD drives, based on measurements on actual AtomSpace workloads.
Edit postgresql.conf (a typical location is /etc/postgresql/9.6/main/postgresql.conf) and make the changes below. The first two changes are recommended by the PostgreSQL wiki.
shared_buffers = 24GB # Change to 25% of installed RAM
work_mem = 32MB # Default was 1MB, change to 32MB
effective_cache_size = 60GB # Change to 50%-75% of installed RAM
synchronous_commit = off # Default was on, change to off
max_connections = 130 # Each AtomSpace instance needs 32
max_worker_processes = 32 # One per CPU core
ssl = off # There's no point to encryption locally
Avoid the "checkpoints are occurring too frequently" warning message by setting:
checkpoint_timeout = 1h
max_wal_size = 8GB
checkpoint_completion_target = 0.9
For SSD drives, the following can make a significant difference. There are some benchmark charts at a blog. The changes to seq_page_cost and random_page_cost represent the relative ratios of SSD speed to CPU/RAM speed (where 1.0 is for a spinning disk).
seq_page_cost = 0.1
random_page_cost = 0.1
effective_io_concurrency = 100
Restarting the server might lead to errors stating that max shared mem
usage has been exceeded. This can be fixed by telling the kernel to use
16.4 gigabytes (for example). Edit: sudo vi /etc/sysctl.conf
and add:
kernel.shmmax = 16440100100
save file contents, then:
sudo sysctl -p /etc/sysctl.conf
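To confirm the kernel actually picked up the new limit, read it back (this path is standard on Linux):

```shell
# Current maximum shared-memory segment size, in bytes; after the
# change above it should match the value written to sysctl.conf.
cat /proc/sys/kernel/shmmax
```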
Using 2MB-sized HugePages also helps. The procedure here is a bit complicated. Add a hugepages user-group, and add postgres to it:
sudo groupadd hugepages
sudo gpasswd -a postgres hugepages
Then you need to find out what the group id (gid) was:
id postgres
Suppose that this shows group id 1234 for hugepages. This needs to be added to /etc/sysctl.conf as well. So edit, and add:
vm.nr_hugepages = 16384 # 32GB of hugepages, 25% of RAM.
vm.hugetlb_shm_group=1234
Don't forget to sudo sysctl -p /etc/sysctl.conf
again.
Finally, grant the ability to use those pages. Add to /etc/security/limits.conf:
@hugepages soft memlock unlimited
@hugepages hard memlock unlimited
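Once the group, sysctl, and limits changes are in place and postgres has been restarted, two read-only commands can verify the setup; the gid lookup assumes the hugepages group created above exists:

```shell
# Huge-page counts as the kernel sees them (Total/Free/Reserved):
grep -i '^HugePages' /proc/meminfo

# The gid to put in vm.hugetlb_shm_group; empty if the group is missing:
getent group hugepages | cut -d: -f3
```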
After following my own tuning suggestions, I shaved off about 10% to 20% in execution time:
Start 56: LargeFlatUTest
56/135 Test #56: LargeFlatUTest ................... Passed 207.30 sec
Start 57: LargeZipfUTest
57/135 Test #57: LargeZipfUTest ................... Passed 237.48 sec
Great! Just by adding the above parameters to $PGDATA/postgresql.conf, I'm getting these times:
Start 56: LargeFlatUTest
56/127 Test #56: LargeFlatUTest ................... Passed 120.99 sec
Start 57: LargeZipfUTest
57/127 Test #57: LargeZipfUTest ................... Passed 142.62 sec
Updated to:
1adc208abb925b7bc168be4af47bf73e6f096d4d
887eac16689bf3b94195d64abe7823115dab1431
This prints out indefinitely for LargeFlatUTest:
[DEBUG] async_buffer overfull set; had to sleep 2649 millisecs to drain!