
LiTL: Library for Transparent Lock Interposition

benchmark on memcached #4


posutsai commented 4 years ago

I have tried to reproduce the performance measurements on memcached. However, I can't find clear details about how you benchmarked it. Could you provide more information about your setup?

I have already tried memtier_benchmark, since it is the officially recommended tool. Here is my benchmarking command:

`memtier_benchmark -x 5 -c 50 -t 100 -d 32 --ratio=1:0 --pipeline=1 --key-pattern S:S -P memcache_binary --hide-histogram`

Yet the results after applying linker interposition to memcached are quite unstable. I would like to get some comments from the author.

Apart from benchmarking, I ran into several questions while reading the paper. Please correct me if I misunderstand the content. According to Table 4-11 on page 60, in the memcached-new column, the hmcs lock performs worst. Since pthread shows a performance gain of 103 relative to hmcs, locks whose gain is greater than 103, such as mcs_stp with a gain of 582, should presumably perform better than the original pthread_mutex version. Nevertheless, the measured performance is not as expected, and mcs_stp even performs worse than pthread_mutex.

Furthermore, I would like to know whether the pthread_mutex testing uses the original version through LiTL's linker interposition, or replaces the symbols via libpthreadinterpose_original.sh.

Last but not least, thank you for providing such a great tool that makes trying various locks so easy. It has helped me a lot. Please let me know if you need any further details to understand my issue; I will provide them ASAP.

HugoGuiroux commented 4 years ago

Hi @posutsai.

Sections 4.1.3 and 4.1.4 contain more information about Memcached benchmarking:

> For the Memcached-* experiments where some nodes are dedicated to network injection, memory is interleaved only on the nodes dedicated to the server.

> For Memcached, similarly to other setups used in the literature [72, 42], the workload runs on a single machine: we dedicate one socket of the machine where we run memaslap to inject network traffic to the Memcached instance, the two running on two distinct sets of cores.
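To make this concrete, here is a rough sketch of such a setup on a single machine. The node numbers, thread counts, the lock name, and the use of memtier_benchmark instead of memaslap are all illustrative assumptions, not the exact configuration from the thesis:

```sh
# Server side: memcached runs under a LiTL lock via the generated
# lib<lock>.sh wrapper (lock name assumed here), with memory interleaved
# only across the nodes dedicated to the server (nodes 0-2).
numactl --cpunodebind=0,1,2 --interleave=0,1,2 \
    ./libmcs_stp.sh memcached -p 11211 -t 24 &

# Injection side: the benchmark client runs on a dedicated socket (node 3)
# of the same machine, on a distinct set of cores.
numactl --cpunodebind=3 --membind=3 \
    memtier_benchmark -s 127.0.0.1 -p 11211 -P memcache_binary \
    -x 5 -c 50 -t 24 -d 32 --ratio=1:0 --pipeline=1 --hide-histogram
```

Without this kind of pinning, the server and injection threads migrate across sockets, which could explain unstable results.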

Note that the memcached version does play a role in the results, cf. the footnote on page 46:

> Memcached 1.4.15 uses a global lock to synchronize all accesses to a shared hash table. This lock is known to be the main bottleneck. Newer versions use per-bucket locks, thus suffer less from contention.

Regarding your question about pthread_mutex: we use libpthreadinterpose_original.sh for pthread. Close to the end of Section 4.1.4:

> Besides, in order to make fair comparisons among applications, the results presented for the Pthread locks are obtained using the same library interposition mechanism (see Chapter 3) as with the other locks.
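So the numbers labeled Pthread are not from a bare, uninterposed run. A minimal sketch of the two configurations, assuming LiTL's lib<lock>.sh wrapper naming (libpthreadinterpose_original.sh is the actual script mentioned above; the mcs_stp script name is an assumption):

```sh
# Pthread baseline: pthread_mutex_* symbols are interposed by a
# pass-through library rather than running the unmodified binary.
./libpthreadinterpose_original.sh memcached -p 11211

# An optimized lock (e.g. mcs_stp) goes through the same LD_PRELOAD
# mechanism; only the lock algorithm behind the interposed symbols changes.
./libmcs_stp.sh memcached -p 11211
```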

posutsai commented 4 years ago

Thank you for replying. I guess the main reason I can't reproduce the experiment is the hardware. Anyway, I have figured out which kind of lock suits my application best.

By the way, have you ever considered doing this kind of lock replacement through static source code analysis instead of linker interposition?