tidwall / tile38

Real-time Geospatial and Geofencing
https://tile38.com
MIT License

Are there any benchmarks? #54

Closed literadix closed 5 years ago

literadix commented 8 years ago

Hello,

This server is a great piece of work. I like it very much. Do you have any benchmarks or metrics for what we can expect? How many points, fences, and notifications can one instance handle?

Thank you for this great piece of software!

Maciej Bednarz, Hannover, Germany

tidwall commented 8 years ago

Thanks for the kind words.

Fence benchmarks are something I've been working on, but they are currently past due.

The performance currently depends greatly on whether you are using SETHOOK or just a standard FENCE.

The notification and networking framework for Tile38 has been undergoing some big changes, including persistent HTTP/2 and GRPC connections.

I'll fast-track this request and try to get something posted soon.

Thanks!

tidwall commented 8 years ago

This is also mentioned in an older issue #27.

literadix commented 8 years ago

Josh!

Thank you very much for your quick help. How can I help you? Is there anything I could do?

Thank you very much,

Maciej

tidwall commented 8 years ago

Hi Maciej,

I could use some help identifying the priority benchmarks or metrics that should be measured, and perhaps which tools might be best for the job.

Things I want to benchmark:

What do you think would be most valuable?

johnsonc commented 7 years ago

The performance currently depends greatly on if you are using SETHOOK or just a standard FENCE.

So which do you think is better? By the way, we're seriously thinking of using Tile38, but we can't find any performance benchmarks on the internet, which makes it a bit tricky to get it approved. Please do let us know if/how we can help. Thanks.

tidwall commented 7 years ago

@johnsonc

So which do you think is better?

I think SETHOOK is the way to go in most cases, because it allows for a dedicated queue/notification server that Tile38 connects to directly (rather than a custom middle-tier solution that connects to Tile38).

The gap in performance between SETHOOK and a standard FENCE has closed significantly over the past couple of releases.

There's support for GRPC, Redis PubSub, and HTTP. GRPC is the quickest, followed by Redis, and then HTTP.

There's also support for Disque, which is a really great message queue system, but the Disque project is still in beta.

We're seriously thinking of using Tile38 but we can't find any perf benchmarks on the internet so it makes it a bit tricky to approve of

I totally understand. Benchmarks around geofencing are something Tile38 is sorely missing. But I can say that tremendous strides have been made with regard to the performance of geofence delivery, and of Tile38 in general, since this issue was first opened. We're adding a couple of new protocols soon (Kafka and MQTT), and I'm hopeful that we can publish some numbers around the same time.

Please do let us know if/how we can help

Feel free to share your experience, whether that's with benchmarking, testing, or implementation.

Thanks a bunch for considering Tile38!

johnsonc commented 7 years ago

I think the SETHOOK is the way to go in most cases, because it allows for a dedicated queue/notification server that Tile38 connects to directly (rather than middle-tier custom solution that connects to Tile38).

Thank you!

I totally understand. Benchmarks around geofencing is something Tile38 is sorely missing.

:( It gets really tough to convince the folks who compare and contrast frameworks before choosing a production stack. Oh well. Right now we are comparing Elasticsearch with Tile38, and we've come up with a few metrics that would be crucial for the service we're considering: scalability/clustering, throughput, fault tolerance, reliability, and monitoring (logging and reporting).

Feel free to share your experience, whether that's with benchmarking, testing, or implementation.

I'd be happy to! It might take us a bit, but I'll try to get back. Thanks for this amazing work!

m1ome commented 7 years ago

Starting to build up benchmarks. https://github.com/m1ome/tile38-benchmark

This was mostly related to issue #130, to test the speed-up. But I think I can refactor it to work with the entire system.

tidwall commented 7 years ago

@m1ome

Thanks for getting a head start on this. Before you go too far down the road with a benchmark tool, we should discuss the strategy around what is being benchmarked, and the tool itself, in more detail.

It looks like the project you are building is based on go test --bench, which is great for simple unit test benchmarks but maybe not so great for benchmarking operations that go over the network. I think the output of go test --bench may be too simplistic, and requiring a Go environment may hurt reproducibility for non-developer users.

It's going to be most important that the benchmarking tool represents real world networks. As a foundation it needs to support:

The output should be something that includes stuff like:

```
====== SET ======
  1000000 requests completed in 13.86 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

99.76% <= 1 milliseconds
99.98% <= 2 milliseconds
100.00% <= 3 milliseconds
72144.87 requests per second
```

With this kind of data we can start to produce graphs that present information around operation speed and network latency.

There's a good write-up on benchmarking on the Redis website.

m1ome commented 7 years ago

@tidwall I will look into redis-benchmark; maybe we can build the same thing in Go for Tile38. The main things will be concurrent access and measurement. The first benchmarking tool is just a rallying point on the way to a more specific one.

literadix commented 7 years ago

Thank you very much. Now we have some great numbers, and everyone can compare this great piece of software to other solutions.


Lars-Meijer commented 7 years ago

Is someone still working on benchmarks? It may be useful to look into the Redis benchmark tool, as mentioned earlier.

And speaking of performance, is there any interest in maintaining redis-gis, and do you have any idea how it stacks up against Tile38, especially when adding data to it (SET), @tidwall?

tidwall commented 7 years ago

@Lars-Meijer

Is someone still working on benchmarks? It may be useful to look into the Redis benchmark tool as mentioned earlier.

Yes, it's a work in progress but @m1ome is maintaining the brand new tile38-benchmark project. I haven't spent much time with it yet. It's built in Go and is focused specifically on Tile38 commands.

The redis-benchmark is quite good. I use it all the time for my other projects (Redcon, SummitDB, kvnode). Though it would require some modifications to support Tile38 commands, I can see it being used as a base.

And speaking about performance, is there any interest in maintaining redis-gis, and do you have any idea how it stacks up agains Tile38 especially in adding data to it (set)

Sorry, I'm not maintaining redis-gis at the moment, and I don't have much benchmarking information. I would suspect that it could be a little faster for general-purpose GSET/GGET/GSEARCH commands, because it's written in C and uses the Redis command pipeline. But it's woefully out of date in terms of features compared to Tile38, and it hasn't been stress-tested in a production environment.

Lars-Meijer commented 7 years ago

Thanks for the quick response. I ran the benchmarks, and they seem to indicate that Tile38 can process many more requests than I saw originally. It looks very promising.

Lars-Meijer commented 7 years ago

I did some measurements of the memory usage of Tile38 by placing a number of points within roughly the bounding box of the Netherlands, each with a random 15-digit key. I used the SERVER command to check the memory usage, then ran GC and measured again.

| Number of points | Usage/point | Total | Usage/point after GC |
|---:|---:|---:|---:|
| 10,000 | 416 B | 3.97 MiB | 250 B |
| 50,000 | 251 B | 12.01 MiB | 190 B |
| 100,000 | 306 B | 29.01 MiB | 182 B |
| 250,000 | 226 B | 53.97 MiB | 178 B |
| 500,000 | 177 B | 84.611 MiB | 176 B |
| 1,000,000 | 271 B | 258.63 MiB | 175 B |
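The Usage/point column appears to be Total divided by the number of points, with 1 MiB = 1,048,576 bytes: for example, 3.97 MiB × 1,048,576 B/MiB / 10,000 points ≈ 416 B. A small Go sketch of that arithmetic (the helper name `bytesPerPoint` is hypothetical) reproduces the other rows to within a few bytes:

```go
package main

import "fmt"

const bytesPerMiB = 1 << 20 // 1 MiB = 1,048,576 bytes

// bytesPerPoint divides the total memory reported by the server by the number of
// stored points. It includes the process's fixed baseline (executable, runtime),
// so small datasets show a higher figure per point.
func bytesPerPoint(totalMiB float64, points int) float64 {
	return totalMiB * bytesPerMiB / float64(points)
}

func main() {
	rows := []struct {
		points   int
		totalMiB float64
	}{
		{10_000, 3.97}, {50_000, 12.01}, {100_000, 29.01},
		{250_000, 53.97}, {500_000, 84.611}, {1_000_000, 258.63},
	}
	for _, r := range rows {
		fmt.Printf("%9d points: %6.1f B/point\n", r.points, bytesPerPoint(r.totalMiB, r.points))
	}
}
```

Because the total includes the server's fixed baseline, the per-point figure falls as the dataset grows, which fits the explanation that follows.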

My initial guess to explain the difference between the usage/point before and after GC was that Tile38 allocates more memory than it needs (to avoid allocating memory very often), and that GC frees the memory that is not used. The usage per point is higher at first, probably because the server itself takes up a few MB of RAM.

Is this explanation correct, or is there something else going on?

tidwall commented 7 years ago

Your explanation is on track.

Tile38 is a static executable and the entire executable is loaded into memory (~4MB). As more points are added the memory usage will level out.

Tile38, which uses the Go runtime, hangs on to memory for as long as possible. The memory will eventually be released, but when this happens depends on factors like how often the Go memory allocator reuses the memory pages, the ratio of unused/used memory in the Tile38 process, and how much system memory is available.

The GC command is expensive, and I don't recommend using it except after a mass insert of points or for debugging and diagnostics.

Lars-Meijer commented 7 years ago

@m1ome @tidwall When running the benchmark with 1 client, performance seems to be higher than with 50 parallel clients. Isn't that a little counterintuitive?

```
====== GET(POINT) ======
  100000 requests completed in 1.557656048s
  1 parallel clients
  keep alive: true

100.00% <= 1 milliseconds
<1%(1) <= 5 milliseconds
64199.03 requests per second
```

```
====== GET(POINT) ======
  100000 requests completed in 2.327297588s
  50 parallel clients
  keep alive: true

99.96% <= 1 milliseconds
<1%(23) <= 2 milliseconds
<1%(9) <= 3 milliseconds
<1%(4) <= 4 milliseconds
<1%(1) <= 5 milliseconds
<1%(1) <= 6 milliseconds
<1%(1) <= 7 milliseconds
<1%(1) <= 11 milliseconds
<1%(1) <= 13 milliseconds
<1%(1) <= 15 milliseconds
42968.29 requests per second
```

When I run a (simple) benchmark with Lettuce in Java, I cannot get more than 20k SETs per second on the same machine. Maybe @m1ome can elaborate a little on how the benchmark works?

When I use my Java program with Lettuce to insert 1,000,000 points into a clean database, the rate is around 20K/s. When I then restart the database, it processes around 90K SETs/s. Does the AOF reader do something clever, or is something going wrong with the connection?

Any help would be greatly appreciated

tidwall commented 7 years ago

When running benchmark with 1 client performance seems to be higher than when running 50 parallel clients, this seems a little counter intuitive to me?

Seems counterintuitive to me too. I'll investigate and get back to you.

Lars-Meijer commented 7 years ago

Seem counter intuitive to me too. I'll investigate and get back to you.

It might be that the benchmarks are not correct; I've also seen the benchmark report increased performance on a system that was under heavy load (though I have not looked into that result extensively). My own tests seem to indicate that Tile38 is CPU-bound, and reserving a core for Tile38 and running everything else on a different core definitely helps performance.

m1ome commented 7 years ago

@tidwall that's why I asked you to look at the test sources :)

Lars-Meijer commented 7 years ago

@tidwall that's why i am asked you too look upon test sources :)

The benchmarks seem to show the same kind of result as reading the AOF (or slightly worse), so I am assuming they are mostly correct. But the weird thing is that I cannot get nearly the same performance when using Lettuce in Java. It might be that I am doing something wrong; I will continue to look into it.

m1ome commented 7 years ago

@Lars-Meijer stop defending him. He told me a month ago that he would look at the benchmark code. Sadness

tidwall commented 7 years ago

@Lars-Meijer Tile38 utilizes all cores if possible. It uses a single read-write lock (sync.RWMutex) that is shared across all connections. Read operations such as GET and NEARBY should be very quick. Write operations are slower and block other connections while they fill data structures and write to the AOF.

Loading from the AOF will likely be faster than benchmarking over the network (assuming the read performance of your SSD is better than your network). It also runs some optimized code which executes prior to the server starting, without any locking, and does not have the burden of writing anything to disk.

Regarding multi-core vs. single-core: you may want to try playing with the GOMAXPROCS environment variable. For example, running

$ GOMAXPROCS=1 tile38-server

which will force Tile38 to run on only one core. Perhaps this is a good thing in some cases.

@m1ome I'm looking at benchmarking options today. Thanks for your patience.

Lars-Meijer commented 7 years ago

Loading from the AOF will likely be faster than benchmarking over the network (assuming the read performance of your SSD is better than your network). It also runs some optimized code which executes prior to the server starting, without any locking, and does not have the burden of writing anything to disk.

I figured this, but I don't think it explains the slow performance in my tests compared to the benchmark: 3x slower than the benchmarks (if they are correct) and 4x slower than reading from the AOF. I also find it unlikely that 20k updates per second would saturate a Gigabit Ethernet network.

Is there an option to disable the AOF? And how much of an impact does it actually have, since there is absolutely no use for it in my use case.

which will force Tile38 to run on only one core. Perhaps this is a good thing in some cases.

I used the taskset (http://manpages.ubuntu.com/manpages/wily/man1/taskset.1.html) command to lock it to one core, and locked other high load processes to the other cores. It seems to perform a lot better when I do this.

tidwall commented 7 years ago

Is there an option to disable the AOF and how much of an impact does it actually have, since there is absolutely no use for it in my usecase.

There isn't an option to disable the AOF because it's required for core operations.

I just pushed a custom build to the memoptz branch that provides a --appendonly no flag. This will disable the AOF so you can test performance on your side. It's likely to break some stuff like Leader/Follower syncing, but standard commands like GET/SET/NEARBY will work.

Lars-Meijer commented 7 years ago

I just pushed a custom build to the memoptz branch that provides a --appendonly no flag. This will disable the AOF so you can test performance on your side. It's likely to break some stuff like Leader/Follower syncing, but standard commands like GET/SET/NEARBY will work.

Thanks! First impressions are that it makes quite a bit of difference; I will test further tomorrow. Edit: It should be noted that the server I am testing on has an HDD, not an SSD, so this might amplify the benefits.

tidwall commented 7 years ago

@Lars-Meijer Sounds good. I look forward to the results.

tidwall commented 7 years ago

I pushed an update to the master branch which includes a packaged tile38-benchmark tool.

You will now likely see performance on par with Lettuce.

Make sure to check out the --help menu and https://redis.io/topics/benchmarks


In case you want to test the performance of the tool itself, you can run it against a Redis instance like so:

$ tile38-benchmark -p 6379 -t SET,GET --redis

And compare it to:

$ redis-benchmark -p 6379 -t SET,GET

Or with pipelining:

$ tile38-benchmark -p 6379 -t SET,GET -P 10 --redis
$ redis-benchmark -p 6379 -t SET,GET -P 10
Lars-Meijer commented 7 years ago

I've merged master into the memoptz branch and tested with the benchmark tool, with Tile38 running on 1 core and the AOF on an SSD:

```
====== SET (point) ======
  100000 requests completed in 4.47 seconds
  50 parallel clients
  82 bytes payload
  keep alive: 1

24.39% <= 0 milliseconds
// A lot more
100.00% <= 24 milliseconds
22392.72 requests per second
```

Without the AOF on my PC (SSD):

```
====== SET (point) ======
  100000 requests completed in 3.34 seconds
  50 parallel clients
  82 bytes payload
  keep alive: 1

32.85% <= 0 milliseconds
// a lot more
100.00% <= 23 milliseconds
29911.44 requests per second
```

With the Tile38 server on 2 cores, it seems to be about 2k/s better. This is indeed very close to the performance I am seeing with Lettuce. Thanks very much for your quick support :+1:

Lars-Meijer commented 7 years ago

These are some more results from my not-so-fast server (a VM on 4 old Xeon cores (vCPUs @ 2.6 GHz), locked to 1 core, with 8 GB of DDR3 RAM @ 1333 MHz), with the AOF disabled.

When I run the benchmark on a different machine:

```
====== SET (point) ======
  100000 requests completed in 10.29 seconds
  50 parallel clients
  82 bytes payload
  keep alive: 1

0.51% <= 1 milliseconds
// More
100.00% <= 48 milliseconds
9714.58 requests per second
```

When the benchmark is run directly on the server:

```
====== SET (point) ======
  100000 requests completed in 5.43 seconds
  50 parallel clients
  82 bytes payload
  keep alive: 1

19.60% <= 0 milliseconds
// More
100.00% <= 44 milliseconds
18409.89 requests per second
```

It might be network overhead, but the server is on the same network as I am, and the ping latency is about 1 ms.

tidwall commented 7 years ago

@Lars-Meijer Thanks for sharing your results.

The local loopback interface should be faster than going over the network. Nearly 20% of requests at <= 0 milliseconds locally vs. 0.5% at <= 1 milliseconds over the network hints to me that the network may be the bottleneck.

Lars-Meijer commented 7 years ago

The local loopback interface should be faster than over the network. The near 20% <= 0 milliseconds vs 0.5% <= 1 milliseconds hints to me that the network may be bottleneck.

Could be, I don't have the capacity to look into the network at this moment. I am able to get the desired performance by starting multiple Tile38 instances and balancing load between them.

Lars-Meijer commented 7 years ago

For anyone else interested: on my slightly better desktop machine at home, I was able to get the following results.

Test system: Intel Core i5 (4 cores @ 3.3 GHz, Turbo Boost disabled), 16 GB RAM (1600 MHz), Ubuntu Desktop 16.10, and a 250 GB Samsung enterprise SSD (similar to the Samsung 830).

The benchmarks were run on the same machine as the Tile38 server; I might do some benchmarks over the network later :). Performance can probably be increased a tiny bit by running a lightweight server OS instead of a full desktop.

All benchmarks had the following parameters: 100000 requests, 50 parallel clients, keep alive: 1.

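noappend-multicore.txt (https://github.com/tidwall/tile38/files/886694/noappend-multicore.txt)
noappend-singlecore.txt (https://github.com/tidwall/tile38/files/886695/noappend-singlecore.txt)
normal-multicore.txt (https://github.com/tidwall/tile38/files/886696/normal-multicore.txt)
normal-singlecore.txt (https://github.com/tidwall/tile38/files/886698/normal-singlecore.txt)
summary-noappend-multicore.txt (https://github.com/tidwall/tile38/files/886699/summary-noappend-multicore.txt)
summary-noappend-singlecore.txt (https://github.com/tidwall/tile38/files/886700/summary-noappend-singlecore.txt)
summary-normal-multicore.txt (https://github.com/tidwall/tile38/files/886697/summary-normal-multicore.txt)
summary-normal-singlecore.txt (https://github.com/tidwall/tile38/files/886701/summary-normal-singlecore.txt)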

The summary files provide a good starting point if you are just interested in the numbers. They list the percentage of requests completed in under 1 ms, the latency at which 100% of requests had completed, and the insert rate.

johnsonc commented 7 years ago

Thanks for the benchmarks, Lars! Josh, you're doing some pretty amazing work!


-- Regards, Johnson Chetty

tidwall commented 7 years ago

@Lars-Meijer I'm actually pretty surprised by how large the difference between append and noappend is: 20%. Thanks for providing these benchmarks.

tidwall commented 7 years ago

@johnsonc Thanks. :)

Lars-Meijer commented 7 years ago

I've tested some more, and the results of the benchmarks seem very stable. They seem to reflect performance very well and match the results I get with my own test. They would also be good for comparing performance between versions.

As for the performance of the append-only file, I would call the difference significant. It might be worthwhile to look into methods to increase the performance of the append-only file, or to make it configurable so it can be enabled and disabled. Kafka is disk-backed and has a write-up on its persistence (https://kafka.apache.org/documentation.html#persistence), although I am not sure how useful this is.

Benchmarks over the network show very similar results, even slightly better in some cases, though that might also be noise from background programs and a Chrome tab that was open during the previous benchmarks. It seems that there was a network bottleneck during the earlier tests.

I've attached the benchmark results for the tests over the network (Gigabit Ethernet).

```
--- 192.168.2.13 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 99009ms
rtt min/avg/max/mdev = 0.253/0.366/0.419/0.033 ms
```

network-noappend-multicore.txt
network-noappend-singlecore.txt
network-normal-multicore.txt
network-normal-singlecore.txt
summary-network-noappend-multicore.txt
summary-network-noappend-singlecore.txt
summary-network-normal-multicore.txt
summary-network-normal-singlecore.txt

sfroment commented 7 years ago

Hello,

I don't know if this is the right place for my question, but I believe so. I'm starting to use Tile38, and I was wondering whether there is a way to have multiple leaders, so that SETs can be spread across multiple instances.

Thanks.

tidwall commented 7 years ago

Hi @sfroment,

I'm starting to use tile38 and I was wondering if there was a way to get multiple leader, to be able to set across multiple instance?

Sorry, but Tile38 only supports one leader at a time for a single collection. If you need to scale to multiple leaders, then you'll have to geographically shard your collections. For example, one leader could hold a collection for the eastern United States and another for the western United States, etc.

tidwall commented 5 years ago

I'm closing this issue because it's pretty old, and there have been many enhancements and optimizations over the past few years.