gwlucastrig / Tinfour

Delaunay and Constrained Delaunay Triangulations in Java, providing high-performance utilities for modeling surfaces with support for Lidar LAS files, Digital Elevation Models (DEM), finite element analysis, path planning, natural neighbor interpolation, and other applications of Triangulated Irregular Networks (TIN)
Apache License 2.0

Perf: investigate performance with ThreadLocalRandom #4

Closed jandam closed 7 years ago

jandam commented 7 years ago

SemiVirtualStochasticLawsonsWalk uses Random. Is it possible to replace it with ThreadLocalRandom to improve performance?

gwlucastrig commented 7 years ago

What an excellent suggestion! I wasn't aware of the ThreadLocalRandom API. Thanks for bringing it to my attention. I just ran some tests calling the nextInt() method, and see that it takes only about 14% of the run time of the regular Random class. It's a very easy change, and I will integrate it into both the semi-virtual and standard implementations when I get my current round of revisions squared away.

The random action in the Lawson's walk is necessary to prevent the walk from ending up in an infinite loop. It's performed once per hop as the algorithm traverses the TIN. So the key to figuring out the efficiency of this change is how many times the random method is called.
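To illustrate the idea, here is a minimal sketch of the stochastic element in a Lawson's walk: at each hop the walk tests the current triangle's edges starting from a randomly chosen one, so a degenerate configuration cannot trap it in a fixed cycle. This is a hypothetical illustration of the technique, not Tinfour's implementation; the geometry test is replaced by a stand-in array.

```java
import java.util.Random;

public class WalkStepSketch {
    public static void main(String[] args) {
        Random rand = new Random(1L);   // seedable, so runs are reproducible
        // Stand-in for the orientation test against the triangle's 3 edges:
        // "true" means the target point lies on the interior side of that edge.
        boolean[] pointIsLeftOfEdge = {true, false, true};

        int start = rand.nextInt(3);    // the once-per-hop random draw
        int exitEdge = -1;
        for (int i = 0; i < 3; i++) {
            int e = (start + i) % 3;
            if (!pointIsLeftOfEdge[e]) {
                exitEdge = e;           // hop across this edge to the neighbor
                break;
            }
        }
        System.out.println("exit edge: " + exitEdge);
    }
}
```

Whatever edge the scan happens to start on, the walk still exits across a failing edge; the randomization only varies which one is chosen when several fail.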

Are you familiar with the "Single Build Test" application in the Tinfour distribution? It's described in my "Algorithms and Data Elements" document. Anyway, one thing it does at the end of building a TIN is to dump the construction statistics. Running my Bear Mountain sample data (all points, not just ground), I see that I get

Construction statistics
    Number of SLW walks:       4874727
    avg steps to completion:   9.40

so the nextInt() function is called about 45 million times. I wrote a quick test program comparing the run times of Random and ThreadLocalRandom, and the saving for 45 million method calls was about half a second.
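For reference, a quick comparison program along these lines might look like the sketch below. This is an illustration, not the exact test program mentioned above, and absolute timings will vary by machine and JVM.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class RandomTiming {
    public static void main(String[] args) {
        final int calls = 45_000_000;   // roughly the SLW call count noted above
        long sum = 0;                   // accumulate results so the JIT can't drop the loops

        Random rand = new Random(0L);
        long t0 = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sum += rand.nextInt(3);
        }
        long tRandom = System.nanoTime() - t0;

        ThreadLocalRandom tlr = ThreadLocalRandom.current();
        t0 = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sum += tlr.nextInt(3);
        }
        long tTlr = System.nanoTime() - t0;

        System.out.printf("Random: %d ms, ThreadLocalRandom: %d ms (checksum %d)%n",
                tRandom / 1_000_000, tTlr / 1_000_000, sum);
    }
}
```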

So then I ran the Bear Mountain data, which has 4.8 million sample points, through my "Repeated Build Test" to get some idea of the total time required to build a TIN for a sample set that size. The average was 4.8 seconds. So following your suggestion and switching the random API has the potential for nearly a 10 percent improvement in run time.

That being said, my experience with Tinfour has been that you can never really trust estimates like that until you've actually tested the change. A computer running Windows is just too noisy a test environment. That's why I wrote the "Twin Build Test" which runs two different versions of the code side-by-side (so that if there's any unexpected system loading going on, it will affect both versions). I will look at that sometime over the next few days. If you're interested, I invite you to try it yourself.

jandam commented 7 years ago

Today is the first time I've used your library. My test case has more than 81 million points. It's an extreme amount of data that would never be used in production, but I have to test it.

gwlucastrig commented 7 years ago

That is an impressive test case. As much as I am pleased to see somebody using my code, ethical considerations require me to caution you on an issue you might encounter. Even the semi-virtual implementation keeps all objects and data in core (thus the "semi" rather than "true" virtual). So it requires about 120 bytes per point. If you try to use Tinfour to process your entire data set at once, you'll be needing about 10 gigabytes of memory. Can you process them in parts? I'd hate to see you wasting your time if the Tinfour tool is not appropriate to your needs.
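The arithmetic behind that estimate, using the 120-bytes-per-point figure quoted above:

```java
public class MemoryEstimate {
    public static void main(String[] args) {
        long points = 81_000_000L;      // size of the test case above
        long bytesPerPoint = 120L;      // semi-virtual TIN cost per point
        long totalBytes = points * bytesPerPoint;   // about 9.7 billion bytes
        double gib = totalBytes / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("%d bytes, about %.1f GiB%n", totalBytes, gib);
    }
}
```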

gwlucastrig commented 7 years ago

I have replaced the use of Random in the Stochastic Lawson's Walk classes with a customized random function. The result was a 7 percent increase in speed for poorly spatially correlated data.

I considered using the ThreadLocalRandom class. It was slightly faster than the custom code, but it had a big disadvantage: it does not permit an application to set the random seed. For debugging purposes, it is important to be able to reproduce the behavior of Tinfour every time it is run. Unfortunately, the pseudo-random sequence generated by ThreadLocalRandom is different every time it is used.

The method used is based on the excellent work of Sebastiano Vigna; see "An experimental exploration of Marsaglia's xorshift generators, scrambled". It was also discussed in the Javamex website's article "XORShift Random Number Generators".
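For illustration, a minimal seedable xorshift64* generator in the style Vigna describes looks like the sketch below. This is not Tinfour's actual code; it only shows why a custom generator supports reproducible runs while ThreadLocalRandom does not.

```java
public class XorShiftSketch {
    private long state;

    public XorShiftSketch(long seed) {
        // A zero state would lock the generator at zero, so remap it.
        this.state = (seed == 0) ? 0x9E3779B97F4A7C15L : seed;
    }

    /** xorshift64* step: three shift/xor operations, then a multiplicative scramble. */
    public long nextLong() {
        long x = state;
        x ^= x >>> 12;
        x ^= x << 25;
        x ^= x >>> 27;
        state = x;
        return x * 0x2545F4914F6CDD1DL;
    }

    /** Uniform int in [0, bound), adequate for tie-breaking in a walk. */
    public int nextInt(int bound) {
        return (int) Math.floorMod(nextLong(), (long) bound);
    }

    public static void main(String[] args) {
        XorShiftSketch a = new XorShiftSketch(42L);
        XorShiftSketch b = new XorShiftSketch(42L);
        // Identical seeds give identical sequences -- the reproducibility
        // that ThreadLocalRandom cannot provide.
        for (int i = 0; i < 5; i++) {
            System.out.println(a.nextLong() == b.nextLong());
        }
    }
}
```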

Thanks again to jandam for his excellent suggestion.

gwlucastrig commented 7 years ago

If you have any further questions regarding using Tinfour for your project, let me know.


jandam commented 7 years ago

Hi,

thank you very much. I'm busy with another project. Your library is very good. This is the first time I've used TINs, so I have a question: is there some way to split the vertices into tiles and achieve smooth interpolation between them?

I hope that I will have more time to work with your library next year.

Thank you very much. Martin


gwlucastrig commented 7 years ago

Martin,

Good to hear from you. I've thought about this problem a few times. So far, the only solution I've found was to organize the data into smaller tiles. I then process them in 3-by-3 blocks of tiles, interpolating values for the center of the block before moving on to the next. If the distribution of sample points is fairly uniform and the tile size not too small, this gives an approximate appearance of continuity. You can improve the transition between tiles by using a blending approach along the edges.

Clearly, the process I describe involves some redundancy, especially since each block is a brand new TIN. You can improve Tinfour's throughput by reusing the instance of the IIncrementalTin object, calling the clear() method between each block. This approach will eliminate the Java overhead for new object instantiation. It will make a more substantial difference when using the standard IncrementalTin class rather than the SemiVirtualIncrementalTin class. Some of the example applications in the Tinfour test.performance package do this. Remember that for TINs of 1 million points or less, the IncrementalTin class is faster than the SemiVirtualIncrementalTin. I would try using both classes to see which one works better for you.
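The clear-and-reuse pattern can be sketched as follows. TileTin here is a hypothetical stand-in for an incremental TIN class; it only mimics the add/clear shape described above so the pattern is visible without the Tinfour library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    /** Hypothetical stand-in for an incremental TIN with add() and clear(). */
    static class TileTin {
        private final List<double[]> vertices = new ArrayList<>();
        void add(List<double[]> block) { vertices.addAll(block); }
        void clear() { vertices.clear(); }  // reset content, keep the object
        int size() { return vertices.size(); }
    }

    public static void main(String[] args) {
        TileTin tin = new TileTin();        // allocated once, reused per 3x3 block
        for (int block = 0; block < 4; block++) {
            tin.clear();                    // discard the previous block's structure
            List<double[]> pts = new ArrayList<>();
            pts.add(new double[]{block, block, 0.0});   // stand-in sample points
            tin.add(pts);
            // ... build the TIN and interpolate the block's center tile here ...
        }
        System.out.println(tin.size());     // only the last block's points remain
    }
}
```

The point of the pattern is that the TIN object is constructed once and reset between blocks, avoiding repeated object instantiation.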

You can also improve throughput by performing a Hilbert sort on each tile before beginning the main TIN process. Sorting samples improves the insertion speed. If you were processing a single medium-sized set of points once, the sort wouldn't help. The time required for the sort would exceed the time saved on the insertion into the TIN. But in the process described above, we are inserting points from each small tile nine times. So we sort once, but save processing time nine times. You can do a Hilbert sort using the HilbertSort class which is included in the Tinfour software. The sort is used in the TinfourViewer example application as well as some of the test applications.
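To show the underlying idea, here is a minimal Hilbert-index sort using the standard Hilbert curve mapping. This illustrates what a Hilbert sort does; it is not Tinfour's HilbertSort code.

```java
import java.util.Arrays;
import java.util.Comparator;

public class HilbertSortSketch {

    /** Map (x, y) in an n-by-n grid (n a power of two) to its Hilbert curve index. */
    static long hilbertIndex(int n, int x, int y) {
        long d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = ((x & s) > 0) ? 1 : 0;
            int ry = ((y & s) > 0) ? 1 : 0;
            d += (long) s * s * ((3 * rx) ^ ry);
            // Rotate the quadrant so the curve stays continuous.
            if (ry == 0) {
                if (rx == 1) {
                    x = s - 1 - x;
                    y = s - 1 - y;
                }
                int t = x;
                x = y;
                y = t;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] pts = {{1, 0}, {0, 0}, {1, 1}, {0, 1}};
        // Sorting by Hilbert index places nearby points close together in the
        // insertion order, which is what speeds up incremental TIN building.
        Arrays.sort(pts, Comparator.comparingLong(p -> hilbertIndex(2, p[0], p[1])));
        System.out.println(Arrays.deepToString(pts));
        // -> [[0, 0], [0, 1], [1, 1], [1, 0]]
    }
}
```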

Although I have not implemented them, there are algorithms for serializing TINs to read and write them to files, and also algorithms for merging TINs (which is essentially what Dwyer's "divide and conquer" algorithm does). This would help your processing, but won't be available in the near future.

Gary

On a personal note... I noticed that you are located in Prague. Some of my ancestors came from that area (a century ago). So I am pleased to see my software being used in a city for which I feel a sense of connection.