roohy / iLASH

iLASH - IBD Estimation Using Locality Sensitive Hashing
Other
16 stars 3 forks source link

Performance: Use boost::tokenizer to read lines in filereader, increasing performance #6

Closed pettyalex closed 1 year ago

pettyalex commented 1 year ago

While looking at where things would have to change to have iLASH read in VCF files as well, I noticed that the parsing was being done with operator>> on an istringstream. This is one of the slowest ways to read in text, so I thought I'd see if performance could be improved by reading the string without copying.

This PR increases performance by about 20x on a medium-sized (2.78GB .ped file) dataset on my M1 Macbook Air. This potentially makes the threading less necessary in general.

There's three major changes in this PR, any of which I could remove if you don't want them:

  1. Introduce the Boost tokenizer as a dependency: There are several ways to do zero-copy string splitting, but this is one of the most comfortable. It's header only, so no libraries are necessary at runtime. If you don't want to introduce boost, the core performance improvement could be done without this.
  2. Move some of the pointers to smart pointers: Less potential for human error, better ergonomics.
  3. Make runFlag atomic: A change to a non-atomic boolean without any other synchronization is never guaranteed to be seen by other threads, meaning that the worker threads may never see runFlag set to false unless it's made atomic or protected by mutexes. This is particularly necessary on non-x86 computers like the new Macs, as x86 has a stricter memory model than ARM.

Let me know if you like any of these changes, or would like me to reverse any parts of them.

roohy commented 1 year ago

Hi, This is great! Thank you for the great additions. I probably should read up on the third point in the context of non-x86. It works in x86, but if it does not work for other architectures, I understand that it need to be changed. Before merging this into the main repo, I would greatly appreciate it if you could please add to the README file about the dependencies (Boost library version) and tested systems/OS versions.

Regards, Roohy

pettyalex commented 1 year ago

I'll update the README to mention that it's using a Boost header, and the environments that I've tested on, which are at this point just Ubuntu 20.04 on Skylake Xeons and Mac OS 12.4 on an M1 Macbook Air.

Regarding point 3, this answer provides a little detail: https://stackoverflow.com/questions/68320503/is-there-anything-that-would-make-a-static-bool-thread-safe

But long story short, sharing non-atomic and non-synchronized booleans between threads is undefined behavior, when one thread sets runFlag to false it would be perfectly valid C++ for another thread to never see it be updated. On x86 computers the memory model is more strict, and writes will usually make it to other threads in a reasonable time, but ARM and other platforms have a less restrictive memory model and without using either mutexes or an atomic variable, the worker threads could run forever.

In general, without any type of synchronization or atomic variables, it is perfectly valid C/C++ for two different threads to see different values for the same variable at the same time: https://stackoverflow.com/questions/22690432/boolean-has-different-values-depending-on-the-thread

pettyalex commented 1 year ago

OK, readme updated, and tested in Ubuntu 20.04. Compared performance with different, larger sets and saw an improvement there too.

rjbohlender commented 1 year ago

The changes look pretty good, but there is one thing I would recommend updating in the README. This change will require a newer compiler than the current master branch. This has been updated in the CMakeLists.txt file but not in the README. I haven't tested it but some version newer than GCC 4.8+ will probably be needed. Also it may be worth it to set the c++ version with: set(CMAKE_CXX_STANDARD 14) That way CMAKE will automatically supply the correct arguments. For example, for GCC 4.8 this will probably compile with the c++1y flag but not a c++14 flag. CMAKE should handle that automatically.

roohy commented 1 year ago

This looks great! I will merge this request. Thank you so much!