Closed pettyalex closed 1 year ago
Hi, This is great! Thank you for the great additions. I probably should read up on the third point in the context of non-x86. It works in x86, but if it does not work for other architectures, I understand that it need to be changed. Before merging this into the main repo, I would greatly appreciate it if you could please add to the README file about the dependencies (Boost library version) and tested systems/OS versions.
Regards, Roohy
I'll update the README to mention that it's using a Boost header, and the environments that I've tested on, which are at this point just Ubuntu 20.04 on Skylake Xeons and Mac OS 12.4 on an M1 Macbook Air.
Regarding point 3, this answer provides a little detail: https://stackoverflow.com/questions/68320503/is-there-anything-that-would-make-a-static-bool-thread-safe
But long story short, sharing non-atomic and non-synchronized booleans between threads is undefined behavior, when one thread sets runFlag
to false
it would be perfectly valid C++ for another thread to never see it be updated. On x86 computers the memory model is more strict, and writes will usually make it to other threads in a reasonable time, but ARM and other platforms have a less restrictive memory model and without using either mutexes or an atomic variable, the worker threads could run forever.
In general, without any type of synchronization or atomic variables, it is perfectly valid C/C++ for two different threads to see different values for the same variable at the same time: https://stackoverflow.com/questions/22690432/boolean-has-different-values-depending-on-the-thread
OK, readme updated, and tested in Ubuntu 20.04. Compared performance with different, larger sets and saw an improvement there too.
The changes look pretty good, but there is one thing I would recommend updating in the README. This change will require a newer compiler than the current master branch. This has been updated in the CMakeLists.txt file but not in the README. I haven't tested it but some version newer than GCC 4.8+ will probably be needed. Also it may be worth it to set the c++ version with:
set(CMAKE_CXX_STANDARD 14)
That way CMAKE will automatically supply the correct arguments. For example, for GCC 4.8 this will probably compile with the c++1y flag but not a c++14 flag. CMAKE should handle that automatically.
This looks great! I will merge this request. Thank you so much!
While looking at where things would have to change to have iLASH read in VCF files as well, I noticed that the parsing was being done with
operator>>
on anistringstream
. This is one of the slowest ways to read in text, so I thought I'd see if performance could be improved by reading the string without copying.This PR increases performance by about 20x on a medium-sized (2.78GB .ped file) dataset on my M1 Macbook Air. This potentially makes the threading less necessary in general.
There's three major changes in this PR, any of which I could remove if you don't want them:
Let me know if you like any of these changes, or would like me to reverse any parts of them.