Open HenryQuan opened 3 years ago
The current implementation is about 3 times faster when using multithreading. However, this isn't good enough. It is challenging because the threads all depend on shared state. If the program could search in a smarter way, the performance could be improved further. The lock is causing some issues. Maybe, when it is locked, the thread could continue working and try again when it gets the chance? More research is probably needed for this.
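One way to read the "continue and wait until it has the chance" idea is `std::mutex::try_lock`: a thread buffers its results locally and only flushes them when the lock happens to be free. This is a minimal sketch under that assumption, not the project's code; `SharedResults`, `worker`, and `run_workers` are hypothetical names, and the integers stand in for real search results.

```cpp
#include <algorithm>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared container guarded by a single mutex.
struct SharedResults {
    std::mutex lock;
    std::vector<int> items;
};

// Produce values in [first, last). Results are buffered locally and
// flushed only when the lock is free, so the thread never waits mid-search.
void worker(SharedResults& shared, int first, int last) {
    std::vector<int> pending;
    for (int value = first; value < last; ++value) {
        pending.push_back(value);  // stand-in for one search result
        if (shared.lock.try_lock()) {
            // The mutex is already held, so adopt it instead of locking again.
            std::lock_guard<std::mutex> guard(shared.lock, std::adopt_lock);
            shared.items.insert(shared.items.end(), pending.begin(), pending.end());
            pending.clear();
        }
        // Otherwise keep searching and retry on a later iteration.
    }
    // Final flush: block once so that nothing buffered is lost.
    std::lock_guard<std::mutex> guard(shared.lock);
    shared.items.insert(shared.items.end(), pending.begin(), pending.end());
}

// Run thread_count workers, each producing per_thread values.
// The result is sorted so that it is deterministic.
std::vector<int> run_workers(int thread_count, int per_thread) {
    SharedResults shared;
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back(worker, std::ref(shared),
                             t * per_thread, (t + 1) * per_thread);
    }
    for (auto& th : threads) th.join();
    std::sort(shared.items.begin(), shared.items.end());
    return shared.items;
}
```

Calling `run_workers(4, 1000)` should return every value from 0 to 3999 exactly once. Whether this beats a plain blocking lock depends on how contended the mutex is, so it would need measuring.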
It seems that std::thread is adding too much overhead. OpenMP is far more efficient, but it is not supported natively on macOS. The goal of this project has always been to stay portable across all platforms. That's why std::thread will be kept.
OpenMP is now used for multithreading. It is also non-blocking, so it should be much faster, but in practice it isn't. This can still be improved. I can now try doing the same with std::thread as well, because OpenMP requires an additional library on macOS.
The mutex lock is taking too much time at the moment. This makes my 24-thread machine slower than a 4-thread machine because it performs 6x more lock and unlock actions. Insert shouldn't be blocking, because that makes multithreading useless: the other 23 threads have to wait for the first thread to finish. My idea is to make a custom queue based on how many threads the program is using. Each thread will insert into its own vector, so no locking is needed. At the end, the queue will merge all the vectors and keep the top results.
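The per-thread vector idea could look roughly like the sketch below. It is only an illustration under assumptions: `score_of` is a hypothetical stand-in for the real search work, and `top_scores` and its parameters are made-up names. Each worker writes only to its own bucket, so no mutex is needed during the search, and the merge happens on one thread after all workers have joined.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical scoring function standing in for the real search work.
inline int score_of(int candidate) { return (candidate * 37) % 101; }

// Score candidates 0..candidate_count-1 on thread_count threads and
// return the `keep` highest scores in descending order.
std::vector<int> top_scores(int candidate_count, std::size_t thread_count,
                            std::size_t keep) {
    if (thread_count == 0) thread_count = 1;

    // One bucket per thread: each worker touches only buckets[t].
    std::vector<std::vector<int>> buckets(thread_count);
    std::vector<std::thread> workers;
    workers.reserve(thread_count);

    for (std::size_t t = 0; t < thread_count; ++t) {
        workers.emplace_back([&buckets, t, thread_count, candidate_count] {
            std::vector<int>& mine = buckets[t];
            // Strided split of the candidates across the threads.
            for (int c = static_cast<int>(t); c < candidate_count;
                 c += static_cast<int>(thread_count)) {
                mine.push_back(score_of(c));  // no lock: private bucket
            }
        });
    }
    for (auto& w : workers) w.join();

    // Single-threaded merge once every worker has finished.
    std::vector<int> merged;
    for (const auto& b : buckets) {
        merged.insert(merged.end(), b.begin(), b.end());
    }
    keep = std::min(keep, merged.size());
    std::partial_sort(merged.begin(), merged.begin() + keep, merged.end(),
                      std::greater<int>());
    merged.resize(keep);
    return merged;
}
```

With this design the thread count changes only how the work is split, not the result, so `top_scores(101, 4, 3)` and `top_scores(101, 24, 3)` should agree. If memory becomes a concern, each bucket could be trimmed to its own top `keep` entries before the merge.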