Open breznak opened 5 years ago
I looked into this and it looks pretty neat! It's always nice to see more ways to use the extra CPU cores that come with most computers nowadays.
Some things to consider:
Naively, I think we could get the most bang/buck by parallelizing the methods:
Connections::computeActivity, and NOT using mutex locks. This method is fault tolerant and does not need to be exact.
Connections::adaptSegment, which will need a mutex lock on the connections object only when a synapse changes its connected/potential state, so that it can modify the presynaptic maps. This method would not be parallel itself, but rather would be thread safe and assume that the caller is parallel. Alternatively, if this method were parallel it might be safe to eschew the mutex locks, but I think this alternative would have less performance gain.
Parallel code is difficult to get right. Although this approach does look really simple...
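To make the adaptSegment idea above a bit more concrete, here is a rough sketch. Everything in it is hypothetical: `ToyConnections`, `connectedCount`, `addSynapse` and `adaptAllSegments` are made-up stand-ins, not the real Connections API. It assumes each segment owns its synapses and that the caller hands different segments to different threads, so permanence updates need no lock; only the shared per-cell bookkeeping is guarded, and only when a synapse crosses the connected threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for a Connections-like class (not the real API).
struct ToyConnections {
  struct Synapse {
    std::size_t presynapticCell;
    float permanence;
  };

  float connectedThreshold = 0.5f;
  std::vector<std::vector<Synapse>> segments;  // segment -> synapses it owns
  std::vector<std::size_t> connectedCount;     // shared: presynaptic cell -> number of connected synapses
  std::mutex mapMutex;                         // guards connectedCount only

  ToyConnections(std::size_t numCells, std::size_t numSegments)
      : segments(numSegments), connectedCount(numCells, 0) {}

  // Single-threaded setup helper (assumed to run before any parallel phase).
  void addSynapse(std::size_t segment, std::size_t cell, float permanence) {
    assert(segment < segments.size() && cell < connectedCount.size());
    segments[segment].push_back(Synapse{cell, permanence});
    if (permanence >= connectedThreshold) ++connectedCount[cell];
  }

  // Thread safe as long as no two threads adapt the same segment at once.
  void adaptSegment(std::size_t segment, const std::vector<char>& activeCells,
                    float increment, float decrement) {
    for (Synapse& syn : segments[segment]) {
      const bool wasConnected = syn.permanence >= connectedThreshold;
      const float delta = activeCells[syn.presynapticCell] ? increment : -decrement;
      syn.permanence = std::min(1.0f, std::max(0.0f, syn.permanence + delta));
      const bool isConnected = syn.permanence >= connectedThreshold;
      if (wasConnected != isConnected) {
        // Only a connected-state change touches shared data, so only this path locks.
        std::lock_guard<std::mutex> lock(mapMutex);
        if (isConnected) ++connectedCount[syn.presynapticCell];
        else --connectedCount[syn.presynapticCell];
      }
    }
  }
};

// The "parallel caller": thread t handles segments t, t + numThreads, ...
void adaptAllSegments(ToyConnections& c, const std::vector<char>& activeCells,
                      unsigned numThreads) {
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    workers.emplace_back([&c, &activeCells, t, numThreads] {
      for (std::size_t s = t; s < c.segments.size(); s += numThreads)
        c.adaptSegment(s, activeCells, 0.1f, 0.1f);
    });
  }
  for (std::thread& w : workers) w.join();
}
```

Most permanence updates do not cross the threshold, so in this sketch the lock should be taken rarely; whether that holds on real workloads would need measuring.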
right, that should be the advantage of this approach, c++ would parallelize select
I see 3 different levels of parallelisation that could be implemented:
Connections::computeActivity and NOT using mutex locks. This method is fault tolerant and does not need to be exact.
these are valuable hints, thank you!
Another thing I'd be interested in is making ALL the code run in parallel (each column, ideally). This would be interesting from a biological POV, as the algos would be async, and from an implementation/future POV, as 64+ threaded CPUs are becoming more and more common.
This needs the atomic keyword and more atomic data structures; FB has a great library for this: https://github.com/facebook/folly/tree/master/folly
AtomicBitSet is something I have in mind for the SDR internal data structure.
FYI, c++17 TS:Parallel in gcc-9 is now reaching availability. I've installed it for Ubuntu from a PPA, and our codebase compiles with g++9 (#523). When I get some time I'd like to experiment with this approach more.
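For the AtomicBitSet idea mentioned above, a minimal standard-library-only sketch of what such a structure could look like is below. `ToyAtomicBitSet` is a hypothetical name; this is neither Folly's class nor the project's SDR type, just an illustration of setting bits concurrently with `std::atomic` read-modify-write operations instead of a lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a fixed-size bit set whose bits can be flipped from several threads.
class ToyAtomicBitSet {
public:
  explicit ToyAtomicBitSet(std::size_t numBits)
      : size_(numBits), words_((numBits + 63) / 64) {
    for (auto& w : words_) w.store(0, std::memory_order_relaxed);
  }

  // Sets bit i and returns its previous value.
  bool set(std::size_t i) {
    assert(i < size_);
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    return (words_[i / 64].fetch_or(mask, std::memory_order_relaxed) & mask) != 0;
  }

  // Clears bit i and returns its previous value.
  bool reset(std::size_t i) {
    assert(i < size_);
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    return (words_[i / 64].fetch_and(~mask, std::memory_order_relaxed) & mask) != 0;
  }

  bool test(std::size_t i) const {
    assert(i < size_);
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    return (words_[i / 64].load(std::memory_order_relaxed) & mask) != 0;
  }

  // Number of set bits; only a snapshot if other threads are still writing.
  std::size_t count() const {
    std::size_t total = 0;
    for (const auto& w : words_) {
      std::uint64_t v = w.load(std::memory_order_relaxed);
      while (v != 0) {  // clear the lowest set bit until none remain
        v &= v - 1;
        ++total;
      }
    }
    return total;
  }

  std::size_t size() const { return size_; }

private:
  std::size_t size_;
  std::vector<std::atomic<std::uint64_t>> words_;
};
```

Relaxed ordering is used here on the assumption that the bits carry no ordering relationship to other data; if readers must also observe other writes made before a bit was set, stronger memory orders would be needed.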
After #55 https://www.phoronix.com/scan.php?page=news_item&px=Intel-Parallel-STL-libstdc-libc
Parallel TS, as defined in c++17, should soon be available in GCC and Clang (it is supported in MSVC already :+1: ). I'm very interested to test its performance.
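As a rough idea of what the C++17 parallel algorithms could look like for a computeActivity-style loop, here is a sketch built on `std::transform`. The function `computeOverlaps`, its data layout, and the `USE_PARALLEL_STL` / `TOY_POLICY` macros are all hypothetical, not code from the project. Note that this is not the "racy but tolerant" variant discussed earlier: each output element is written by exactly one invocation, so no locks are needed and the result is exact. The execution policy is opt-in so the snippet still builds with toolchains that lack `<execution>`; with GCC's libstdc++ the parallel policies typically also need TBB at link time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(USE_PARALLEL_STL)
#include <execution>  // C++17 parallel algorithms
#define TOY_POLICY std::execution::par,
#else
#define TOY_POLICY  // falls back to the plain sequential overload
#endif

// Hypothetical helper: counts, for every segment, how many of its inputs are active.
// segmentInputs[s] lists the input indices that segment s listens to.
std::vector<std::size_t> computeOverlaps(
    const std::vector<std::vector<std::size_t>>& segmentInputs,
    const std::vector<char>& activeInputs) {
  std::vector<std::size_t> overlaps(segmentInputs.size());
  std::transform(TOY_POLICY segmentInputs.begin(), segmentInputs.end(),
                 overlaps.begin(),
                 [&activeInputs](const std::vector<std::size_t>& inputs) {
                   std::size_t n = 0;
                   for (std::size_t idx : inputs) n += activeInputs[idx] ? 1 : 0;
                   return n;  // written to this segment's own output slot
                 });
  return overlaps;
}
```

Building with `-DUSE_PARALLEL_STL` (plus C++17 and, on GCC, something like `-ltbb`) switches the same call to `std::execution::par`; without the define it behaves like an ordinary sequential `std::transform`.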
EDIT:
transform from <algorithm>, TS Parallel
PROs:
CONs:
I will experiment with this, some progress outline, @breznak TODO: