opennars / OpenNARS-for-Applications

General reasoning component for applications based on NARS theory.
https://cis.temple.edu/~pwang/NARS-Intro.html
MIT License

Multithreading efficiency #208

Open patham9 opened 2 years ago

patham9 commented 2 years ago

Make multi-threading effective again. See if it can be used for temporal compounding in sensorimotor inference, which likely would bring the biggest performance gain.
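A minimal sketch of what parallel temporal compounding could look like, using an OpenMP parallel-for over event pairs. All names here (Event, Implication, the result buffer) are hypothetical placeholders, not taken from the actual codebase; the sketch only illustrates the shape of the loop and where the shared write stays guarded:

```c
// Sketch only: compile with `gcc -fopenmp sketch.c`.
// Event/Implication and the result buffer are hypothetical placeholders.
#include <stdio.h>

typedef struct { int term; long occurrenceTime; } Event;          // placeholder event
typedef struct { int precondition; int consequent; } Implication; // placeholder (a =/> b)

#define EVENTS_MAX 8
#define DERIVATIONS_MAX (EVENTS_MAX * EVENTS_MAX)

static Implication derived[DERIVATIONS_MAX];
static int derivedCount = 0;

int main(void)
{
    Event events[EVENTS_MAX];
    for(int i = 0; i < EVENTS_MAX; i++)
    {
        events[i].term = i;
        events[i].occurrenceTime = i * 10;
    }
    // Each (earlier, later) pair can be compounded independently,
    // so the outer loop is a natural candidate for a parallel-for.
    #pragma omp parallel for schedule(dynamic)
    for(int i = 0; i < EVENTS_MAX; i++)
    {
        for(int j = i + 1; j < EVENTS_MAX; j++)
        {
            Implication imp = { events[i].term, events[j].term };
            #pragma omp critical   // writes to the shared result buffer stay serialized
            {
                derived[derivedCount++] = imp;
            }
        }
    }
    printf("derived %d temporal implications\n", derivedCount);
    return 0;
}
```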

automenta commented 2 years ago

Let's see some benchmarks to know whether the concurrency support actually does anything: whether it makes performance better or worse, by how much, across different numbers of threads (1, 2, 4, ...), and on different CPUs, architectures, and OSes.

patham9 commented 2 years ago

Since the inverted term index was introduced, I removed the related parallel-for loops, as they didn't bring benefits for v0.9.1. But with the new additions planned for this issue it will be effective again. I typically use the time command on evaluation.py: `time python3 evaluation.py`. This prints a timing summary across all examples, so it will capture performance changes in both semantic and sensorimotor inference. We can run it once the branch is merged to see.

automenta commented 2 years ago

See https://en.wikipedia.org/wiki/Amdahl's_law. A profiler's thread view will show which parts are parallelized and which parts, if any, are serial.
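For reference, Amdahl's law bounds the speedup $S$ on $N$ threads when only a fraction $p$ of the work is parallelizable:

$$S(N) = \frac{1}{(1 - p) + p/N}$$

Even with $p = 0.9$, 8 threads give at most roughly a $4.7\times$ speedup, which is why the serial sections a profiler exposes matter so much.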

Running synthetic tests will probably not represent the actual kind of workload in, for example, a NAR with a steadily full memory "at cruising altitude"; i.e., gains made on tests are unlikely to carry over completely to real workloads.

As far as I can tell, the only way to sufficiently parallelize a NAR is with a fixed threadpool (fewer than, or about as many threads as, CPU cores) whose threads execute reasoner cycles asynchronously and independently.

This still requires some synchronization points, for example around the various mutable data structures like Bags, BeliefTables, etc. It is possible to safely isolate the critical sections to these data structures, particularly where WRITES are involved, by using locks, atomic CAS, etc.
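As a rough illustration of isolating such a critical section, here is a minimal pthread sketch; ConceptBag and its functions are hypothetical stand-ins for structures like Bag or BeliefTable, not ONA's actual API:

```c
// Sketch of guarding a shared, mutable structure with a per-structure lock.
// ConceptBag/Bag_Add are hypothetical stand-ins, not ONA's real data structures.
#include <pthread.h>
#include <stdatomic.h>

#define BAG_SIZE 256

typedef struct
{
    int items[BAG_SIZE];
    int count;
    pthread_mutex_t lock;   // one lock per shared structure
} ConceptBag;

void Bag_Init(ConceptBag *bag)
{
    bag->count = 0;
    pthread_mutex_init(&bag->lock, NULL);
}

// Reasoner threads call this concurrently; only the write is serialized.
void Bag_Add(ConceptBag *bag, int item)
{
    pthread_mutex_lock(&bag->lock);      // critical section begins
    if(bag->count < BAG_SIZE)
    {
        bag->items[bag->count++] = item;
    }
    pthread_mutex_unlock(&bag->lock);    // critical section ends
}

// Lock-free alternative for simple shared counters/statistics (C11 atomics):
static atomic_long derivations = 0;
void Stats_CountDerivation(void)
{
    atomic_fetch_add(&derivations, 1);
}
```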

patham9 commented 2 years ago

Over time new experiments have been added to the collection, some of which lead the system to operate at full capacity, while others are more "synthetic". Of course this collection should be extended and stay part of the repository; that's how the benchmarks remain reproducible by others.

Regarding profiling and thread pools: I agree, this seems to be the way to go for this implementation.

automenta commented 2 years ago

So, to make it thread-safe:

  1. Identify all the mutable data structures and make those thread-safe (locks, atomics, etc.). Assume everything else runs in its own oblivious single thread that calls the shared data structures' functions.

  2. Manage a threadpool with individual on/off toggles. Simply run whatever cycle procedure in each thread; the toggles provide a basic throttling ability (e.g. 40% CPU usage). A pthread sketch of this follows below.
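A minimal sketch of point 2, assuming POSIX threads and C11 atomics; Worker, Cycle(), and the pool functions are hypothetical names, not the actual ONA cycle procedure:

```c
// Sketch: a fixed pthread pool where each worker runs reasoner cycles in a loop
// and can be toggled on/off for basic throttling. Cycle() is a hypothetical
// stand-in for the real cycle procedure over thread-safe shared memory.
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

#define NUM_WORKERS 4

typedef struct
{
    atomic_bool enabled;   // per-thread on/off toggle
    atomic_bool shutdown;  // set once to stop the worker
    pthread_t thread;
} Worker;

static Worker workers[NUM_WORKERS];

static void Cycle(void) { /* hypothetical: one reasoner cycle */ }

static void *WorkerLoop(void *arg)
{
    Worker *self = arg;
    while(!atomic_load(&self->shutdown))
    {
        if(atomic_load(&self->enabled))
        {
            Cycle();
        }
        else
        {
            usleep(1000);   // parked: toggled off, so ~0% CPU from this worker
        }
    }
    return NULL;
}

void Pool_Start(void)
{
    for(int i = 0; i < NUM_WORKERS; i++)
    {
        atomic_init(&workers[i].enabled, true);
        atomic_init(&workers[i].shutdown, false);
        pthread_create(&workers[i].thread, NULL, WorkerLoop, &workers[i]);
    }
}

// Throttle to roughly (activeWorkers / NUM_WORKERS) of the pool's capacity.
void Pool_SetActiveWorkers(int activeWorkers)
{
    for(int i = 0; i < NUM_WORKERS; i++)
    {
        atomic_store(&workers[i].enabled, i < activeWorkers);
    }
}
```

Toggling whole workers only throttles in steps of 1/NUM_WORKERS; a finer-grained scheme would duty-cycle each worker instead.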

If possible I would try to use a well-developed external library for the threading, locks, and concurrent data structures, something like Boost. Java provides so much infrastructure for this kind of thing that it's easy to take for granted, e.g. its StampedLocks.

patham9 commented 2 years ago

I agree with the former two points. As for the latter: I know you are not very used to OS APIs, but POSIX threads are all that is needed, featuring locks, condition variables, etc. Boost is a C++ library and a bit bloated when used only for threading, and last time I checked it even lacked some relevant threading functions for fine-grained thread control.
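For completeness, the POSIX pieces involved are just pthread_mutex_t and pthread_cond_t; a minimal handshake where a worker sleeps until work is flagged (names are illustrative only) looks like this:

```c
// Minimal POSIX condition-variable handshake: a worker sleeps until new work is flagged.
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool workAvailable = false;

void SignalWork(void)           // producer side
{
    pthread_mutex_lock(&mutex);
    workAvailable = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mutex);
}

void WaitForWork(void)          // consumer side
{
    pthread_mutex_lock(&mutex);
    while(!workAvailable)       // loop guards against spurious wakeups
    {
        pthread_cond_wait(&cond, &mutex);
    }
    workAvailable = false;
    pthread_mutex_unlock(&mutex);
}
```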

automenta commented 2 years ago

I mean, for example: you may want to borrow off-the-shelf data structures (e.g. a concurrent hashtable, a concurrent skip-list, etc.). I know you don't want Boost; I'm saying don't implement these from scratch. You might as well use something already made:

https://www.reddit.com/r/cpp/comments/mzxjwk/experiences_with_concurrent_hash_map_libraries/

jnorthrup commented 2 years ago

https://keep-calm.medium.com/rules-of-structured-concurrency-in-kotlin-dad5623423a4 Ctrl-Alt-K all the things in IntelliJ; concurrency is boring in Kotlin.