primaryodors / primarydock

PrimaryOdors.org molecular docker.
Other
6 stars 4 forks source link

Spike performance tests. #52

Open electronicsbyjulie opened 2 years ago

electronicsbyjulie commented 2 years ago

Creating this for the conversation from #49 (note spike performance is not present in the macbuild branch, only the main branch, so if I merge macbuild then it marks closed the separate spike performance convo).

There is now a spike performance test here: https://github.com/primaryodors/podock/tree/main/src/spikes/vector_vs_array

Any performance hit is potentially bad since docking is a CPU intensive calculation. If it's a small hit, then the benefits can outweigh the lost performance. If element access/iterations are not slowed then it'll be perfectly fine for docking.

Unfortunately the performance hit is pretty severe, 211% greater on my machine. :( For the volume of processing my purposes require, a threefold difference is just not usable.

objarni commented 2 years ago

Good catch with the off-by-one-error in vector_test!

And that's a good point for the more modern ranged-for-loop which I just added a test for. Slightly better performance than traditional forloop, but not as good as raw c-array (which is hard to beat, given it's so low-level, pre-allocating and all).

Want to add a test which 'tells' the vector how big it should be. That should give close to as good performance as c-array...

objarni commented 2 years ago

I have added a pre-allocated vector test, that is 'telling' the vector what capacity to start at. This means dynamic re-size doesn't happen during loop, and it is quite a lot faster. It is not quite as fast as array, but not so far off:

vector traditional forloop total time: 21696
vector ranged forloop total time: 16218
vector preallocated ranged forloop total time: 8543
array total time: 5645
electronicsbyjulie commented 2 years ago

Well done on the performance testing! 1.5x is certainly better than ~3x. This is looking good.

It's hard to give a go-ahead on anything that would slow the program down because of the volume of data to process. :( There are about 400 olfactory receptors and the database currently has at least 500 odorants, though the latter number will certainly grow into the thousands. After processing docking calculations for some known receptor-odorant pairs, where researchers have gotten microbial cells to express the receptors and measured their actual responses to odorants, it should be possible to find the calculation that best predicts which molecules should trigger which receptors.

Ultimately the docking application will be run for every possible receptor-odorant pair. A good run of 10 poses with 3-4 path nodes per pose takes about 30-60 minutes currently. The math works out to 400 500 0.5 hours = about 11 years. (Though I'd run them concurrently 2 simultaneous processes in the daytime and 4 at night, so could maybe cut that down to ~4 years.) Even a 50% run time increase looks substantial in that light.

Could the std::vectors be used globally and then locally converted to pointer arrays right when they're used? The loops would simply count i=0; i<length; instead of i=0; arr[i];, thereby continuing to avoid any out of range errors. If it helps to examine the target usage, Molecule::multimol_conform(), Molecule::get_intermol_binding(), and InteratomicForce::total_binding() are the most used functions for docking, although the last one seems to run pretty fast. The biggest bottleneck seems to actually be Molecule::shielded() although the nested loops in multimol_conform() also add a lot of processing time.

objarni commented 2 years ago

I understand your concern.

And your idea of switching to array in inner loops should definitely work!

However, before making any decisions I think there are assumptions at play:

  1. That array indexing, adding and so on are significant enough to make a difference. They may be 1/100 of all calculations that are at play
  2. That the current allocate-manually approach is quicker than the spikes stack based "allocation"

In fact, I think a more relevant array test is allocating the array instead of using int a[n].

Is array = new int[n] more common than int array[n] you think? The former is an expensive heap allocation, while the latter is a cheap stack allocation...

electronicsbyjulie commented 2 years ago

Good pointt on the assumptions. To get a good array vs. std::vector benchmark would probably mean trying std::vector on one of the heavily used arrays such as Molecule::atoms or Atom::bonded_to.

To the stack vs. heap question there are definitely functions that allocate an array return value (using new) and the calling function has to delete (not delete[] as that causes segfaults) the result when it's done. Does seem inefficient and I guess I can see why some of the built-in C/C++ functions take a pointer to an output array rather than returning an array.

objarni commented 2 years ago

I like the idea of trying out vector for a very common operation, and see how e.g performance_test target behaves. It usually takes 68-70 seconds on my main PopOS Linux machine. I'll try it out on one of Molecule::atoms or Atom::bonded_to and see if there is a measurable difference...

objarni commented 2 years ago

For the record: the performance test takes 8 seconds on my Macbook Air M1, on battery! Crazy impressed by this little computer 👯

objarni commented 2 years ago

(My large / home station machine with Ubuntu PopO/S is from 2016 with 16 cores Intel Skylake 6700 CPU, so it's not a weak machine at all)

electronicsbyjulie commented 2 years ago

That's very impressive about the Mac!! Such a dramatic difference in performance. Your Ubuntu machine is similar to my home server (also Ubuntu) and getting the exact same runtimes. I wonder does your Mac have a GPU might it be offloading some of the calculations to that. I can't use my GPU at all thanks to AMD's and Nvidia's business decisions that leave out Linux users. Or perhaps it could be splitting the process among multiple cores.

objarni commented 2 years ago

Ah, this was a bit too good to be true.

I just re-ran the performance test on my Ubuntu machine, and it takes 13 seconds.

There must have happened things in podock simulation since last time, when I got 68-70 seconds (last week or so)? Did you change how things are computed?

electronicsbyjulie commented 2 years ago

A bool[] array was not being initialized to false. Its values seemed to persist between calls to the same function as though it were static.

objarni commented 2 years ago

Nice catch!

So first regression since macbuild merge found.