electronicsbyjulie opened 2 years ago
Good catch with the off-by-one error in vector_test!
And that's a good point about the more modern range-based for loop, which I just added a test for. It performs slightly better than the traditional for loop, but not as well as a raw C array (which is hard to beat, given how low-level it is, with everything pre-allocated).
I want to add a test that 'tells' the vector how big it should be up front. That should get performance close to the C array...
I have added a pre-allocated vector test that 'tells' the vector what capacity to start at. This means no dynamic resizing happens during the loop, and it is quite a lot faster. It is not quite as fast as the array, but not far off:
vector traditional forloop total time: 21696
vector ranged forloop total time: 16218
vector preallocated ranged forloop total time: 8543
array total time: 5645
Well done on the performance testing! 1.5x is certainly better than ~3x. This is looking good.
It's hard to give a go-ahead on anything that would slow the program down because of the volume of data to process. :( There are about 400 olfactory receptors and the database currently has at least 500 odorants, though the latter number will certainly grow into the thousands. After processing docking calculations for some known receptor-odorant pairs, where researchers have gotten microbial cells to express the receptors and measured their actual responses to odorants, it should be possible to find the calculation that best predicts which molecules should trigger which receptors.
Ultimately the docking application will be run for every possible receptor-odorant pair. A good run of 10 poses with 3-4 path nodes per pose currently takes about 30-60 minutes. The math works out to 400 receptors × 500 odorants × 0.5 hours = 100,000 hours, or about 11 years. (Though I'd run them concurrently, 2 simultaneous processes in the daytime and 4 at night, so could maybe cut that down to ~4 years.) Even a 50% run time increase looks substantial in that light.
Could the std::vectors be used globally and then locally converted to pointer arrays right where they're used? The loops would simply count i = 0; i < length; instead of i = 0; arr[i];, thereby continuing to avoid any out-of-range errors. If it helps to examine the target usage, Molecule::multimol_conform(), Molecule::get_intermol_binding(), and InteratomicForce::total_binding() are the most used functions for docking, although the last one seems to run pretty fast. The biggest bottleneck actually seems to be Molecule::shielded(), although the nested loops in multimol_conform() also add a lot of processing time.
I understand your concern.
And your idea of switching to array in inner loops should definitely work!
However, before making any decisions, I think there are assumptions at play.
In fact, I think a more relevant array test is heap-allocating the array instead of using int a[n].
Do you think array = new int[n] is more common than int array[n]? The former is an expensive heap allocation, while the latter is a cheap stack allocation...
Good point on the assumptions. To get a good array vs. std::vector benchmark would probably mean trying std::vector on one of the heavily used arrays, such as Molecule::atoms or Atom::bonded_to.
On the stack vs. heap question: there are definitely functions that allocate an array return value (using new) and the calling function has to delete (not delete[], as that causes segfaults) the result when it's done. That does seem inefficient, and I guess I can see why some of the built-in C/C++ functions take a pointer to an output array rather than returning an array.
I like the idea of trying out vector for a very common operation and seeing how e.g. the performance_test target behaves. It usually takes 68-70 seconds on my main Pop!_OS Linux machine. I'll try it out on one of Molecule::atoms or Atom::bonded_to and see if there is a measurable difference...
For the record: the performance test takes 8 seconds on my Macbook Air M1, on battery! Crazy impressed by this little computer 👯
(My larger home-station machine runs Pop!_OS, which is Ubuntu-based; it's from 2016 with a 16-core Intel Skylake 6700 CPU, so it's not a weak machine at all.)
That's very impressive about the Mac!! Such a dramatic difference in performance. Your Ubuntu machine is similar to my home server (also Ubuntu) and is getting the exact same runtimes. I wonder whether your Mac has a GPU and might be offloading some of the calculations to it. I can't use my GPU at all, thanks to AMD's and Nvidia's business decisions that leave out Linux users. Or perhaps it could be splitting the process among multiple cores.
Ah, this was a bit too good to be true.
I just re-ran the performance test on my Ubuntu machine, and it takes 13 seconds.
Something must have changed in the podock simulation since last time, when I got 68-70 seconds (a week or so ago)? Did you change how things are computed?
A bool[] array was not being initialized to false. Its values seemed to persist between calls to the same function as though it were static.
Nice catch!
So that's the first regression found since the macbuild merge.
Creating this for the conversation from #49. (Note: the performance spike is not present in the macbuild branch, only the main branch, so if I merge macbuild it will mark the separate spike-performance conversation as closed.)