paesanilab / MBX

MBX is an energy and force calculator for data-driven many-body simulations.
Other
30 stars 32 forks source link

Domain decomposition implementation [WIP] #5

Closed chemphys closed 7 years ago

chemphys commented 7 years ago

Proceeding to the implementation of an efficient neighbor list.

I have been looking at different algorithms used in the MD packages to keep track of the neighbors. I put here some of the most interesting ones:

Since we are going to aim for big systems, and even if the box is small, I would go for the link cell algorithm, since it will also work for non-pbc. @dgasmith , do you have in mind any other algorithm that we can implement (something maybe more recent)? You mentioned some papers about this.

Update 20170821

We decide to go with KD-Tree to keep track of neighbors and perform the domain decomposition. The KD-Tree will give us a breakdown of all the molecules in the system. The procedure to follow will be the following:

1-body energy

For the 1B energy, we will do it all at once, since we don't need to look for pairs or neighbors. We will send chunks of data to each core that will use SIMD instructions.

2-body energy

Here is where we will take advantage of the kd-tree.

  1. Set a reasonable cutoff Rc that should work for all the dimers (since none of our polynomials right now has a cutoff larger than 9.5, setting it at 10 or 10.5 should be OK)
  2. Get the dimers We will get the dimers as we build the tree:
    Add monomers[0] to tree (create tree)
    For (i = 1, i < monomers.size(), i++) do 
    Find list of monomers in tree within a distance Rc
    Add those pairs to the appropiate dimer vector (h2o - h2o, h2o - cl ...) 
                      // Maybe we can have a vector of vectors, so it will be 
                      // easy to align
    Add monomer[i] to tree
    Done

    This should do the work for the 2B terms. But I am afraid that is not efficient, since kd-trees are not prepared to be modified. The only way that comes to my mind is the following:

    • Generate kd-tree with all data
    • For each monomer i (we know the index), perform a radial search in the tree within the cutoff Rc. The code will throw a vector with pairs of all the indexes j and distances <j,d>. Then, for each pair it throws, if j <= i, we discard that and we don't count it. But I have doubts about the efficiency of this... @dgasmith , what do you think?

3-body energy

I will update later this part, once we figure out how to do the 2b.

Update 20170824

The algorithm implemented is the following:

  1. Generate kd-tree with all data
  2. For each monomer i:
    • Perform radial search around i within the cutoff defined (default 10.0 Ang)
    • For all the matches j:
    • If INDEX(j) > INDEX(i), add it to the dimer vector (this will take care of double counting)
    • If the dimer is added, perform a radial search around j
    • For all the matches k, add the corresponding trimer if INDEX(k) > INDEX(j)

This seems to do the work.

dgasmith commented 7 years ago

Two comments:

I can dig up current algorithms, but the complexity vs implementation time may not be worth it.

chemphys commented 7 years ago

@dgasmith , the kdtree class you pointed me to makes use of the boos libraries for geometry points. Is that OK?

dgasmith commented 7 years ago

Not sure which one I linked. This one does not appear to need boost (although it does have conditionals if BOOST flags are present).

Its not a big deal if you need boost, but I would recommend at least attempting to avoid it.

chemphys commented 7 years ago

I will try to avoid it anyway. This one you sent looks way better. I will modify the issue with the plan on how to use the data structure during the day of today, this way you see what I plan to do. Thanks for your help!

chemphys commented 7 years ago

@dgasmith , can you have a look at the issue description? I updated it with a plan. Let me know if that seems good, or if you have a better idea. I am not 100% convinced, but I cannot think anything better...

dgasmith commented 7 years ago

This is a first pass at domain decomposition. Better algorithms keep track of "windows" and watch to see if the molecule leaves those windows before updating neighbors. So a plain KD-tree is not as efficient as proper algorithms; however, I am reasonably certain the computation of the electrostatics is going to dominate your total cost. At any rate this should get you progressing down the line.

As a note there isn't much point in having a vector of vectors as you cannot SIMD over the different kinds of interacting pairs as I currently understand your algorithm.

There is a well known quote by Donald Knuth:

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming

chemphys commented 7 years ago

I will take that in mind :) I'll keep going with that. Thanks!