ltm-erlangen / deal.ii-qc

quasi-continuum approach implemented using deal.II library
GNU Lesser General Public License v2.1

scaling of the algorithm (results for the paper) #30

Open davydden opened 7 years ago

davydden commented 7 years ago

Verify performance and scalability by checking how the different steps of the algorithm scale on various problem sizes.

Do weak and strong scaling (strong meaning fixed problem size) over three orders of magnitude of model size, preferably with ~10^7 particles and from 12 to ~10^4 processes. Measure wall time for all operations. Plot wallclock time vs cores.

Weak scaling -- keep the number of DoFs/particles per core fixed. Plot wallclock time vs cores. Could be tricky to set up for QC (see below).
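
To turn the wallclock-vs-cores data into something comparable across runs, parallel efficiency can be computed relative to the smallest run. The following is only a minimal sketch: the `Sample` struct and both function names are hypothetical and not part of deal.II-qc, and the first sample is assumed to be the reference run (e.g. 12 cores).

```cpp
#include <cassert>
#include <vector>

// One measurement: number of cores and measured wall time in seconds.
// (Hypothetical helper type, used only for this illustration.)
struct Sample
{
  unsigned int cores;
  double       wall_time;
};

// Strong scaling (fixed problem size): ideal wall time drops as 1/cores,
// so efficiency = (cores_ref * T_ref) / (cores * T).
std::vector<double>
strong_scaling_efficiency(const std::vector<Sample> &samples)
{
  assert(!samples.empty());
  const double reference = samples.front().cores * samples.front().wall_time;
  std::vector<double> efficiency;
  efficiency.reserve(samples.size());
  for (const Sample &s : samples)
    efficiency.push_back(reference / (s.cores * s.wall_time));
  return efficiency;
}

// Weak scaling (fixed work per core): ideal wall time stays constant,
// so efficiency = T_ref / T.
std::vector<double>
weak_scaling_efficiency(const std::vector<Sample> &samples)
{
  assert(!samples.empty());
  const double reference = samples.front().wall_time;
  std::vector<double> efficiency;
  efficiency.reserve(samples.size());
  for (const Sample &s : samples)
    efficiency.push_back(reference / s.wall_time);
  return efficiency;
}
```

An efficiency of 1 means ideal scaling with respect to the reference run; values below 1 show how much is lost as cores are added.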


The test problem is a square crystal domain generated using the following algorithm. Refine the original single cell once, then take the cell in the lower left corner:

  1. Refine the cell once
  2. Take its lower left child and go to (1)

Repeat until the mesh is refined down to the fully resolved atomistic level, assuming that the domain size is proportional to the unit cell width.
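
As a rough way to estimate how large this test mesh gets, the corner refinement can be counted without building a triangulation. This is a sketch under stated assumptions: "fully resolved" is taken to mean that the corner cell width is no larger than the unit cell width, every refinement step replaces one cell by 2^dim children, and the names `CornerMesh` and `refine_towards_corner` are hypothetical.

```cpp
#include <cassert>

// Summary of the graded mesh (hypothetical type for this illustration).
struct CornerMesh
{
  unsigned int  levels;         // number of refinement steps performed
  unsigned long n_active_cells; // active cells in the final mesh
};

// Repeatedly refine the lower left cell until its width is no larger than
// the unit cell width. Each step replaces one active cell by 2^dim
// children, i.e. adds 2^dim - 1 active cells.
CornerMesh refine_towards_corner(const double       domain_width,
                                 const double       unit_cell_width,
                                 const unsigned int dim = 2)
{
  assert(domain_width > 0.0 && unit_cell_width > 0.0);
  assert(dim == 2 || dim == 3);

  const unsigned long children = 1ul << dim;
  CornerMesh          mesh{0, 1};
  double              h = domain_width;
  while (h > unit_cell_width)
    {
      h *= 0.5;
      ++mesh.levels;
      mesh.n_active_cells += children - 1;
    }
  return mesh;
}
```

For example, a 2D domain 16 unit cells wide needs 4 refinement steps and ends with 13 active cells, so the number of cells grows only logarithmically with the domain size while the number of atoms grows quadratically.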


Parts of the algorithm:

call each N=10 times and measure the average time.
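
The "call N=10 times and average" measurement could look roughly like the sketch below. The helper name `average_wall_time` is hypothetical; it times a single process only, so in an MPI run one would presumably also reduce the result over all ranks (e.g. take the maximum), which is not shown here.

```cpp
#include <cassert>
#include <chrono>

// Call `operation` n_calls times and return the average wall time of one
// call in seconds. steady_clock is used because it is monotonic.
template <typename Operation>
double average_wall_time(Operation &&operation, const unsigned int n_calls = 10)
{
  assert(n_calls > 0);
  using clock = std::chrono::steady_clock;

  const auto start = clock::now();
  for (unsigned int i = 0; i < n_calls; ++i)
    operation();
  const std::chrono::duration<double> elapsed = clock::now() - start;

  return elapsed.count() / n_calls;
}
```

Usage would be along the lines of `average_wall_time([&]() { some_step(); })`, where `some_step` stands for whichever part of the algorithm is being measured.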

bodduv commented 5 years ago

Some points that came to my mind:

Footnote [1]: In a discussion here, Bangerth suggests that a map of cells to std::vector of particles is a bad idea because we need to resize the vector. For our application, however, we do not resize it often; it is actually built once, and afterwards we only traverse the map or find elements in it. Given what we know about the poor performance of std::map and std::multimap, I can implement a boost::container::flat_map<cell, std::vector<Molecule>>[2] version of QC fairly quickly.

Footnote [2]: It would be boost::container::flat_map<cell, std::unique_ptr<std::vector<Molecule>>>; it is suggested (in Chandler's talk) that both the key_type and value_type be small, like a std::unique_ptr.
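
To illustrate why a flat map suits a build-once, look-up-often pattern, here is a standard-library stand-in: a sorted std::vector of key/value pairs searched with std::lower_bound. This is not boost::container::flat_map itself, only a sketch of the same idea; `CellId`, the `Molecule` placeholder and the class name `FlatCellMap` are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Molecule { double x, y; }; // placeholder, not the QC Molecule class

using CellId = unsigned int;      // hypothetical stand-in for a cell key
using Entry  = std::pair<CellId, std::vector<Molecule>>;

class FlatCellMap
{
public:
  // Build once: take all entries, sort them by key, never insert again.
  explicit FlatCellMap(std::vector<Entry> entries)
    : data_(std::move(entries))
  {
    std::sort(data_.begin(), data_.end(),
              [](const Entry &a, const Entry &b) { return a.first < b.first; });
    // Keys are assumed to be unique.
    assert(std::adjacent_find(data_.begin(), data_.end(),
                              [](const Entry &a, const Entry &b) {
                                return a.first == b.first;
                              }) == data_.end());
  }

  // Binary search over contiguous storage; nullptr if the cell is absent.
  const std::vector<Molecule> *find(const CellId cell) const
  {
    const auto it =
      std::lower_bound(data_.begin(), data_.end(), cell,
                       [](const Entry &e, const CellId c) { return e.first < c; });
    return (it != data_.end() && it->first == cell) ? &it->second : nullptr;
  }

  std::size_t size() const { return data_.size(); }

private:
  std::vector<Entry> data_; // sorted by key
};
```

Lookup is O(log n) as with std::map, but the keys sit in one contiguous array instead of separately allocated tree nodes, which is where the expected gain would come from.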

davydden commented 5 years ago

yet another alternative is to have a CSR-like data structure, i.e.

std::vector<unsigned int> start_of_molecules_on_cell (n_cells+1);
std::vector<Molecule> all_molecules;

then we are guaranteed to have a contiguous data layout and iterating over neighbors is not too bad. You can always ask for cell's index and thus know where its molecules start.
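
A minimal sketch of how the two vectors above could be filled and queried is given below. It assumes cells are identified by a consecutive index in [0, n_cells); the `Molecule` placeholder and the wrapper name `CellMolecules` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Molecule { double x, y; }; // placeholder, not the QC Molecule class

struct CellMolecules
{
  // Molecules of cell c occupy the half-open index range
  // [start_of_molecules_on_cell[c], start_of_molecules_on_cell[c + 1]).
  std::vector<unsigned int> start_of_molecules_on_cell; // size n_cells + 1
  std::vector<Molecule>     all_molecules;              // contiguous storage

  // Build once from per-cell lists, concatenating them in cell order.
  explicit CellMolecules(const std::vector<std::vector<Molecule>> &per_cell)
    : start_of_molecules_on_cell(per_cell.size() + 1, 0)
  {
    for (std::size_t c = 0; c < per_cell.size(); ++c)
      {
        all_molecules.insert(all_molecules.end(),
                             per_cell[c].begin(), per_cell[c].end());
        start_of_molecules_on_cell[c + 1] =
          static_cast<unsigned int>(all_molecules.size());
      }
  }

  unsigned int n_molecules(const unsigned int cell) const
  {
    assert(cell + 1 < start_of_molecules_on_cell.size());
    return start_of_molecules_on_cell[cell + 1] -
           start_of_molecules_on_cell[cell];
  }

  // Pointer range over the molecules of one cell.
  const Molecule *begin(const unsigned int cell) const
  {
    assert(cell + 1 < start_of_molecules_on_cell.size());
    return all_molecules.data() + start_of_molecules_on_cell[cell];
  }

  const Molecule *end(const unsigned int cell) const
  {
    assert(cell + 1 < start_of_molecules_on_cell.size());
    return all_molecules.data() + start_of_molecules_on_cell[cell + 1];
  }
};
```

Iterating over the molecules of a cell is then `for (const Molecule *m = csr.begin(c); m != csr.end(c); ++m)`, and a cell without molecules simply yields an empty range.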

But I agree, once you have some relatively large representative example for the paper, we can start profiling and fiddling with data structures.

ps. I spent the whole day today to rework data structures elsewhere from std::map to something similar, hopefully will see a better performance as a result...

davydden commented 5 years ago

two more links I have on a related issue:

davydden commented 5 years ago

@vishalkenchan I came across this talk https://moodle.rrze.uni-erlangen.de/pluginfile.php/13826/mod_resource/content/1/07_SIMD_JT.pdf. There is a section at the end about LAMMPS, maybe some parts will be useful for us as well. Have a look.

bodduv commented 5 years ago

I'll have a look next week. I'm in Hamburg for a short vacation.