davydden opened this issue 7 years ago
Some points that came to my mind:
`std::multimap<cell, Molecule>` is not a good choice; two options arise:

`boost::container::flat_map<cell, std::vector<Molecule>>` [1] or some such (some of the requirements put up by Chandler Carruth in his CppCon 2014 talk have been addressed, and it shows a significant performance improvement; there are other contenders like `google::dense_hash_map` and `ska::flat_hash_map`, but I suppose Boost's `flat_map` would be a better first choice given that it is maintained).

Footnote [1]: In a discussion here, Bangerth suggests that a map of cells to a `std::vector` of particles is a bad idea because we need to resize the vector, but in our application we do not resize it often; it is actually built once and we then traverse the map or find elements in it. Given what we know about the poor performance of `std::map` and `std::multimap`, I can implement the `boost::container::flat_map<cell, std::vector<Molecule>>` [2] version of QC fairly quickly.

Footnote [2]: It would be `boost::container::flat_map<cell, std::unique_ptr<std::vector<Molecule>>>`; it is suggested (in Chandler's talk) that both the key_type and value_type be small, like a `std::unique_ptr`.
Yet another alternative is to have a CSR-like data structure, i.e.

```cpp
std::vector<unsigned int> start_of_molecules_on_cell (n_cells+1);
std::vector<Molecule>     all_molecules;
```

Then we are guaranteed to have a contiguous data layout, and iterating over neighbors is not too bad. You can always ask for a cell's index and thus know where its molecules start.
But I agree, once you have some relatively large representative example for the paper, we can start profiling and fiddling with data structures.
p.s. I spent the whole day today reworking data structures elsewhere from `std::map` to something similar; hopefully we will see better performance as a result...
Two more links I have on a related issue:
@vishalkenchan I came across this talk https://moodle.rrze.uni-erlangen.de/pluginfile.php/13826/mod_resource/content/1/07_SIMD_JT.pdf — there is a section at the end about LAMMPS; maybe some parts will be useful for us as well. Have a look.
I'll have a look next week. I'm in Hamburg for a short vacation.
Verify performance and scalability by checking the scaling of different steps of the algorithm on various problem sizes.
Do weak and strong (fixed problem size) scaling over three orders of magnitude of model size, preferably with ~10^7 particles and processes from 12 to ~10^4. Measure wall time for all operations. Plot wall-clock time vs. cores.
Weak scaling -- keep the number of DoFs/particles per core fixed. Plot wall-clock time vs. cores. Could be tricky to set up for QC (see below).
The test problem is a square domain of crystal generated using the following algorithm: refine the original single cell once, take the cell in the lower-left corner, and repeat until it is refined to the fully resolved atomistic level, assuming that the domain size is proportional to the unit cell width.
Parts of the algorithm:
Call each part N=10 times and measure the average time.