sstsimulator / sst-elements

SST Architectural Simulation Components and Libraries
http://www.sst-simulator.org

Simulating a complete HPC system #2314

Open sumant-kalra opened 9 months ago

sumant-kalra commented 9 months ago

I have been using ember motifs for network simulations of ~10K nodes with the firefly NIC and the merlin router. I would now like to simulate a complete HPC system by integrating compute and memory models with the ember engine, so that motifs pass their events to the CPU and memory first, then to the NIC, and finally to the router in the topology. I have learned how to create a memory hierarchy and attach CPU components to it, but I could not figure out how to connect that memory hierarchy and CPU to the firefly NIC. I have a couple of questions about this.

  1. Simulations involving CPU and memory models are driven either by the CPUs' generators (for pattern-based models) or by traces captured from an application run on real hardware (for trace-based models), whereas ember engine simulations are driven by the ember motifs. These seem to be two different workflows, and I want to connect them by driving the CPU, memory and NIC from the ember engine motifs. Is there an existing library in sst-elements that can create such a system? The documentation of the ‘Thornhill’ element states that it can provide such capabilities. Is the current implementation of that library sufficient to create a basic HPC model with compute, memory, the firefly NIC and the merlin router? If not, what needs to be implemented?
  2. I am trying to understand the philosophy behind using the available models for different kinds of simulations. I see that the CPU and memory models are used for modelling a network on chip and simulating the workflow from CPU generators or traces, whereas simulations at scale involving multiple nodes are run using the ember engine. Are we expected to create multiple models of the same HPC system, each simulating only part of the actual workload? Let’s say we want to evaluate a sparse matrix multiplication algorithm on ~100 nodes: is it part of the SST design that we have to run the memHierarchy simulation and the ember engine motif separately to evaluate the memory and the network individually, or is there a way to use a single model that can evaluate both?

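
The event path described in this question (motif, then CPU, then memory, then NIC, then router) can be pictured with a toy model. This is a minimal plain-Python sketch, not SST code: it does not use the SST Python API or any ember, memHierarchy, firefly, merlin or Thornhill interface, and the names `Stage`, `Event`, `drive` and all latency values are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One hypothetical component on the node-level path."""
    name: str
    latency_ns: int  # placeholder value, not a measured or SST default latency


@dataclass
class Event:
    """A motif-generated event that records where it has been."""
    payload: str
    time_ns: int = 0
    path: list = field(default_factory=list)


def drive(event, stages):
    """Pass the event through each stage in order, accumulating latency."""
    for stage in stages:
        event.time_ns += stage.latency_ns
        event.path.append(stage.name)
    return event


# Order taken from the question: CPU and memory first, then NIC, then router.
NODE_PIPELINE = [
    Stage("cpu", 10),
    Stage("memory", 80),
    Stage("nic", 150),
    Stage("router", 100),
]

ev = drive(Event("send(rank=1, bytes=4096)"), NODE_PIPELINE)
```

After `drive` returns, `ev.path` lists the stages in the order the event visited them and `ev.time_ns` holds the summed placeholder latencies. A real SST model would express these stages as components connected by links rather than as a Python loop.
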
halitdogan commented 3 weeks ago

Have you figured out how it works?