Measuring scatter of task executions across diverse distributions of resources.
Related paper at: https://bitbucket.org/shantenujha/aimes
Prerequisites: Python 2.7; pip; git; radical-pilot
Clone this repository:
git clone https://github.com/radical-experiments/AIMES-Experience.git
Install RADICAL Cybertools:
virtualenv ~/ve/aimes-experience
. ~/ve/aimes-experience/bin/activate
git clone git@github.com:radical-cybertools/radical.pilot.git
cd radical.pilot; git checkout experiment/aimes; git pull; pip install --upgrade . ; cd ..
git clone git@github.com:radical-cybertools/radical.utils.git
cd radical.utils; git checkout experiment/aimes; git pull; pip install --upgrade . ; cd ..
git clone git@github.com:radical-cybertools/saga-python.git
cd saga-python; git checkout experiment/aimes; git pull; pip install --upgrade . ; cd ..
Move into the AIMES-Experience directory.
Edit the file experiment.py, setting the following global variables to their appropriate values:
N_UNITS = 2048
U_CORES = 1
U_TIME = 15
RESOURCE = 'xsede.comet'
N_PILOTS = 4
P_CORES = 512
P_WALLTIME = 75
PROJECT = 'TG-XXXXXXXXX'
Note: SSH key-based, passwordless access to the chosen resource(s) is required.
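Before submitting, it can help to check that the values fit together. The following is a minimal sketch, not part of experiment.py: the function name check_settings is hypothetical, and it assumes that U_TIME and P_WALLTIME use the same time unit and that units are spread evenly over all pilot cores, ignoring queue wait and pilot startup overhead.

```python
# Values copied from the settings above.
N_UNITS = 2048
U_CORES = 1
U_TIME = 15
N_PILOTS = 4
P_CORES = 512
P_WALLTIME = 75


def check_settings(n_units, u_cores, u_time, n_pilots, p_cores, p_walltime):
    """Return a list of warnings about the settings (empty if none).

    Hypothetical helper: assumes u_time and p_walltime share a time unit
    and that units are distributed evenly across all pilot cores.
    """
    warnings = []
    if u_cores > p_cores:
        warnings.append("a unit needs more cores than one pilot provides")
    if u_time > p_walltime:
        warnings.append("a unit runs longer than the pilot walltime")

    cores_needed = n_units * u_cores
    cores_available = n_pilots * p_cores
    # Ceiling division: how many rounds of units each core must execute.
    rounds = -(-cores_needed // cores_available)
    if rounds * u_time > p_walltime:
        warnings.append(
            "{0} round(s) of {1} exceed the walltime of {2}".format(
                rounds, u_time, p_walltime))
    return warnings


problems = check_settings(N_UNITS, U_CORES, U_TIME,
                          N_PILOTS, P_CORES, P_WALLTIME)
for problem in problems:
    print("WARNING: " + problem)
```

With the values listed above, 4 pilots of 512 cores provide exactly the 2048 cores that 2048 single-core units need, so the check reports nothing.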
Set up your execution environment:
. setup.sh
Run the experiment:
python experiment.py
Download the session for the experiment:
radicalpilot-close-session -m export -d mongodb://54.221.194.147:24242/aimes-experience -s rp.session.xxxx.xxxx.xxxx.xxxx.xxxx
Upon success of the previous command, create a directory runn, where n uniquely and incrementally indicates the number of the experiment.
Create a file inside runn called metadata.json with the following information:
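Picking the next value of n by hand is easy to get wrong once several runs exist. As a sketch only, the hypothetical helper next_run_dir below scans a directory for existing run directories and returns the next name. It assumes directories are named run1, run2, and so on, without zero padding, and that numbering starts at 1; neither is stated by this repository.

```python
import os
import re


def next_run_dir(base="."):
    """Return the path of the next unused run<n> directory under base.

    Hypothetical helper: assumes names of the form run1, run2, ...
    """
    pattern = re.compile(r"^run(\d+)$")
    numbers = []
    for name in os.listdir(base):
        match = pattern.match(name)
        # Only directories count; a stray file named run9 is ignored.
        if match and os.path.isdir(os.path.join(base, name)):
            numbers.append(int(match.group(1)))
    n = max(numbers) + 1 if numbers else 1
    return os.path.join(base, "run{0}".format(n))


# Typical use from the repository root:
#   os.mkdir(next_run_dir())
```

The helper only computes the name, so the directory is created in a separate, explicit step.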
{
"n_tasks": <int>,
"n_cores": <int>,
"pilots": [
[<int n_cores>, <walltime>],
...
],
"resources": [
"resource.tag",
...
],
"cores": [
[<int tasks>, <int n_cores>],
...
],
"durations": [
[<int tasks>, <int duration>],
...
]
}
Example:
{
"n_tasks": 2048,
"n_cores": 2048,
"pilots": [
[512, 75],
[512, 75],
[512, 75],
[512, 75]
],
"resources": [
"xsede.comet"
],
"cores": [
[2048, 1]
],
"durations": [
[2048, 15]
]
}
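The example above can also be generated from the variables set in experiment.py instead of being typed by hand. This is a minimal sketch under stated assumptions: build_metadata is a hypothetical helper, the mapping from variables to fields is inferred from the matching values in the example, and "n_cores" is taken to be the total number of pilot cores (the example does not distinguish this from the total cores requested by the tasks, since both are 2048). It covers only the single-resource, uniform-task case.

```python
import json


def build_metadata(n_units, u_cores, u_time, resource,
                   n_pilots, p_cores, p_walltime):
    """Build the metadata dictionary for one uniform, single-resource run.

    Hypothetical helper; "n_cores" is assumed to be the total pilot cores.
    """
    return {
        "n_tasks": n_units,
        "n_cores": n_pilots * p_cores,
        # One [cores, walltime] entry per pilot.
        "pilots": [[p_cores, p_walltime] for _ in range(n_pilots)],
        "resources": [resource],
        # A single partition: every task has the same cores and duration.
        "cores": [[n_units, u_cores]],
        "durations": [[n_units, u_time]],
    }


metadata = build_metadata(2048, 1, 15, "xsede.comet", 4, 512, 75)
text = json.dumps(metadata, indent=4, sort_keys=True)
print(text)
```

Writing text to metadata.json inside the run directory completes the step; keys are sorted here only to keep the output stable between runs.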
Note: "cores" and "durations" describe partitions of the set of tasks. At the moment, each task uses 1 core and runs for 15 minutes, but we will have to use more complex distributions of cores and durations.
Copy the .prof, .json, and log file into the runn directory:
cp rp.session.xxxx.xxxx.xxxx.xxxx.xxxx.prof rp.session.xxxx.xxxx.xxxx.xxxx.xxxx.json logs/radical_debug.log runn/
Pull and push the repository.