uzh / vm-mad

Dynamically grow or shrink GridEngine clusters using cloud-based nodes
https://arxiv.org/abs/1302.2529
Apache License 2.0

script to distil SGE accounting data or qstat snapshots into data usable for a simulation #3

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Let's call this tool `distil.py` (provisional name -- feel free to
change).

Depending on command line options, the `distil.py` script reads:

(1) a set of `qstat -xml` output files (all contained in the same
    directory; each file is possibly gzip-compressed), or
(2) an SGE accounting data file [3].
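
For input mode (1), a minimal sketch of how the snapshot directory might be read, assuming each file is either plain text or gzip data. The helper names `open_maybe_gzip` and `iter_snapshots` are hypothetical, and parsing of the XML itself is left out:

```python
import gzip
import os


def open_maybe_gzip(path):
    """Open `path` for text reading, decompressing if it holds gzip data."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":  # gzip files start with these two bytes
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_snapshots(directory):
    """Yield (filename, text) for each regular file, sorted by file name."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open_maybe_gzip(path) as stream:
                yield name, stream.read()
```

Checking the magic bytes rather than the file extension is a design choice here: it lets compressed and uncompressed snapshots share one directory regardless of how they were named.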

In either case, the output of the `distil.py` script is a CSV file [1]
in the "useful data" format below.

Define "useful data" as a CSV file, each line of which has this format:

       jobid,submit_time,duration

where:

- `jobid` is the unique job identifier
- `submit_time` is the time when the job first appears in qstat output
  (For ease of processing, it could be an "epoch"[2], i.e., the number of
  seconds since midnight Jan. 1, 1970.  IOW, the standard UNIX way of
  representing time.)
- `duration` is the duration of the job, in seconds

Additional fields may be provided, but I think they are not useful in
the simulator.
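
As an illustration of the format above, here is a minimal sketch that writes such lines with the `csv` module [1]. The function name `write_useful_data` and the sample values are made up for this example:

```python
import csv
import io


def write_useful_data(rows, stream):
    """Write (jobid, submit_time, duration) tuples to `stream` as CSV lines."""
    writer = csv.writer(stream, lineterminator="\n")
    for jobid, submit_time, duration in rows:
        # Times are written as whole seconds (epoch and duration).
        writer.writerow([jobid, int(submit_time), int(duration)])


# Made-up example: job 1001, submitted at epoch 1327912440, ran for 360 s.
buffer = io.StringIO()
write_useful_data([(1001, 1327912440, 360)], buffer)
print(buffer.getvalue(), end="")  # prints: 1001,1327912440,360
```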

[1]: http://docs.python.org/library/csv.html
[2]: http://docs.python.org/library/time.html
[3]: http://arc.liv.ac.uk/SGE/htmlman/htmlman5/accounting.html

Original issue reported on code.google.com by riccardo.murri@gmail.com on 30 Jan 2012 at 8:34

GoogleCodeExporter commented 9 years ago

Original comment by riccardo.murri@gmail.com on 30 Jan 2012 at 8:41

GoogleCodeExporter commented 9 years ago
I'm already done with this issue. I have only one question about the duration:

"- `duration` is the duration of the job, in seconds." So, does this mean that
we also take already submitted jobs into consideration? It's a little bit
confusing, because we are interested in cloud candidate jobs, aren't we?

The following attributes are listed in accounting:

submission_time
       Submission time.

start_time
       Start time.

end_time
       End time.

duration = end_time - start_time?

Original comment by tyanko.a...@gmail.com on 2 Feb 2012 at 10:36

GoogleCodeExporter commented 9 years ago
| I have only one question about the duration:
| [...]
| The following attributes are listed in accounting:
|
| submission_time
|       Submission time.
|
| start_time
|       Start time.
|
| end_time
|       End time.
|
| duration = end_time - start_time?

Yes, exactly.
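
A rough sketch of that computation, assuming the accounting file holds one colon-separated record per line with `job_number`, `submission_time`, `start_time` and `end_time` at the 0-based positions below (please check them against [3] for your SGE version). The function name `distil_accounting` is hypothetical, and skipping jobs that never started is my assumption, not something decided in this issue:

```python
# Assumed 0-based field positions in an accounting record; verify against [3].
JOB_NUMBER, SUBMISSION_TIME, START_TIME, END_TIME = 5, 8, 9, 10


def distil_accounting(lines):
    """Yield (jobid, submit_time, duration) for each usable accounting record."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split(":")
        if len(fields) <= END_TIME:
            continue  # skip truncated records
        start = int(fields[START_TIME])
        end = int(fields[END_TIME])
        if start == 0 or end < start:
            continue  # assumption: the job never ran, so it has no duration
        yield fields[JOB_NUMBER], int(fields[SUBMISSION_TIME]), end - start


# Made-up record, cut off after the fields this sketch reads.
sample = "all.q:node01:grp:alice:sleep:1001:sge:0:1327912440:1327912500:1327912860"
print(list(distil_accounting([sample])))
```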

Original comment by riccardo.murri@gmail.com on 2 Feb 2012 at 10:49

GoogleCodeExporter commented 9 years ago
The `distil.py` script now converts the accounting information to CSV format.
The updated version is in SVN.

Original comment by tyanko.a...@gmail.com on 2 Feb 2012 at 11:58

GoogleCodeExporter commented 9 years ago

Original comment by tyanko.a...@gmail.com on 7 Feb 2012 at 5:05

GoogleCodeExporter commented 9 years ago

Original comment by tyanko.a...@gmail.com on 21 Feb 2012 at 9:22