radical-cybertools / radical.analytics

Analytics for RADICAL-Cybertools

Sum amount of concurrent pilots per time sample #31

Open mturilli opened 7 years ago

mturilli commented 7 years ago

When measuring pilot concurrency, the result contains every timestamp that falls within each sampling interval, rather than one row per interval. For example:

session.concurrency(state=['PMGR_ACTIVE', ['DONE', 'CANCELED', 'FAILED']], sampling=60)
    timestamp   npilot
0   3.9303   1
1   4.4225   1
2   4.8275   1
3   5.3653   1
4   5.5524   1
5   5.6212   1
6   5.8381   1
7   6.0298   1
8   6.1350   1
9   6.7833   1
10  7.8033   1
11  63.9303  1
12  64.4225  1
13  64.8275  1
14  65.3653  2
15  65.5524  1
16  65.6212  1
17  65.8381  2
18  66.0298  2
19  66.1350  1
20  66.7833  2
21  67.8033  2
22  123.9303 1

When plotting this time series, it would be useful to have the total number of entities in the specified state for each time sample. For example:

1    60     11
2    120    16
...
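
The aggregation requested above can be sketched in plain Python. This is only an illustration of the desired output, not how radical.analytics implements sampling: the helper name `sum_per_window` is hypothetical, the windows are assumed to be left-closed ([0, 60), [60, 120), ...) and labelled by their right edge, and the input rows are copied from the output shown earlier.

```python
import math

# (timestamp, npilot) rows copied from the concurrency output above
samples = [
    (3.9303, 1), (4.4225, 1), (4.8275, 1), (5.3653, 1), (5.5524, 1),
    (5.6212, 1), (5.8381, 1), (6.0298, 1), (6.1350, 1), (6.7833, 1),
    (7.8033, 1), (63.9303, 1), (64.4225, 1), (64.8275, 1), (65.3653, 2),
    (65.5524, 1), (65.6212, 1), (65.8381, 2), (66.0298, 2), (66.1350, 1),
    (66.7833, 2), (67.8033, 2), (123.9303, 1),
]


def sum_per_window(rows, window):
    """Sum the counts of all rows that fall into the same time window.

    Hypothetical helper: windows are [k*window, (k+1)*window) and each
    total is keyed by the right edge of its window.
    """
    totals = {}
    for timestamp, count in rows:
        # right edge of the window that contains this timestamp
        window_end = (math.floor(timestamp / window) + 1) * window
        totals[window_end] = totals.get(window_end, 0) + count
    return sorted(totals.items())


for window_end, total in sum_per_window(samples, 60):
    print(window_end, total)
```

With a 60 second window this gives 11 for the window ending at 60 and 16 for the window ending at 120, matching the two rows above; the single remaining sample lands in the window ending at 180.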
andre-merzky commented 7 years ago

Hmm, this is actually what the sampling should do already, so this is a bug. thx!

mturilli commented 7 years ago

As a side note, this is done by the resample method in pandas:

>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8

>>> series.resample('3T', label='right', closed='right').sum()
2000-01-01 00:00:00     0
2000-01-01 00:03:00     6
2000-01-01 00:06:00    15
2000-01-01 00:09:00    15
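
To make the effect of `label='right', closed='right'` concrete, here is a small pandas-free sketch that reproduces the sums above. It is an illustration of the binning rule only, not pandas' implementation: the function name `resample_sum_right` is hypothetical, and timestamps are simplified to integer minutes after 00:00.

```python
def resample_sum_right(rows, width):
    """Sum values in right-closed bins labelled by their right edge.

    Hypothetical helper: a row at minute m goes into the bin
    (label - width, label], where label is the smallest multiple
    of `width` that is >= m.
    """
    totals = {}
    for minute, value in rows:
        # integer ceiling division, avoids floating point rounding
        label = -(-minute // width) * width
        totals[label] = totals.get(label, 0) + value
    return sorted(totals.items())


# same data as the series above: value i at minute i, for i = 0..8
series = [(minute, minute) for minute in range(9)]

for label, total in resample_sum_right(series, 3):
    print(label, total)
```

Minute 0 sits on a bin edge and therefore forms its own bin, minutes 1 to 3 sum to 6, minutes 4 to 6 sum to 15, and minutes 7 and 8 sum to 15, which matches the four rows printed by `resample`.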
andre-merzky commented 7 years ago

Sorry that this took so long to look into! To me it looks like the sampling parameter of the session.concurrency() method does exactly what you are asking for. For example, I get reasonable output with different sampling values for code like this:

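    import radical.pilot as rp   # provides the state constants used below

    # `session` is an existing radical.analytics session object
    units = session.filter(etype='unit', inplace=False)
    conc  = units.concurrency(state=[rp.AGENT_EXECUTING,
                                     rp.AGENT_STAGING_OUTPUT_PENDING],
                              sampling=1.0)   # sampling time in seconds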

Is that what you mean?