tbarbette / npf

Network Performance Framework: easy-to-use experiment manager with automated testing, result collection, and graphing
GNU General Public License v3.0

Exception when plotting after experiment #53

Closed: cdelzotti closed this issue 1 week ago

cdelzotti commented 3 months ago

I'm currently trying to run this command:

python3 /usr/local/bin/npf-run.py local --test ./script.npf --graph-filename ./results/graph.pdf --variables LIMIT_TIME=14 --cluster client=$CLIENT server=$SERVER --graph-size 12 10 --single-output ./results/out.csv --cluster-autosave --result-path ./results/

With the following variables in script.npf:

%variables
CPU=[1-1]
WL=1000
FREQ={1000,2000,3000,3900}
TIME=10

GEN_THREADS=8
GEN_BURST=8
GEN_RX_THREADS=8
GEN_FLOWSIZE=32

SLEEP_MODE={sleep_mode1,sleep_mode2,sleep_mode3}
SLEEP_DELTA={1,2,4,8,16,32}
GEN_RATE={4000000}
BURST_SIZE={32}
GEN_LENGTH=64
MINFREQ=1000
LIMIT=1000000000

FASTCLICK_PATH=/root/fastclick_sleep_modes/

%config
n_runs=5
results_expect={WATT,RAM,THROUGHPUT,PPS}
graph_filter_by={WATT:DROPPEDPC>0.01,MAXWATT:DROPPEDPC>0.01,LAT99:DROPPEDPC>0.01}
accept_zero={BEGIN_POLL,BEGIN_C1,BEGIN_C1E,BEGIN_C6,END_POLL,END_C1,END_C1E,END_C6}
...
[Experiment descriptions]
...

%import@client fastclick-play-single-mt

The testing itself (very long in my case) runs without any issue, but npf crashes afterwards, apparently during the plotting phase. This is the Python stack trace that appears:

ERROR: When trying to export serie local:
Traceback (most recent call last):
  File "/usr/local/bin/npf-run.py", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/npf_run.py", line 262, in main
    grapher.graph(series=[(test, build, all_results)] + g_series,
  File "/usr/local/lib/python3.10/dist-packages/npf/grapher.py", line 706, in graph
    raise(e)
  File "/usr/local/lib/python3.10/dist-packages/npf/grapher.py", line 703, in graph
    all_results_df = pd.concat([all_results_df,x_df],ignore_index = True, axis=0)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/concat.py", line 393, in concat
    return op.get_result()
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/concat.py", line 678, in get_result
    indexers[ax] = obj_labels.get_indexer(new_labels)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 3882, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Is there a solution to this problem? :smiley:

tbarbette commented 3 months ago

The error happens in the CSV export. Could you print all_results_df and x_df at line 703? It's hard to diagnose the problem without knowing the shape of your experiment.
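
A quicker first check than dumping both frames is to compare their column labels, e.g. list(all_results_df.columns) and list(x_df.columns). The sketch below is a hypothetical debugging helper (compare_columns is not part of npf) that reports labels duplicated within each list and labels present in only one of them.

```python
from collections import Counter


def compare_columns(left, right):
    """Compare two lists of column labels before concatenating them.

    Hypothetical debugging helper (not part of npf): returns the labels that
    are duplicated within each list and the labels found in only one list.
    """
    left_dups = sorted(label for label, n in Counter(left).items() if n > 1)
    right_dups = sorted(label for label, n in Counter(right).items() if n > 1)
    return {
        "left_duplicates": left_dups,
        "right_duplicates": right_dups,
        # Set differences ignore counts, so a label that is merely repeated
        # in one list shows up only in the *_duplicates entries above.
        "only_left": sorted(set(left) - set(right)),
        "only_right": sorted(set(right) - set(left)),
    }


if __name__ == "__main__":
    # Shortened versions of the two headers dumped later in this thread.
    left = ["build", "test_index", "SLEEP_MODE", "PPS", "WATT", "run_index"]
    right = ["build", "test_index", "SLEEP_MODE", "PPS", "SLEEP_MODE", "WATT", "run_index"]
    print(compare_columns(left, right))
```

With headers like the ones in this issue, both lists contain the same set of names, so only the duplicate check flags the problem.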

cdelzotti commented 3 months ago

Here is a CSV dump of these two variables just before the crash:

all_results_df

,build,test_index,Method,TAG,CPU,WL,TIME,GEN_THREADS,GEN_BURST,GEN_RX_THREADS,GEN_FLOWSIZE,SLEEP_MODE,SLEEP_DELTA,BURST_SIZE,GEN_LENGTH,FREQ,LIMIT,RATE_PER_CORE,GEN_RATE,FIRST_CPU,NUMA_NODE,FASTCLICK_PATH,NPF_TESTIE_PATH,PCI,LIMIT_TIME,LATENCY,LAT00,LAT01,LAT50,LAT95,LAT99,LAT999,LAT100,TESTTIME,RCVTIME,THROUGHPUT,COUNT,BYTES,SENT,DROPPED,DROPPEDPC,TX,TXPPS,PPS,WATT,RAM,POLL_TIME,C1_TIME,C1E_TIME,C6_TIME,TOTAL_CIDLE,run_index
0,local,0,IPFilterDenyPMP,"""test_throughput""",2,1000,10,8,8,8,32,no_sleep,1,32,64,3000,1000000000,4000000,4000000,0,0,/root/fastclick_sleep_modes/,/root/power/,0000:18:00.1,14,750.947289885,20.5,326.125,801.0,832.875,869.5,1540.25,8505.75,14.0131726265,17.033125,3558462711.65,86096460.0,5510174528.0,136424360.0,50327900.0,0.368906989925,5527071021.38,7850961.90256,5054666.46973,44.734199,7.73698,0.0,0.0,0.0,0.0,0.0,0

x_df

,build,test_index,Method,TAG,CPU,WL,TIME,GEN_THREADS,GEN_BURST,GEN_RX_THREADS,GEN_FLOWSIZE,SLEEP_MODE,SLEEP_DELTA,BURST_SIZE,GEN_LENGTH,FREQ,LIMIT,RATE_PER_CORE,GEN_RATE,FIRST_CPU,NUMA_NODE,FASTCLICK_PATH,NPF_TESTIE_PATH,PCI,LIMIT_TIME,LATENCY,LAT00,LAT01,LAT50,LAT95,LAT99,LAT999,LAT100,TESTTIME,RCVTIME,THROUGHPUT,COUNT,BYTES,SENT,DROPPED,DROPPEDPC,TX,TXPPS,PPS,SLEEP_MODE,WATT,RAM,POLL_TIME,C1_TIME,C1E_TIME,C6_TIME,TOTAL_CIDLE,run_index
0,local,1,IPFilterDenyPMP,"""test_throughput""",2,1000,10,8,8,8,32,hr_plus,1,32,64,3000,1000000000,4000000,4000000,0,0,/root/fastclick_sleep_modes/,/root/power/,0000:18:00.1,14,743.056567814,29.125,298.5,796.875,832.625,882.625,1543.5,6543.875,14.0132513046,16.93525,3562380694.47,85696774.0,5484595584.0,135642680.0,49945906.0,0.368216744169,5526335457.01,7849926.83813,5060191.01004,5.0,44.508492,7.715913,56.0,134.0,0.0,0.0,190.0,0

cdelzotti commented 3 months ago

The problem might be related to the fact that x_df has two columns called 'SLEEP_MODE'.
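
Why a duplicated label breaks the export can be illustrated with a simplified model of reindexing: to align one frame's columns onto a combined column list, each target label must map to exactly one source position, and a repeated label makes that mapping ambiguous. This is a plain-Python illustration of the idea, not pandas' actual implementation; build_indexer is a hypothetical name.

```python
def build_indexer(source_labels, target_labels):
    """Map each target label to its position in source_labels (-1 if absent).

    Simplified model of reindexing: if a label occurs more than once in
    source_labels, the mapping is ambiguous, so we refuse, much like pandas
    raising InvalidIndexError for a non-unique Index.
    """
    if len(set(source_labels)) != len(source_labels):
        raise ValueError("Reindexing only valid with uniquely valued labels")
    position = {label: i for i, label in enumerate(source_labels)}
    return [position.get(label, -1) for label in target_labels]


if __name__ == "__main__":
    combined = ["SLEEP_MODE", "PPS", "WATT"]
    print(build_indexer(["SLEEP_MODE", "PPS", "WATT"], combined))  # [0, 1, 2]
    try:
        build_indexer(["SLEEP_MODE", "PPS", "SLEEP_MODE", "WATT"], combined)
    except ValueError as err:
        print("error:", err)
```

In this model, as in the traceback, the failure only appears when a reindex is actually needed, i.e. when the frames' column lists differ; frames with identical (even duplicated) headers would likely be stacked without reindexing.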

cdelzotti commented 3 months ago

Update: it turned out that a dependency I used was printing a 'RESULT-SLEEP_MODE' line without any warning, which caused the duplicated column. It would be helpful if npf gave a clearer message when a column is present multiple times. :smiley:
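
A check of the kind suggested here could scan a run's output for result lines whose name is also a variable name. The sketch below assumes results are printed as `RESULT-NAME value` lines, as in this thread; find_result_collisions and the regex are hypothetical and do not reproduce npf's own parser or the warnings it actually emits.

```python
import re

# Assumed format: "RESULT-<NAME> <value>", as printed by the scripts in this thread.
RESULT_LINE = re.compile(r"^RESULT-([A-Za-z0-9_]+)\s+(\S.*)$")


def find_result_collisions(output, variable_names):
    """Return the result names printed in output that are also variable names."""
    collisions = set()
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match and match.group(1) in variable_names:
            collisions.add(match.group(1))
    return sorted(collisions)


if __name__ == "__main__":
    variables = {"SLEEP_MODE", "SLEEP_DELTA", "FREQ"}
    # A dependency printing RESULT-SLEEP_MODE, as in this issue.
    output = "RESULT-WATT 44.5\nRESULT-SLEEP_MODE 5\nsome other log line\n"
    for name in find_result_collisions(output, variables):
        print(f"WARNING: result {name} has the same name as a variable; "
              f"the exported table will contain two {name} columns")
```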

tbarbette commented 3 months ago

Normally, duplicated results are combined: by default there will simply be two results per run, which should be fine. You can also add result_add={SLEEP_MODE} to the %config section so the results are summed instead of duplicated.
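
The two behaviours described here can be modelled as two ways of folding several values reported under the same name in one run: keep them all (the default described above) or sum them (result_add). This is a hypothetical sketch of that semantics, not npf's code; collect_results and its add_names parameter are illustrative names.

```python
from collections import defaultdict


def collect_results(pairs, add_names=frozenset()):
    """Group (name, value) pairs reported during one run.

    Names in add_names are summed into a single value, mimicking the
    result_add={...} option as described in this thread; other names keep
    every reported value.
    """
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return {
        name: [sum(values)] if name in add_names else values
        for name, values in grouped.items()
    }


if __name__ == "__main__":
    run = [("SLEEP_MODE", 2.0), ("SLEEP_MODE", 3.0), ("WATT", 44.5)]
    print(collect_results(run))
    # {'SLEEP_MODE': [2.0, 3.0], 'WATT': [44.5]}
    print(collect_results(run, add_names={"SLEEP_MODE"}))
    # {'SLEEP_MODE': [5.0], 'WATT': [44.5]}
```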

Maybe the problem comes from SLEEP_MODE not being numeric? Or from one of them being empty?
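
Both hypotheses can be checked on the raw values before they reach the exported table. A minimal sketch, assuming values arrive as strings (variable values, or the text after a RESULT line); classify_value and mixed_kinds are hypothetical helpers.

```python
def classify_value(raw):
    """Classify a raw value string as 'empty', 'numeric' or 'label'."""
    text = raw.strip()
    if not text:
        return "empty"
    try:
        float(text)
    except ValueError:
        return "label"
    return "numeric"


def mixed_kinds(values):
    """Return the set of kinds seen for one name; more than one kind is suspicious."""
    return {classify_value(v) for v in values}


if __name__ == "__main__":
    # SLEEP_MODE as a variable value ("hr_plus") and as a result ("5.0").
    print(sorted(mixed_kinds(["hr_plus", "5.0"])))  # ['label', 'numeric']
```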

cdelzotti commented 3 months ago

SLEEP_MODE was indeed both a numeric value and a label, because my dependency was interfering with my script. It works like a charm now.

tbarbette commented 3 months ago

Can you help me reproduce the problem? I ran:

python3 npf-run.py --test integration/warnmultiple.npf --csv local.csv --force-retest

integration/warnmultiple.npf:

%variables
TEST={A,B,C}

%script
echo "RESULT-TEST 78"
echo "RESULT-TEST MODE"

There should definitely be several warnings here, and I'll work on that, but I don't get a crash.

The result is:

index,build,test_index,TEST,TEST,run_index
0,local,0,A,78.0,0
1,local,0,A,78.0,1
2,local,0,A,78.0,2
3,local,1,B,78.0,0
4,local,1,B,78.0,1
5,local,1,B,78.0,2
6,local,2,C,78.0,0
7,local,2,C,78.0,1
8,local,2,C,78.0,2
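
Even without a crash, a header such as index,build,test_index,TEST,TEST,run_index is fragile for whatever reads the CSV back. For example, Python's csv.DictReader builds each row as a dict keyed by the header, so the second TEST column silently overwrites the first and the variable value A is lost:

```python
import csv
import io

# First rows of the exported CSV shown above.
DATA = """index,build,test_index,TEST,TEST,run_index
0,local,0,A,78.0,0
1,local,0,A,78.0,1
"""

rows = list(csv.DictReader(io.StringIO(DATA)))
print(rows[0]["TEST"])  # 78.0: the variable value "A" was overwritten
print(list(rows[0].keys()))
```

A reader that works on the raw header list (csv.reader) keeps both columns, but then has to tell them apart by position.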

Warnings to add:

tbarbette commented 1 week ago

Fixed in the last push. There are now warnings to prevent this situation.