Open mhanif opened 2 years ago
Hi @hanif, this is similar to https://github.com/Azure/DASH/issues/186, which you closed yesterday. The symptoms are similar, except this time you also have
lsof: no pwd entry for UID 4321
which must be related to the recently merged https://github.com/Azure/DASH/pull/202.
Like last time, I see
GRPC ERROR[7]: Not primary, GRPC call Write::INSERT ERROR:
which I've seen in the past when a P4Runtime client already owns the session to the switch. Can you confirm that no extraneous Docker containers are running that might have attached to the bmv2 switch? Please paste the output of docker ps -a here, thanks.
Hi @chrispsommers, thanks a lot for looking into this. Below is the requested output:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
940c99d8af69 chrissommers/dash-saithrift-bldr:220819 "./saiserver" 2 hours ago Up 2 hours dash-saithrift-server-mhanif
eeb81e8cff56 chrissommers/dash-bmv2-bldr:220819 "env LD_LIBRARY_PATH…" 2 hours ago Up 2 hours simple_switch-mhanif
058107d1ad3e hello-world "/hello" 5 months ago Exited (0) 5 months ago agitated_sinoussi
14ba71aec719 nf:latest "tail -f /dev/null" 10 months ago Exited (137) 10 months ago fw
beb0503f1b82 endpoint:latest "tail -f /dev/null" 10 months ago Exited (137) 10 months ago ext
1bb2dabe1b0f endpoint:latest "tail -f /dev/null" 10 months ago Exited (137) 10 months ago int
36465ac7f57f endpoint:latest "tail -f /dev/null" 10 months ago Exited (137) 10 months ago h1
9acb562fb0f7 8425a5f345b8 "/bin/sh -c 'apt-get…" 10 months ago Exited (1) 10 months ago determined_jackson
16377b7c6997 ubuntu:trusty "sh" 10 months ago Exited (0) 10 months ago affectionate_lovelace
Looks normal to me. I assume you executed the customary three commands in three consoles (make run-switch, make run-saithrift-server, make run-all-tests)? Does this happen every time?
Yes, I ran the 3 commands in 3 separate consoles. The others behave as expected, but run-all-tests doesn't. I tried a couple of times after calling kill-all and re-running them, and I get the same results. Thanks!
Looks like it happened once in the CI pipeline, for the first time ever AFAIK: https://github.com/Azure/DASH/actions/runs/3075002621/jobs/4968124219#step:13:24
It passed on a subsequent retry. I suspect a race condition when this line (linked below) is executed. We look for a listener on the P4Runtime server socket, but there may be a delay before the server is actually "ready." Could you try inserting a sleep(3) or similar before this step and see if it makes your test runs succeed? Another experiment is to run all the steps of make run-all-tests manually, i.e. init-switch and so forth. If that succeeds, it would reinforce the theory that it's a timing issue that varies with CPU speed, environment, etc.
https://github.com/Azure/DASH/blob/17912b53472433b9bfef4d04651d8d816571c0e8/dash-pipeline/Makefile#L280
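To make the sleep(3) experiment concrete, here is a minimal sketch in Python of "wait for a listener, then give the server a moment to settle." It is not the code in the Makefile; the function name wait_for_listener and its parameters are hypothetical, and the host and port to pass in depend on where your P4Runtime server listens. Note that a successful TCP connect only shows the socket is open, not that the server is ready to accept a primary client, which is why the extra settle delay is included.

```python
import socket
import time


def wait_for_listener(host, port, timeout=30.0, interval=0.5, settle=3.0):
    """Poll until a TCP connect to (host, port) succeeds.

    Returns True if a listener was found before `timeout` seconds elapsed
    (after sleeping an extra `settle` seconds), otherwise False.
    All names and defaults here are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                pass
        except OSError:
            # Nothing listening yet (or connect timed out); retry shortly.
            time.sleep(interval)
            continue
        # Extra grace period, mirroring the suggested sleep(3) experiment.
        time.sleep(settle)
        return True
    return False
```

For example, calling wait_for_listener("127.0.0.1", port) before the init step, with port set to your P4Runtime server port, would approximate the experiment. If the failure disappears with a larger settle value, that points toward a timing issue.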
On a clean clone of "main" made today, make run-all-tests fails with the following error. I'm not sure what I am doing wrong: