opencomputeproject / SAI-Challenger

Other
17 stars 35 forks source link

process command return result not evaluated on new python version #61

Closed vincent-xs closed 1 year ago

vincent-xs commented 1 year ago

The DASH community repo is currently picking up SAI challenger on 4afa76c55ccb466a7a21d6fac733a0e0eb1908db . Depending on which machine I build and run testcases over the SAI challenger, I get different behavior regarding the returned result from the SAI challenger. (sai.py : def process_command)

If I execute a test from a git repo compiled on an ubuntu machine (python version 2.7.17) , the result returned by the SAI challenger is 0 (aka value of SAI_STATUS_SUCCESS) If I execute the same test but this time, the same repo is compiled on a debian machine (python 3.9.2), the result from SAI challenger is 'SAI_STATUS_SUCESS'

aka, SAI_STATUS_SUCCESS was not �evaluated�.

any idea what causes this?

andriy-kokhan commented 1 year ago

SAI Challenger is intended to be executed from Docker container that is based on buster-slim - https://github.com/opencomputeproject/SAI-Challenger/blob/4afa76c55ccb466a7a21d6fac733a0e0eb1908db/Dockerfile.client#L1. So, I'm assuming there must be something else. Please provide more details - steps to reproduce, logs, etc.

vincent-xs commented 1 year ago

I did more testing to characterize the issue: it's not related to how we build the sai challenger, but where the sai challenger runs (as you mentioned, it runs within a docker). I'll attach files to the issue (build log, testcase, ...)

Assuming a sai thrift server runs somewhere and we do a simple VIP create/remove and we dump the result both create/remove Note that we run exactly the same built client , just on 2 different machines which have different ubuntu versions

Distributor ID: Ubuntu Description: Ubuntu 18.04.6 LTS Release: 18.04 Codename: bionic @.***:/sai-challenger/dash_tests/functional# ./run-tests.sh ================================================================== test session starts ================================================================== platform linux -- Python 3.7.3, pytest-6.0.1, py-1.11.0, pluggy-0.13.1 -- /usr/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.7.3', 'Platform': 'Linux-4.15.0-200-generic-x86_64-with-debian-10.13', 'Packages': {'pytest': '6.0.1', 'pluggy': '0.13.1'}, 'Plugins': {'metadata': '2.0.4', 'html': '3.2.0', 'dependency': '0.5.1'}} rootdir: /sai-challenger/dash_tests/functional, configfile: pytest.ini plugins: metadata-2.0.4, html-3.2.0, dependency-0.5.1 collected 1 item

config_test.py::TestReturnResult::test_setup INIT INFO:root:Initializing SAI DPU... INFO:root:Switch oid oid:0x21000000000000 INFO:root:Default VLAN oid oid:0x0 2023-03-22 16:26:34 [snappi.snappi] [WARNING] Certificate verification is disabled INFO:root:1679502394 Starting connection to controller...

======= SAI setup commands RETURN values ======= [{'switch_id': 'oid:0x21000000000000', 'vip': '221.0.0.2'}]

======= SAI teardown commands RETURN values ======= [0] <<<<<<<<<<<< SAI_STATUS_SUCCESS is evaluated PASSED =================================================================== 1 passed in 0.30s ===================================================================

Distributor ID: Ubuntu Description: Ubuntu 20.04.5 LTS Release: 20.04 Codename: focal

@.***:/sai-challenger/dash_tests/functional# ./run-tests.sh Using packet manipulation module: ptf.packet_scapy ================================================================== test session starts ================================================================== platform linux -- Python 3.7.3, pytest-6.0.1, py-1.11.0, pluggy-0.13.1 -- /usr/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.7.3', 'Platform': 'Linux-5.4.0-144-generic-x86_64-with-debian-10.13', 'Packages': {'pytest': '6.0.1', 'pluggy': '0.13.1'}, 'Plugins': {'html': '3.2.0', 'metadata': '2.0.4', 'dependency': '0.5.1'}} rootdir: /sai-challenger/dash_tests/functional, configfile: pytest.ini plugins: html-3.2.0, metadata-2.0.4, dependency-0.5.1 collected 1 item

config_test.py::TestReturnResult::testsetup INIT INFO:root:Initializing SAI DPU... INFO:root:Switch oid oid:0x21000000000000 INFO:root:Default VLAN oid oid:0x1 2023-03-22 16:26:57 [snappi.snappi] [WARNING] Certificate verification is disabled INFO:root:1679502417 Starting connection to controller... INFO:root:command {'name': 'vpe#1', 'op': 'create', 'type': 'SAI_OBJECT_TYPE_VIP_ENTRY', 'key': {'switch_id': '$SWITCH_ID', 'vip': '221.0.0.2'}, 'attributes': ['SAI_VIP_ENTRY_ATTR_ACTION', 'SAI_VIP_ENTRY_ACTION_ACCEPT']}

======= SAI setup commands RETURN values ======= [{'switchid': 'oid:0x21000000000000', 'vip': '221.0.0.2'}] INFO:root:command {'name': 'vpe#1', 'op': 'remove', 'type': 'SAI_OBJECT_TYPE_VIP_ENTRY', 'key': {'switch_id': '$SWITCH_ID', 'vip': '221.0.0.2'}, 'attributes': ['SAI_VIP_ENTRY_ATTR_ACTION', 'SAI_VIP_ENTRY_ACTION_ACCEPT']}

======= SAI teardown commands RETURN values ======= ['SAI_STATUS_SUCCESS'] <<<<<<<<<<<<<<< value is not evaluated. PASSED

vincent-xs commented 1 year ago

docker command run

docker run -v /nobackup/vincentl/dash_bos1_saic/submodules/DASH/dash-pipeline/../test/SAI-Challenger:/sai-challenger -v /nobackup/vincentl/dash_bos1_saic/submodules/DASH/dash-pipeline/../test/SAI-Challenger/../test-cases/scale/saic/:/sai-challenger/dash_tests/scale -v /nobackup/vincentl/dash_bos1_saic/submodules/DASH/dash-pipeline/../test/SAI-Challenger/../test-cases/functional/saic/:/sai-challenger/dash_tests/functional -v /nobackup/vincentl/dash_bos1_saic/submodules/DASH/dash-pipeline/../test/SAI-Challenger/../test-cases/utils/:/sai-challenger/dash_tests/utils -v /nobackup/vincentl/dash_bos1_saic/submodules/DASH/dash-pipeline/../:/dash --cap-add=NET_ADMIN --device /dev/net/tun:/dev/net/tun --rm --network=host --name dash-saichallenger-client-vincentl \ -w /sai-challenger/dash_tests \ -it \ local/dash-saichallenger-client:latest \ /bin/bash root@xsl-bos1:/sai-challenger/dash_tests# cd functional/ root@xsl-bos1:/sai-challenger/dash_tests/functional# ./run-tests.sh

vincent-xs commented 1 year ago

test_archive.zip build_saic.log

andriy-kokhan commented 1 year ago

Actually, on this SAI-C version it's expected that on object remove a correct return status should be ['SAI_STATUS_SUCCESS'] but not [0] . Please make sure that you are running the latest SAI-C version in both cases. To do that, you may want to compare the content of the following file inside SAI-C client docker container: /usr/local/lib/python3.7/dist-packages/saichallenger/common/sai_client/sai_thrift_client/sai_thrift_client.py

It should be like this: https://github.com/opencomputeproject/SAI-Challenger/blob/4afa76c55ccb466a7a21d6fac733a0e0eb1908db/common/sai_client/sai_thrift_client/sai_thrift_client.py#L73

Otherwise, you may need to remove SAI-C containers, then remove SA-C client docker image and then rebuilt it from the scratch.

chrispsommers commented 1 year ago

Actually, on this SAI-C version it's expected that on object remove a correct return status should be ['SAI_STATUS_SUCCESS'] but not [0] . Please make sure that you are running the latest SAI-C version in both cases. To do that, you may want to compare the content of the following file inside SAI-C client docker container: /usr/local/lib/python3.7/dist-packages/saichallenger/common/sai_client/sai_thrift_client/sai_thrift_client.py

It should be like this:

https://github.com/opencomputeproject/SAI-Challenger/blob/4afa76c55ccb466a7a21d6fac733a0e0eb1908db/common/sai_client/sai_thrift_client/sai_thrift_client.py#L73

Otherwise, you may need to remove SAI-C containers, then remove SA-C client docker image and then rebuilt it from the scratch.

Hi @andriy-kokhan the current test cases in DASH all assert for result == 0 for remove cases. Has there been a breaking change in sai-c which requires us to update the test cases when sai-c version gets updated?

andriy-kokhan commented 1 year ago

Hi @chrispsommers . Do we run mentioned TCs as a part of CI? Let me double check these TCs locally.

There was a change in common/sai_client/sai_thrift_client/sai_thrift_client.py to make sure CRUD APIs return status in the same format for both Redis and Thrift clients. Note, process_commands() does assert status == 'SAI_STATUS_SUCCESS' for each operation implicitly. No need for additional explicit check in TC. So, I propose to remove all explicit checks. Also, I'll be glad to consider other options which you see more suitable.

chrispsommers commented 1 year ago

Hi Andriy, we do in fact run many sai-c test-cases in CI and for remove ops we assert all result==0 for result in results. So it seems like if we were to upgrade sai-c submodule all these tests would fail. So it seems like an overhaul is needed. Please look at our test cases and advise, thanks. There are not that many tests so we could remove all explicit checks.

vincent-xs commented 1 year ago

how do we explain the fact that both test are running the same python code (at least what we get from the repo).in other word, in both cases, we are pulling the change mentioned above anyway. could the difference in docker image explain this?

chrispsommers commented 1 year ago

Vincent, I'm wondering what the respective Python version are in sai-challenger-bldr, which is used as base image for saichallenger-client.

andriy-kokhan commented 1 year ago

Hi Andriy, we do in fact run many sai-c test-cases in CI and for remove ops we assert all result==0 for result in results. So it seems like if we were to upgrade sai-c submodule all these tests would fail. So it seems like an overhaul is needed. Please look at our test cases and advise, thanks. There are not that many tests so we could remove all explicit checks.

Yeh... will check this and let you know or make a PR with relevant changes.

vincent-xs commented 1 year ago

I noticed the docker sai challenger client image I am running in the "non working" case is 2 weeks old, whereas the "working" case seems to be using a very recent image. so I am updating now my non working machine with the latest client image and will report back

andriy-kokhan commented 1 year ago

how do we explain the fact that both test are running the same python code (at least what we get from the repo).in other word, in both cases, we are pulling the change mentioned above anyway. could the difference in docker image explain this?

The thing is that all SAI-C sources (SAI-C engine) are get installed into the docker image. So, it does not matter what version of SAI-C is in submodule. What really matters is - what's inside Docker image. So, we should make sure we are using the Docker built of the right version of the submodule.

vincent-xs commented 1 year ago

I just confirmed that with an upgraded SAI challenger client image, the test is now working and we are getting a value of 0 back. so I am good now and we can close the issue. it just happened that this machine had never seen a fresh build since I updated the DASH/SAI challenger repo last week.

thanks Andriy/Chris

andriy-kokhan commented 1 year ago

@vincent-xs , @chrispsommers , As I mentioned above, it's expected that on object remove a correct return status should be ['SAI_STATUS_SUCCESS']. We did not see CI issue in DASH community because SAI-C client docker has not been updated. So, even we updated SAI-C submodule, we were still using previous version of SAI-C docker... The PR as follows fixes this: https://github.com/sonic-net/DASH/pull/353/files

chrispsommers commented 1 year ago

Thanks for doing this @andriy-kokhan. I understand why the change had not impacted DASH yet. I really appreciate you going to the trouble of fixing all those tests!