stratum / fabric-tna

The SD-Fabric data plane
https://docs.sd-fabric.org/
30 stars 15 forks source link

Prevent the use of stale gRPC client references #482

Closed pierventre closed 2 years ago

pierventre commented 2 years ago

Sometime, we have found that UP4 commands fails in the ONOS CLI due to the following exception:

ERROR [ShellUtil] Exception caught while executing command java.lang.IllegalStateException: Context is cancelled (client has been shut down) at org.onosproject.grpc.ctl.AbstractGrpcClient.runInCancellableContext(AbstractGrpcClient.java:110) ~[?:?] at org.onosproject.p4runtime.ctl.client.P4RuntimeClientImpl.execRpc(P4RuntimeClientImpl.java:265) ~[?:?] at org.onosproject.p4runtime.ctl.client.ReadRequestImpl.submit(ReadRequestImpl.java:319) ~[?:?] at org.onosproject.p4runtime.ctl.client.ReadRequestImpl.submitSync(ReadRequestImpl.java:325) ~[?:?] at org.stratumproject.fabric.tna.behaviour.upf.FabricUpfProgrammable.readCounter(FabricUpfProgrammable.java:571)

This can easily happen as the gRPC client reference is initialized only the first time setupBehavior is called on any method of FabricUpfProgrammable. Unless the gRPC client is really broken and not available, this PR should fix the stale reference issue. Note that the skip behavior was introduced long time ago to fix a regression #289. However, this partial revert should not have any impact on the performance since the expensive part was the compute of the hw resources.

codecov[bot] commented 2 years ago

Codecov Report

Merging #482 (e294078) into main (639a6b7) will not change coverage. The diff coverage is 0.00%.

:exclamation: Current head e294078 differs from pull request most recent head 5895255. Consider uploading reports for the commit 5895255 to get more accurate results

Impacted file tree graph

@@            Coverage Diff            @@
##               main     #482   +/-   ##
=========================================
  Coverage     69.90%   69.90%           
  Complexity      722      722           
=========================================
  Files            63       63           
  Lines          4748     4748           
  Branches        523      523           
=========================================
  Hits           3319     3319           
  Misses         1157     1157           
  Partials        272      272           
Impacted Files Coverage Δ
...abric/tna/behaviour/upf/FabricUpfProgrammable.java 60.49% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 639a6b7...5895255. Read the comment docs.

ccascone commented 2 years ago

Most secretive PR ever... ;)

pierventre commented 2 years ago

Will update later the description - it is a quick and dirty fix :)

ccascone commented 2 years ago

Doesn't look so ugly, do we want to merge this?

pierventre commented 2 years ago

@daniele-moro are you fine to merge it after CI pass ? Or do you want to give another shot ?

daniele-moro commented 2 years ago

@daniele-moro are you fine to merge it after CI pass ? Or do you want to give another shot ?

I'm ok in merging this after CI pass.