open-horizon / anax

Horizon agent control system
https://open-horizon.github.io/docs/anax/docs/
Apache License 2.0

Bug: node doesn't get proposal after changing node policy on the exchange #3938

Open LiilyZhang opened 10 months ago

LiilyZhang commented 10 months ago

This is an intermittent issue.
It occurs when the node policy is updated in the exchange so that it matches a deployment policy, and the node receives the proposal before it receives the node policy change from the exchange.

At that point the node has the proposal but not yet the updated node policy from the exchange, so it fails when handling the proposal:

"2023-10-25 18:13:18:   Node received Proposal message using agreement c3a7394da46b3e4ade7450bb9670def717c68d52875ffb38d1a8b16c745f798d for service e2edev@somecomp.com/my.company.com.services.usehello2 from the agbot IBM/agbot.",
"2023-10-25 18:13:19:   Error handling proposal for service e2edev@somecomp.com/my.company.com.services.usehello2. Error: Respond to proposal with error: Protocol Basic error verifying merged policy Name: Policy for userdev/an12345 Version: 2.0, Pattern: \nAPI Specifications\nAgreement Protocol: []\nWorkloads:\nProperties:\nName: openhorizon.hardwareId Value: 3fe3197a9bb331d024fcb1341d433ee78d9cea29\nName: openhorizon.operatingSystem Value: ubuntu\nName: openhorizon.containerized Value: false\nName: openhorizon.cpu Value: 4\nName: openhorizon.arch Value: amd64\nName: openhorizon.memory Value: 32094\nName: openhorizon.allowPrivileged Value: false\nName: purpose Value: network-testing\nName: group Value: bluenode\nConstraints: [iame2edev == true NONS==true || NOGPS == true || NOLOC == true || NOPWS == true || NOHELLO == false || NOK8S == false]\nData Verification: Enabled: false, URL: , URL User: , Interval: 0, CheckRate: 0, Metering: Tokens: 0, PerTimeUnits: , Notification Interval: 0\nNode Health: {0 0}\nSecretBinding: []\nClusterNamespace: \n and Name: Policy for userdev/an12345 Version: 2.0, Pattern: \nAPI Specifications\nAgreement Protocol: []\nWorkloads:\nProperties:\nName: openhorizon.hardwareId Value: 3fe3197a9bb331d024fcb1341d433ee78d9cea29\nName: openhorizon.operatingSystem Value: ubuntu\nName: openhorizon.containerized Value: false\nName: openhorizon.cpu Value: 4\nName: openhorizon.arch Value: amd64\nName: openhorizon.memory Value: 32094\nName: openhorizon.allowPrivileged Value: false\nName: purpose Value: network-testing2\nName: group Value: bluenode\nConstraints: [iame2edev == true NONS==true || NOGPS == true || NOLOC == true || NOPWS == true || NOHELLO == false || NOK8S == false]\nData Verification: Enabled: false, URL: , URL User: , Interval: 0, CheckRate: 0, Metering: Tokens: 0, PerTimeUnits: , Notification Interval: 0\nNode Health: {0 0}\nSecretBinding: []\nClusterNamespace: \n, error: Compatibility Error: Common Properties between 
[{openhorizon.hardwareId 3fe3197a9bb331d024fcb1341d433ee78d9cea29 } {openhorizon.operatingSystem ubuntu } {openhorizon.containerized false } {openhorizon.cpu 4 } {openhorizon.arch amd64 } {openhorizon.memory 32094 } {openhorizon.allowPrivileged false } {purpose network-testing } {group bluenode }] and [{openhorizon.hardwareId 3fe3197a9bb331d024fcb1341d433ee78d9cea29 } {openhorizon.operatingSystem ubuntu } {openhorizon.containerized false } {openhorizon.cpu 4 } {openhorizon.arch amd64 } {openhorizon.memory 32094 } {openhorizon.allowPrivileged false } {purpose network-testing2 } {group bluenode }]. Underlying error: Property purpose has value network-testing and network-testing2.",
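The failure in the log can be illustrated with a small sketch. This is not anax code: `CompatibilityError` and `check_common_properties` are hypothetical names, and the check is reduced to "a property present in both policies must have the same value". Only the property values and the error wording are taken from the log above.

```python
class CompatibilityError(Exception):
    """Raised when two policies disagree on a property they both define."""


def check_common_properties(local_props, proposed_props):
    # Hypothetical stand-in for the common-property check named in the log:
    # any property present in both policies must carry the same value.
    for name, local_value in local_props.items():
        if name in proposed_props and proposed_props[name] != local_value:
            raise CompatibilityError(
                f"Property {name} has value {local_value} and {proposed_props[name]}."
            )


# The node still holds the old policy; the proposal carries the updated one.
stale_node_policy = {"purpose": "network-testing", "group": "bluenode"}
policy_in_proposal = {"purpose": "network-testing2", "group": "bluenode"}

try:
    check_common_properties(stale_node_policy, policy_in_proposal)
    conflict = None
except CompatibilityError as err:
    conflict = str(err)

print(conflict)
```

With the stale local copy, the only differing property is `purpose`, which is the same conflict reported at the end of the log.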

After that, the node never gets a proposal again.

Expected: the node gets a new proposal later. Once the node policy on the agent side is in sync with the node policy in the exchange, the agreement should be formed.
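The expected sequence can be sketched as follows. This is only a toy model under stated assumptions: the `Node` class, `sync_policy`, and `handle_proposal` are hypothetical names and do not correspond to anax or agbot APIs. It shows a proposal being rejected while the local policy is stale and a later proposal being accepted after the sync.

```python
def properties_agree(local_props, proposed_props):
    # Properties defined in both policies must have equal values.
    return all(
        proposed_props[name] == value
        for name, value in local_props.items()
        if name in proposed_props
    )


class Node:
    """Hypothetical agent holding a local copy of its node policy."""

    def __init__(self, policy):
        self.policy = dict(policy)

    def sync_policy(self, exchange_policy):
        # Models the node receiving the policy change from the exchange.
        self.policy = dict(exchange_policy)

    def handle_proposal(self, proposed_policy):
        # True means the proposal is accepted and an agreement can form.
        return properties_agree(self.policy, proposed_policy)


exchange_policy = {"purpose": "network-testing2", "group": "bluenode"}
node = Node({"purpose": "network-testing", "group": "bluenode"})

first_attempt = node.handle_proposal(exchange_policy)   # local policy is stale
node.sync_policy(exchange_policy)                       # change arrives from the exchange
second_attempt = node.handle_proposal(exchange_policy)  # a later proposal

print(first_attempt, second_attempt)
```

The bug described here corresponds to the second proposal never being sent, so the node stays without an agreement even though its policy has caught up.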

mustafamg commented 9 months ago

I believe this task is related to #3936.