open-horizon / edge-sync-service

Cloud - Edge synchronization service (MMS)
Apache License 2.0

504 Gateway Time-out when updating a model used by thousands of agents #77

Closed: dlarson04 closed this issue 3 years ago

dlarson04 commented 3 years ago
root@plasmas1:~/hzn-work/mms/policy# ./zz.sh
hzn -v  mms object publish -m my-mms.json  -f  purdue-hello-mms-config.json
[verbose] Reading configuration file: /etc/horizon/hzn.json
[verbose] Reading configuration file: /etc/default/horizon
[verbose] Config file does not exist: /root/.hzn/hzn.json.
[verbose] No project level configuration file found.
[verbose] Config file does not exist: /root/hzn-work/mms/policy/hzn.json.
Digital sign with SHA1 will be performed for data integrity. It will delay the MMS object publish.
Start hashing the file...
Data hash is generated. Start digital signing with the data hash...
Digital sign finished.
[verbose] The model management service url: https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css
[verbose] PUT https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json
[verbose] HTTP request timeout set to 30 seconds
[verbose] HTTP code: 204
[verbose] The model management service url: https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css
[verbose] PUT https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json/data
[verbose] HTTP request timeout set to 0 seconds
[verbose] Encountered HTTP error: <nil> calling Model Management Service REST API PUT https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json/data. HTTP status: 504 Gateway Time-out. Will retry.
[verbose] Encountered HTTP error: <nil> calling Model Management Service REST API PUT https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json/data. HTTP status: 504 Gateway Time-out. Will retry.
[verbose] Encountered HTTP error: <nil> calling Model Management Service REST API PUT https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json/data. HTTP status: 504 Gateway Time-out. Will retry.
[verbose] Encountered HTTP error: <nil> calling Model Management Service REST API PUT https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json/data. HTTP status: 504 Gateway Time-out. Will retry.
[verbose] Encountered HTTP error: <nil> calling Model Management Service REST API PUT https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json/data. HTTP status: 504 Gateway Time-out. Will retry.
Error: Encountered HTTP error: <nil> calling Model Management Service REST API PUT https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json/data. HTTP status: 504 Gateway Time-out.

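The cause appears to be that CSS receives and stores the model update within a few seconds (08:33:42 to 08:33:48 in the log below). However, on the same thread that is handling the HTTP request, it then sends a notification to all 2994 agents. That notification pass takes about 3.5 minutes (08:33:48 to 08:37:22), so the HTTP connection times out before CSS responds. Because hzn sets no client timeout for the data upload ("HTTP request timeout set to 0 seconds"), the 504 comes from the gateway in front of CSS, not from the client.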

Aug 3 08:33:42 ibm-edge-css-55bf84f5bb-fhgj5 ibm-edge-css DEBUG * CSS: 2021/08/03 12:33:42 DEBUG: In PutObjectData. Update data application-mmsmodel purdue-hello-mms-config.json
Aug 3 08:33:43 ibm-edge-css-55bf84f5bb-fhgj5 ibm-edge-css DEBUG * CSS: 2021/08/03 12:33:43 DEBUG: In PutObjectData. Start data verification application-mmsmodel purdue-hello-mms-config.json
Aug 3 08:33:43 ibm-edge-css-55bf84f5bb-fhgj5 ibm-edge-css DEBUG * CSS: 2021/08/03 12:33:43 DEBUG: In PutObjectData. data verified for object application-mmsmodel purdue-hello-mms-config.json
Aug 3 08:33:48 ibm-edge-css-55bf84f5bb-fhgj5 ibm-edge-css DEBUG * CSS: 2021/08/03 12:33:48 DEBUG: In PutObjectData. done with storing data for object application-mmsmodel purdue-hello-mms-config.json
Aug 3 08:33:48 ibm-edge-css-55bf84f5bb-fhgj5 ibm-edge-css DEBUG * CSS: 2021/08/03 12:33:48 DEBUG: In PutObjectData. Size of notificationsInfo is 2994
Aug 3 08:33:48 ibm-edge-css-55bf84f5bb-fhgj5 ibm-edge-css Doug... In SendNotifications....
Aug 3 08:37:22 ibm-edge-css-55bf84f5bb-fhgj5 ibm-edge-css Doug... Done SendNotifications....
dlarson04 commented 3 years ago

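I think the same thing happens, for the same reason, when a model that is in use by many agents is deleted. Here the first DELETE attempt hits the client's 30-second timeout, and the retry returns 204: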

root@plasmas1:~/hzn-work/mms/policy# hzn -v mms object delete --type application-mmsmodel --id purdue-hello-mms-config.json
[verbose] Reading configuration file: /etc/horizon/hzn.json
[verbose] Reading configuration file: /etc/default/horizon
[verbose] Config file does not exist: /root/.hzn/hzn.json.
[verbose] No project level configuration file found.
[verbose] The model management service url: https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css
[verbose] DELETE https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json
[verbose] HTTP request timeout set to 30 seconds
[verbose] Encountered HTTP error: Delete "https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json": net/http: timeout awaiting response headers calling Model Management Service REST API DELETE https://cp-console.ieam-roks-scale-70ea81cdef68a2eb78ece6d890b7dad3-0000.us-south.containers.appdomain.cloud/edge-css/api/v1/objects/ieam-roks-scale/application-mmsmodel/purdue-hello-mms-config.json. HTTP status: . Will retry.
[verbose] HTTP code: 204
Object purdue-hello-mms-config.json deleted from org ieam-roks-scale in the Model Management Service