openBackhaul / MicroWaveDeviceInventory

Physical and logical inventory of the MW SDN Domain
Apache License 2.0
5 stars 1 forks source link

Container was exited with "maximum call stack size exceeded" error #888

Open Ziabatcha opened 6 months ago

Ziabatcha commented 6 months ago

Every day container was exited after 24 hours of successful run. In container logs, observed that container is exited with "maximum call stack size exceeded" error

MWDI_Test_v1 1 3

nardecchia-siae commented 6 months ago

To be tested in the tag test_v1_1_4

Ziabatcha commented 5 months ago

Tested in test_v1.1.4, still seeing the issue.

When too many device alarm is received in the MWDI, first few request (example 1000) device alarms are accepted by MWDI with 204 response code. After that we are observing 401 Error bad request for all other device alarm request.

Logs:

POST /v1/regard-device-alarm 204 35199.029 ms - -
POST /v1/regard-device-alarm 204 35239.323 ms - -
[02.04.2024 10:01.17.652] [LOG]   
 callback : http://X.X.X.X:XXXX/rests/data/network-topology:network-topology/topology=topology-netconf/node=A/yang-ext:mount/core-model-1-4:control-construct/alarms-1-0:alarm-pac/current-alarms header :{"Authorization":"Authorization"}body :""response code:200
[02.04.2024 10:01.17.661] [LOG]   
 callback : http://X.X.X.X:XXXX/rests/data/network-topology:network-topology/topology=topology-netconf/node=B/yang-ext:mount/core-model-1-4:control-construct/alarms-1-0:alarm-pac/current-alarms header :{"Authorization":"Authorization"}body :""response code:200
[02.04.2024 10:01.17.685] [LOG]   undefined
[02.04.2024 10:01.17.768] [LOG]   undefined
[02.04.2024 10:01.17.863] [LOG]   
 callback : http://X.X.X.X:XXXX/rests/data/network-topology:network-topology/topology=topology-netconf/node=C/yang-ext:mount/core-model-1-4:control-construct/alarms-1-0:alarm-pac/current-alarms header :{"Authorization":"Authorization"}body :""response code:200
[02.04.2024 10:01.17.946] [LOG]   undefined
[02.04.2024 10:01.18.107] [LOG]   
 callback : http://X.X.X.X:XXXX/rests/data/network-topology:network-topology/topology=topology-netconf/node=D/yang-ext:mount/core-model-1-4:control-construct header :{"Authorization":"Authorization"}body :""response code:200
[02.04.2024 10:01.18.307] [LOG]   
 callback : http://X.X.X.X:XXXX/rests/data/network-topology:network-topology/topology=topology-netconf/node=E/yang-ext:mount/core-model-1-4:control-construct/alarms-1-0:alarm-pac/current-alarms header :{"Authorization":"Authorization"}body :""response code:200
[02.04.2024 10:01.18.316] [LOG]   
 callback : http://X.X.X.X:XXXX/rests/data/network-topology:network-topology/topology=topology-netconf/node=F/yang-ext:mount/core-model-1-4:control-construct/alarms-1-0:alarm-pac/current-alarms header :{"Authorization":"Authorization"}body :""response code:200
[02.04.2024 10:01.18.317] [LOG]   
 callback : http://X.X.X.X:XXXX/rests/data/network-topology:network-topology/topology=topology-netconf/node=G/yang-ext:mount/core-model-1-4:control-construct/alarms-1-0:alarm-pac/current-alarms header :{"Authorization":"Authorization"}body :""response code:200
[02.04.2024 10:01.18.345] [LOG]   undefined
[02.04.2024 10:01.18.515] [LOG]   undefined
[02.04.2024 10:01.18.622] [LOG]   undefined
[02.04.2024 10:01.18.934] [LOG]   loggingErrorHandler - caught error, returning response with status code "401" and body {"code":401,"message":"Too many pending tasks in queue ./database/load.json","errors":[{"path":"/v1/regard-device-alarm","message":"Too many pending tasks in queue ./database/load.json"}]}
[02.04.2024 10:01.18.935] [LOG]   recordServiceRequest - record service request with body {} failed with error: Too many pending tasks in queue ./database/load.json
POST /v1/regard-device-alarm 401 9.140 ms - 230
[02.04.2024 10:01.18.946] [LOG]   Error: Too many pending tasks in queue ./database/load.json
    at AsyncLock.acquire (/home/openbackhaul/microWaveDeviceInventory/node_modules/async-lock/lib/index.js:195:15)
    at Object.exports.readFromDatabaseAsync (/home/openbackhaul/microWaveDeviceInventory/node_modules/onf-core-model-ap/applicationPattern/databaseDriver/JSONDriver.js:27:23)
    at Function.getLogicalTerminationPointListAsync (/home/openbackhaul/microWaveDeviceInventory/node_modules/onf-core-model-ap/applicationPattern/onfModel/models/ControlConstruct.js:32:59)
    at Function.getOperationServerUuidAsync (/home/openbackhaul/microWaveDeviceInventory/node_modules/onf-core-model-ap/applicationPattern/onfModel/models/layerProtocols/OperationServerInterface.js:197:13)
    at Function.createResponseHeader (/home/openbackhaul/microWaveDeviceInventory/node_modules/onf-core-model-ap/applicationPattern/rest/server/ResponseHeader.js:64:70)
    at /home/openbackhaul/microWaveDeviceInventory/controllers/IndividualServices.js:2817:51
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
[02.04.2024 10:01.18.947] [LOG]   recordServiceRequest - record service request with body {} failed with error: Too many pending tasks in queue ./database/load.json
POST /v1/regard-device-alarm 204 32083.800 ms - -
[02.04.2024 10:01.18.949] [LOG]   
 callback : http://X.X.X.X:XXXX/rests/data/network-topology:network-topology/topology=topology-netconf/node=H/yang-ext:mount/core-model-1-4:control-construct/alarms-1-0:alarm-pac/current-alarms header :{"Authorization":"Authorization"}body :""response code:200
[02.04.2024 10:01.18.990] [LOG]   loggingErrorHandler - caught error, returning response with status code "401" and body {"code":401,"message":"Too many pending tasks in queue ./database/load.json","errors":[{"path":"/v1/regard-device-alarm","message":"Too many pending tasks in queue ./database/load.json"}]}
[02.04.2024 10:01.18.990] [LOG]   recordServiceRequest - record service request with body {} failed with error: Too many pending tasks in queue ./database/load.json
POST /v1/regard-device-alarm 401 9.214 ms - 230
[02.04.2024 10:01.19.022] [LOG]   undefined
[02.04.2024 10:01.19.089] [LOG]   loggingErrorHandler - caught error, returning response with status code "401" and body {"code":401,"message":"Too many pending tasks in queue ./database/load.json","errors":[{"path":"/v1/regard-device-alarm","message":"Too many pending tasks in queue ./database/load.json"}]}
[02.04.2024 10:01.19.089] [LOG]   recordServiceRequest - record service request with body {} failed with error: Too many pending tasks in queue ./database/load.json
nardecchia-siae commented 5 months ago

From the logs it seems that all the Too many pending tasks in queue ./database/load.json errors are related to the logging requests to the EATL. The code of the api is optimized to not access at all to the file while the controller tries to retrieve the IP address and port of EATL everytime an alarm is served by the api.

kmohr-soprasteria commented 5 months ago

Issue is caused by current implementation of alarm notification processing: when a alarm notification is received, the MWDI sends a request to the controller to retrieve the latest alarm list from the respective device.

Another possible approach would be to directly write the information from the notitication to the cache (if notifications are lost, the alarm lists in the MWDI are not up-to-date).

The further proceeding depends on feedback already requested from Andreas Haack (backoffice).

kmohr-soprasteria commented 5 months ago

Decision: the current approach shall be removed from the implementation. The alarm list shall just be updated according to information available from the notification, instead of retrieving the alarm list from the device after each received notification.

Ziabatcha commented 5 months ago

Tested in version:1.1.0.a_impl

Still observing issue. Only for 175 alarms, able to receive the notifications successfully. When alarm is received more than 200 notification, observing below error [Above scenario is tested with mock app]

Logs:

[15.04.2024 06:50.56.205] [LOG]
 callback : POST http://X.X.X.X:XXXX/v1/record-service-request header :{"user":"LoadStress","originator":"MicroWaveDeviceInventory","x-correlator":"99dac7bd-2D22-0EF2-ddea-cbCBD4801F9F","trace-indicator":1,"customer-journey":"unknown","operation-key":"-----","Content-Type":"application/json"}body :{"x-correlator":"dd012349-bbc5-E417-9Dad-fffd703E1ccc","trace-indicator":"1","user":"LoadStress","originator":"NotificationProxy","application-name":"MicroWaveDeviceInventory","release-number":"1.1.0","operation-name":"/v1/regard-device-alarm","response-code":500}response code:204
[15.04.2024 06:50.56.207] [LOG]
 callback : POST http://X.X.X.X:XXXX/v1/record-service-request header :{"user":"LoadStress","originator":"MicroWaveDeviceInventory","x-correlator":"9EEc4Ffc-ef93-A317-A3ac-bFde5Da4Df60","trace-indicator":1,"customer-journey":"unknown","operation-key":"-----","Content-Type":"application/json"}body :{"x-correlator":"dd012349-bbc5-E417-9Dad-fffd703E1ccc","trace-indicator":"1","user":"LoadStress","originator":"NotificationProxy","application-name":"MicroWaveDeviceInventory","release-number":"1.1.0","operation-name":"/v1/regard-device-alarm","response-code":500}response code:204
nardecchia-siae commented 5 months ago

Problem present also in the last tag seems to be related to a massive request to trace the result into EATL. I proposed to test a version of MWDI without the tracing functionality, but in this case the result is that Elasticsearch goes in timeout maybe due to an high rate of access to store or modify current alarms.

I suggest to try another test removing all interactions with ES to verify id MWDI itself is capable to handle such rate of notification.