Not seeing full error message when reconnect fails

creasman commented 10 months ago

mq-mqi-nodejs version(s) that are affected by this issue: version 1.0.5

(Apologies in advance for a long-winded explanation, but better to say too much than too little)

We have NodeJS clients that use the Put1Promise interface to queue a message. The connection is made using a CCDT URL as follows, with *QM_ANY used as the qmgr name. The configuration is a uniform cluster with two qmgrs.

    // MQ connection object
    const mqCNO = new mq.MQCNO();
    mqCNO.ApplName = applName; // Set application name
    mqCNO.Options = (mqCNO.Options as number) | MQC.MQCNO_CLIENT_BINDING | MQC.MQCNO_RECONNECT;
    mqCNO.CCDTUrl = this.mqConfig.ccdtUrl;

    const mqCSP = new mq.MQCSP();
    mqCSP.UserId = this.mqConfig.user;
    mqCSP.Password = this.mqConfig.pwd;
    mqCNO.SecurityParms = mqCSP;

    try {
      const hConn = await mq.ConnxPromise(this.mqConfig.qmgr!, mqCNO);
      this.logger.log(`[mqService] - connect successful`);
      return hConn;
    } catch (err) {
      this.logger.error(`[mqService] - connect failed : ${err}`);
      throw err;
    }

The code to put the message is below, where message is the message buffer.

    const od = new mq.MQOD(); // Object descriptor.
    od.ObjectType = MQC.MQOT_Q;
    od.ObjectName = queueName;

    const mqmd = new mq.MQMD();
    const pmo = new mq.MQPMO();

    try {
      await mq.Put1Promise(hConn, od, mqmd, pmo, message);
      this.logger.log(`[mqService] message successfully queued (${message.length} bytes)`);
    } catch (error) {
      this.logger.warn(`[mqService] error: ${error}`);
      throw error;
    }

The connection is made without any problems at the time the service starts. The service runs fine (days, weeks or longer). When we do a rolling recycle of the qmgrs, we see the client connections migrate to the active qmgr and then rebalance once both are running again.

We hit a random failure of the producer client recently that I am investigating. The client was failing to connect. The output from the Put1 method was reporting this error whenever it executed:

PUT1: MQCC = MQCC_FAILED [2] MQRC = MQRC_RECONNECT_FAILED [2548]
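
For anyone wanting to react to this specific failure in the catch block rather than just logging it, here is a minimal sketch. It assumes the error object thrown by the package carries a numeric `mqrc` property; the helper names `isReconnectFailed` and `putOrFlag` are hypothetical, and the value 2548 is taken from the error line above.

```typescript
// Reason code taken from the log line above: MQRC_RECONNECT_FAILED [2548].
const MQRC_RECONNECT_FAILED = 2548;

// Assumption: errors thrown by the package expose a numeric `mqrc` property.
function isReconnectFailed(err: unknown): boolean {
  if (typeof err !== "object" || err === null) {
    return false;
  }
  const mqrc = (err as { mqrc?: unknown }).mqrc;
  return mqrc === MQRC_RECONNECT_FAILED;
}

// Hypothetical wrapper: runs a put, and calls a hook when the failure means
// the client has given up reconnecting (so a fresh connection is needed).
async function putOrFlag<T>(
  put: () => Promise<T>,
  onReconnectFailed: () => void
): Promise<T> {
  try {
    return await put();
  } catch (err) {
    if (isReconnectFailed(err)) {
      onReconnectFailed();
    }
    throw err; // always rethrow so the caller still sees the failure
  }
}
```

In our service the hook could be something like `() => this.logger.error("reconnect abandoned, new hConn required")`, with the put passed in as `() => mq.Put1Promise(hConn, od, mqmd, pmo, message)`.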

My understanding is this error is reached whenever the reconnect effort has timed out. I looked further back in the logs and came across this error line just before the above errors began:

AMQ9795E: The client channel definition could not be retrieved from its URL, error code (16).

The client had been up and running for well over a week at this time with many other puts succeeding. There is additional information that could be included in the above message. For example, something like this:

AMQ9795E

The client channel definition could not be retrieved from its URL, error code (<insert_1>).

Severity 30 : Error

Explanation: The client channel definition location was specified as URL <insert_3>, however the file could not be retrieved from this location.
The error returned was (<insert_1>) <insert_4>. The protocol specific response code was (<insert_2>).

Response: Ensure that the URL is reachable and if necessary correct the details provided.

Is it possible to have the full details of this (or other) AMQ errors printed out? This is a random occurrence that we have only seen 2-3 times over the past six months, and we are not able to force a recreate at this time. Having the additional values could help debug the issue, in particular the value of <insert_3>.

We do not see any errors in the log of the service which provides the CCDT, and the CCDT content is static once it restarts. Our current thinking is that either the CCDT URL storage has become corrupted, or that an underlying network issue prevented the client from reaching the service at the moment it needed to reconnect.

Thanks

ibmmqmet commented 10 months ago

This package does nothing with AMQ error messages. Anything that might be displayed will be coming from the underlying C client libraries. So I can't affect that from here.

I'd expect there to always be the full detailed error message in the client error logs (usually under /var/mqm/errors or $HOME/IBM/MQ/data/errors). That will include the real HTTP error like 404.

creasman commented 10 months ago

Mark,

Thanks for the quick response. Pointing me to the client log file helped. I found the same error message occurring in our Test environment at the same time. Each environment has its own CCDT service URL and separate services. The full explanation is:

EXPLANATION:
The client channel definition location was specified as URL '<<<url here is correct>>>',
however the file could not be retrieved from this location.

The error returned was (16) 'HTTP response code said error'. The protocol
specific response code was (503).

The URL is correct, but the call returned a 503. I verified the service was running during the time of failure, so I suspect network gremlins were at work. The Test environment recovered, but Prod eventually timed out on the reconnect.
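
To catch a transient 503 like this from the application side, one option is to probe the CCDT URL ourselves and log the HTTP status. This is only a sketch: `probeCcdtUrl`, `HttpGet` and `ProbeResult` are hypothetical names, the retry count and delay are arbitrary, and the probe is independent of the retrieval the C client performs, so it is a diagnostic aid rather than a fix.

```typescript
// Hypothetical minimal shape of an HTTP GET; Node 18+'s global fetch fits it.
type HttpGet = (url: string) => Promise<{ status: number }>;

interface ProbeResult {
  ok: boolean;           // true if any attempt returned a 2xx status
  status: number | null; // last HTTP status seen, null on a network error
  attempts: number;      // number of attempts actually made
}

// Tries the URL up to maxAttempts times, waiting delayMs between attempts.
async function probeCcdtUrl(
  url: string,
  httpGet: HttpGet,
  maxAttempts = 3,
  delayMs = 1000
): Promise<ProbeResult> {
  let status: number | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      status = (await httpGet(url)).status;
      if (status >= 200 && status < 300) {
        return { ok: true, status, attempts: attempt };
      }
    } catch {
      status = null; // network-level failure, no HTTP status available
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return { ok: false, status, attempts: maxAttempts };
}
```

On Node 18 or later this could be called as `probeCcdtUrl(this.mqConfig.ccdtUrl, fetch)` on a timer, logging the result whenever `ok` is false.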

The docs at https://colinpaice.blog/2019/03/27/the-ups-and-downs-of-mq-reconnect-how-can-i-tell-if-automatic-reconnect-has-happened/ describe how to register for events using the underlying MQI client. Is event registration exposed via this NodeJS package? I'd like to register an event handler for when the client enters the reconnect sequence.

Along the same lines, is it possible to set values in the mqclient.ini file from this NodeJS package? If not, where should I add the file in order for the underlying MQI client to find it? There are a couple of settings I'd like to utilize for reconnect: MQReconnectTimeout and ReconDelay.

Thanks, Jim

ibmmqmet commented 10 months ago

Event registration is not exposed through the NodeJS interface. It's on the backlog for potential implementation but I don't have a timescale. It's not a simple job because of the way the different async components have to interact with the NodeJS engine threading model.

Pointing at the ini file uses the usual mechanisms - default paths etc. Or use an environment variable. If you want to control that in the program rather than as part of the app's setup, then process.env['MQCLNTCF']="/home/met/mqclient.ini" would be the Node way.

But we are just using the C client underneath, so all the usual ways of pointing at the ini file will work, and it's the standard set of options. There's nothing special needed in the ini file for a Node program.
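
As a rough illustration of the environment-variable approach, the sketch below writes an ini file and sets MQCLNTCF from the program. The helper name `writeClientIni`, the temporary directory and the example values are all hypothetical, and it assumes the two reconnect attributes belong in the CHANNELS stanza, so check the standard client documentation for the exact syntax. The variable presumably needs to be set before the first connection is made.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: writes an mqclient.ini with reconnect settings and
// points the C client at it through the MQCLNTCF environment variable.
function writeClientIni(
  dir: string,
  reconnectTimeoutSecs: number,
  reconDelay: string
): string {
  const iniPath = path.join(dir, "mqclient.ini");
  const contents = [
    "CHANNELS:", // assumption: reconnect attributes live in this stanza
    `   MQReconnectTimeout=${reconnectTimeoutSecs}`,
    `   ReconDelay=${reconDelay}`,
    "",
  ].join("\n");
  fs.writeFileSync(iniPath, contents, "utf8");
  process.env["MQCLNTCF"] = iniPath;
  return iniPath;
}

// Example values only: a 600 second overall timeout and a stepped delay.
const iniDir = fs.mkdtempSync(path.join(os.tmpdir(), "mqclient-"));
const iniPath = writeClientIni(iniDir, 600, "(1000,200)(2000,200)(4000,1000)");
```

Shipping a fixed ini file with the deployment and setting MQCLNTCF in the service's environment would achieve the same thing without any code.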

creasman commented 9 months ago

Thanks for your help, Mark. We can close this if you like.

If/when you decide to tackle events I am interested in being a resource for dev and/or testing should you need it. This would be a nice feature to have.

ibm-messaging / mq-mqi-nodejs

Not seeing full error message when reconnect fails #174