Closed lorro closed 4 years ago
If _qz.websocket.connection.version
is null or undefined, it's a sign that the websocket has closed, nulling it out.
This usually occurs if something closed out the connection (observable by the QZ Logs). Often the connection is not obvious, but can be caused by page navigation (back/POST/DOM reload/variable redefine).
Hi @tresf, I understand this indicates the websocket has been closed, but it's kinda odd that all the cashregisters that are running 2.1.0-RC5 have never had this and still don't have. The few cash registers that are using 2.1.0-RC7 have several notifications of this on a daily basis.
Until now I haven't been able to reproduce the issue on my own physical device either :/ I ran some tests with 2.1.0-RC7 too but everything worked fine. So I had some suspicions the connection got timed out f.e.
I'll try if I can see if there's any useful info in these logs. The connection is actually initiated by me when the root node of the application (App.vue) gets mounted to the DOM.
@lorro have you collected the logs from the workstations? There may be something indicative of a crash.
Pinpointing the code changes between RCs in an attempt to identify if a code change has caused this is a bit difficult due to the "unreleased" state of the 2.1 branch, but we can infer...
This introduces 11 months of commits -- including anything synced over from the 2.0 branch -- changes are pretty massive, so catching a crash in the logs that's taking down the websocket might be our best path...
https://github.com/qzind/tray/compare/ef3d2aa4e3a22f456177c08cc6ce9756f99aabda...2.1
@tresf I managed to get an export of the debug.log of two different cash registers, one with RC5 and the other RC7. In both files I found a few lines that indicate a closing of the connection. However in RC7 it's often because of a "WebSocket Read EOF". In RC5 I only see it because of "Disconnected". I reckon this is by shutting down the browser or computer. But never because of "Websocket Read EOF".
RC7:
2019-11-25 13:58:13,887 [DEBUG] Searching for PrintService matching CITIZEN CT-S310II
2019-11-25 13:58:13,888 [DEBUG] Found 6 printers
2019-11-25 13:58:13,888 [DEBUG] Found match: CITIZEN CT-S310II
2019-11-25 13:58:13,888 [DEBUG] Found Resolution: 20300x20300 dphi
2019-11-25 13:58:13,909 [DEBUG] Parsed 1 html records
2019-11-25 13:58:14,065 [DEBUG] Paper area: 7,0:212,9286
2019-11-25 13:58:14,065 [DEBUG] Document will be printed across 1 pages
2019-11-25 13:58:14,920 [INFO] Printing complete
2019-11-25 14:18:42,461 [INFO] Connection closed: 1006 - WebSocket Read EOF
2019-11-25 14:18:42,462 [INFO] Closing all communication channels for <xxx>
2019-11-25 15:17:51,766 [INFO] Connection opened from /0:0:0:0:0:0:0:1:61856 on socket port 8181
RC5:
2019-11-20 14:01:02,551 [DEBUG] Searching for PrintService matching CITIZEN CT-S310II
2019-11-20 14:01:02,552 [DEBUG] Found 4 printers
2019-11-20 14:01:02,552 [DEBUG] Found match: CITIZEN CT-S310II
2019-11-20 14:01:02,552 [DEBUG] Found Resolution: 20300x20300 dphi
2019-11-20 14:01:02,568 [DEBUG] Parsed 1 html records
2019-11-20 14:01:03,086 [DEBUG] Paper area: 7,0:212,9286
2019-11-20 14:01:03,097 [DEBUG] Document will be printed across 1 pages
2019-11-20 14:01:03,586 [INFO] Printing complete
2019-11-20 16:41:27,950 [INFO] Connection closed: 1006 - Disconnected
2019-11-20 16:41:29,463 [INFO] Closing all communication channels for <xxx>
2019-11-20 16:41:48,096 [INFO] Connection opened from /0:0:0:0:0:0:0:1:64850 on socket port 8181
Does this mean anything to you? :) Thanks
Does this mean anything to you? :)
Yes, sort of... We started catching this error in our RC9 version due to -- what we believed to be -- some issues with the newer Jetty libraries.
What's strange is on RC9 it just caused a popup tray notification, it wasn't causing the qz websocket object to die. The logs don't inditicate it would cause it to die either (nor do yours).
Would you be willing to share the full logs with us? support@qz.io?
Then we can see if there's something unobvious happening.
Would you be willing to share the full logs with us? support@qz.io?
I just sent you an email with the full log.
Thanks. I notice that the logs show both {type: "pixel" ... }
and {type: "raw" ... }
. Does the problem appear to occur more with one over the other?
I can't see anything obviously wrong with them either. I'm not sure why the websocket is dying on you.
Connection closed: 1006 - WebSocket Read EOF
What confuses me is that both logs indicate Closing all communication channels
, so if this is timing with a page navigation, it should be consistent between versions. Furthemore, we've never had this specific problem reported to us where it was NOT a JavaScript coding issue.
Can you make an example that causes this issue in a stand-alone unit test? If so we can run the software in debug and see what's going on.
@tresf: I managed to replicate this error on my own physical device now. I noticed this should have something to do with the version of Java that is installed on the machine. (And not necessarily with the version of QZ Tray that is installed).
I just updated my Java to the latest version (version 8 update 231) and the first time I tried printing something after starting the QZ Tray application I got the "cannot read property version of null" notification.
After refreshing the browser, printing does work.
@lorro thanks. Once you have some steps/code for us, we'll happily take a look.
@tresf: Sorry, I think I took my conclusions a bit too early. Seems I just didn't reload our application after restarting the QZ Tray application (after the update to the newest Java version). Hence the QZ Tray connection wasn't re-established so I got that error.
It could be related to the Java version but actually we're not sure yet. We also tried downgrading to RC5 on one of the affected cash registers that were installed with RC7 and after that the error persists so it's probably not directly related to the build version. So the only difference between cash registers that have issues and those that have not is probably the Java version. But it's hard to replicate as it's an issue that only pops up once in a while.
P.S. For now I attached a callback to the setClosedCallbacks function to re-establish the connection whenever it's lost.
qz.websocket.setClosedCallbacks(() => {
setTimeout(() => {
this.connectToPrinter()
}, 100)
})
P.S. For now I attached a callback to the
setClosedCallbacks
function to re-establish the connection whenever it's lost.
Great, thanks.
One more thing I was thinking about last night about this problem is that certain WebSocket implementations failed sporadically for us in our unit tests. Is there a chance you're using our API overrides to use a 3rd-party (non-browser) WebSocket implementation? If so, that could be the culprit. I was writing some unit tests in Node.js years ago and remember encountering this. It's a shot in the dark, but I wanted to mention it.
Deleted my previous comment. Didn't understand the question at first. This is our API implementation for the QZ Tray connection.
initPrinterConnection () {
qz.api.setPromiseType(function promise (resolver) {
return new RSVP.Promise(resolver)
})
qz.api.setSha256Type(function (data) {
return createHash('sha256').update(data).digest('hex')
})
qz.security.setCertificatePromise(function (resolve) {
return axios.get(`${domain}/cert.pem`).then(data => resolve(data.data))
})
qz.security.setSignaturePromise(function (toSign) {
return function (resolve) {
return axios.get(`${domain}/sign`, { params: { request: toSign } }).then(data => resolve(data.data))
}
})
return qz.websocket.connect({ retries: 5 })
.then(() => Promise.resolve())
.catch((data) => Promise.reject(data))
}
Unfortunately, my mentioned "solution" from above apparently didn't fix this issue completely. From the two different error notifications I saw coming in before there's still one that pops up now and then...
That's this one:
TypeError: Cannot read property 'split' of undefined (Most recent call first)
File webpack:///./own_modules/qz-tray/qz-tray.js line 627 col 1 in compatible
var semver = _qz.websocket.connection.version.split(/[.-]/g);
File webpack:///./own_modules/qz-tray/qz-tray.js line 1169 col 1 in print
_qz.tools.compatible(data[d]);
The other one however seems to be fixed, as we don't see that anymore in the logs.
TypeError: Cannot read property 'version' of null (Most recent call first)
File webpack:///./own_modules/qz-tray/qz-tray.js line 627 col 1 in compatible
var semver = _qz.websocket.connection.version.split(/[.-]/g);
File webpack:///./own_modules/qz-tray/qz-tray.js line 1169 col 1 in print
_qz.tools.compatible(data[d]);
So apparently the _qz.websocket.connection always has a correct value, but the version property sometimes becomes undefined.
qz.api.setPromiseType(function promise (resolver) {
return new RSVP.Promise(resolver)
})
qz.api.setSha256Type(function (data) {
return createHash('sha256').update(data).digest('hex')
})
Thanks. Some frameworks require qz.api.setWebSocketType(...)
, which is where I saw a similar problem in the past. If you're not using this, it rules that out.
So apparently the
_qz.websocket.connection
always has a correct value, but theversion
property sometimes becomes undefined.
You also have a 100ms
+ natural reconnect time for it to recover, so any calls between those events will fail. There's no away to avoid this except to identify what's dropping the connection in the first place.
One thought... I see you use axios.get(...)
for AJAX. Can you add an error handler to that call to see if the error coincides with a 404 style problem?
Furthemore, at this point, it would help if you could isolate the problem to a small reproducible example so that we could reproduce on our end.
Furthemore, at this point, it would help if you could isolate the problem to a small reproducible example so that we could reproduce on our end.
I surely understand, problem is I can't really reproduce the issue myself. It's only certain employees that work at a remote store that have the issue sometimes (but not every one). I haven't been able to reproduce it on my own machine, so don't have quite an idea what the culprit is. So it's kinda hard.
For now I commented out some lines in qz-tray.js to resolve this case for our very own usecase.
//clean up data formatting
for (var d = 0; d < data.length; d++) {
_qz.tools.relative(data[d]);
// _qz.tools.compatible(data[d]);
}
Noticed the only thing that commented line does is converting a message to version 2.0 format. And splitting the textual numeric version number into an array in doing so. That was where the error popped up and blocked execution of the rest of the code. We're certain we're always using version 2.1. At least they can open the drawer now :)
Just to rehash this problem...
POS are using our own compiled version of QZ Tray
2.1.0-RC5
, without issues.However, the last couple of cash registers that we registered in the POS software are using a newer compiled version of QZ Tray
2.1.0-RC7
. Now regulary some issues pop up when an employee tries to print/open the drawer. The following javascript error is returned regularly.
My instinct is that this is identical to this issue, but we never backported the fix to RC7. https://github.com/qzind/tray/pull/504/commits/fa36a272fce52d2f46b67e461e8164205b567754
RC9 is still considered experimental, but it's passing most tests identically to RC7 if you're interested: https://github.com/tresf/tray/releases/tag/v2.10-RC9
Furthemore, we can back-port the patch to RC7 if needed, but it would be a one-off build, all future updates would land on RC9 or 2.1.0 proper.
Hi @tresf , I'm developing against the latest v2.1 QZTray (running on macOS catalina), and am routinely seeing similar issues with the javascript client encountering fatal Cannot read property 'split' of undefined
errors even when the websocket connection is successfully established. Usually, just exiting and restarting the QZtray instance will fix the problem temporarily.
TypeError: Cannot read property 'split' of undefined
at Object.isVersion (qz-tray.js:638)
at Object.algorithm (qz-tray.js:716)
at Object.setSignatureAlgorithm (qz-tray.js:2080)
There seems to be a correlation with the order of websocket events that are received. If the first websocket message received by the client does not contain the version number, the error will happen, even though the subsequent message does contain it. In this screenshot, I get the fatal JS error:
(notice the first message is just an object with only a "uid"
field)
When working properly, with no issues, the first message received appears to always be the version number, like this:
Could this possibly be caused by my implementation on the JS/client side, i.e. some sort of race condition or something?
It could be a race condition, yes. Can you share some of your startup code so we can look for race conditions?
For example, normally, you'd call
qz.websocket.connect()
qz.printers.find(...)
qz.print(...)
If not using a proper promise chain, you can run into race considtions.
The promise chain is historically done using Promise.then(Promise.then(...))
however the await
keyword will work as well if placed inside an async
function.
It could be a race condition, yes. Can you share some of your startup code so we can look for race conditions?
For example, normally, you'd call
qz.websocket.connect()
-- then --qz.printers.find(...)
-- then --qz.print(...)
If not using a proper promise chain, you can run into race considtions.
The promise chain is historically done using
Promise.then(Promise.then(...))
however theawait
keyword will work as well if placed inside anasync
function.
@tresf thanks for the reply. Yeah, I think I may have figured it out. I was doing those actions in the correct order, and properly waiting for the promises to resolve before executing the next one, however I was also re-executing all the startup/config methods (e.g. qz.security.setSignatureAlgorithm
, setCertificatePromise
, setSignaturePromise
, etc) before every new print action/command i sent to qz, as part of my logic to make sure it was connected before running the command.
I modified it to make sure those config methods are only set once on startup, and the issue hasn't come back since.
re-executing all the startup/config (e.g. qz.security.setSignatureAlgorithm, setCertificatePromise, setSignaturePromise, etc) before every new print action/command
Examining the code, I don't see any which alters the connection
object in either function however, we do use compatible
in setSignaturePromise to warn about hash conflicts. That might be what's happening. Pinging @bberenz to see if we have an unchecked call.
I can see where this would occur. We check for active connection, but don't guard against a connection which hasn't replied yet, so there's a chance connection
is active, but version
is null. This will happen if the signature API is configured mid-connection. This isn't really supported (API calls should happen first), but I think we can guard against it.
PR opened in #656 which offers a sanitization around the problematic algorithm
call as well as a better error message.
As it turns out there may still be a bug with the version logic. Tracking here: #538
Hi,
A while ago we developed a POS software which has been working fine so far. Printing of receipts and opening of the cash drawer runs through the QZ tray software. The first cash registers that have been registered in the POS are using our own compiled version of QZ Tray 2.1.0-RC5, without issues.
However, the last couple of cash registers that we registered in the POS software are using a newer compiled version of QZ Tray 2.1.0-RC7. Now regulary some issues pop up when an employee tries to print/open the drawer. The following javascript error is returned regularly.
After doing a manual browser refresh printing works again. That's however not really a solution as the POS is a single page application.
Anyone has an idea what direction we should look into to fix this issues? Has any remarkable changes been made that could cause this behavior? The connection with QZ Tray is setup on first page load. Is the connection with QZ tray lost for some reason perhaps?
Thanks