Open weiliudong opened 1 year ago
server linux version : Linux iZwz9d6oh1bh7ilukhile6Z 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64 GNU/Linux
@weiliudong What's the previous version you was using? Do you also know what's the subscrbie action that causes this?
@robertsLando The previous version was "aedes": "^0.46.3" "aedes-persistence-redis": "^8.0.1" "mqemitter-redis": "^4.2.0"
Could you tell me what subscribe command throws tha?
@robertsLando It should be an exception when this method is called. function authorize (sub, done) { const err = validateTopic(sub.topic, 'SUBSCRIBE') if (err) { return done(err) } const client = this.client client.broker.authorizeSubscribe(client, sub, done) }
Compare the old version
This is a lot like theme permissions authorizeSubscribe The problem but i'm not sure
The problem is that for some reason the topic is null/undefined when it comes there, question is why and how it could be undefined. This is why I'm asking what kind of subscribe causes that problem
Received, here is to determine where the caller is, find and send it to you for analysis
@robertsLando I found the problem here, it is the problem of your module configuration "aedes": "^0.48.0", "aedes-persistence-redis": "^9.0.1", When used together, the above-mentioned problems occur directly;
when I replace it with "aedes": "^0.48.0", "aedes-persistence-redis": "^8.0.1", fully working fine
Conclusion: The new version of "aedes": "^0.48.0" does not fit the new version of "aedes-persistence-redis": "^9.0.1"
Currently "aedes": "^0.48.0" and "aedes-persistence-redis": "^8.0.1" will work.
This is the problem we are currently encountering with the upgrade module
Because it is a problem with aedes internal module adaptation, we can't confirm the specific problem for you, you have to investigate.
Could you give a try to
"aedes": "^0.48.0",
"aedes-persistence-redis": "^9.0.0",
@weiliudong does that fixed your problem?
Could you give a try to
"aedes": "^0.48.0", "aedes-persistence-redis": "^9.0.0",
It's the same error. I've tried with:
/home/aedes-broker-tests/node_modules/standard-as-callback/built/index.js:6 throw e; ^
TypeError: Cannot read property 'length' of undefined at validateTopic (/home/aedes-broker-tests/node_modules/aedes/lib/utils.js:6:21) at SubState.authorize (/home/aedes-broker-tests/node_modules/aedes/lib/handlers/subscribe.js:93:15) at Holder.work (/home/aedes-broker-tests/node_modules/fastfall/fall.js:105:23) at SubState.compiled (/home/aedes-broker-tests/node_modules/fastfall/fall.js:55:17) at SubscribeState.doSubscribe (/home/aedes-broker-tests/node_modules/aedes/lib/handlers/subscribe.js:89:16) at goResultsFunc (/home/aedes-broker-tests/node_modules/fastparallel/parallel.js:63:16) at Aedes.parallel [as _parallel] (/home/aedes-broker-tests/node_modules/fastparallel/parallel.js:33:9) at handleSubscribe (/home/aedes-broker-tests/node_modules/aedes/lib/handlers/subscribe.js:78:17) at ClientPacketStatus.restoreSubs (/home/aedes-broker-tests/node_modules/aedes/lib/handlers/connect.js:180:5) at makeCallTwo (/home/aedes-broker-tests/node_modules/fastseries/series.js:150:3)
@rosenvivanov could you create a script to reproduce this error please?
@rosenvivanov could you create a script to reproduce this error please?
Actually, something mess related to the node folder. With deletion and clear npm install, now it's working. I'll test more and will let you know.
ok thanks!
ok thanks!
I've tested and I haven't managed to get the above error. We can consider that it has been something broken in the node modules.
But I noticed something other:
Event connectionError: null Error: read ECONNRESET at TCP.onStreamRead (internal/stream_base_commons.js:209:20) { errno: -104, code: 'ECONNRESET', syscall: 'read' }
This is printed by below event every 20-30 sec
aedes.on('connectionError', function (client, error) { console.log('Event connectionError: \x1b[31m' + (client ? client.id : 'undefined') + '\x1b[0m'); console.log(error); })
That error is emitted here
ECONNRESET error is emitted when:
the other side of the TCP conversation abruptly closed its end of the connection. This is most probably due to one or more application protocol errors.
Essentially one of your clients connecting to broker closes the connection before ending the connect, for some unknown reason
@robertsLando It is normal for the device side to have jitter due to poor network connection, but as aedes, we should not interrupt the main process of the entire program for this reason, but throw the exception to the application layer for processing.
The production environment has already deployed a lot of aedes, and they dare not upgrade to the new version.
For this length exception, most of them should be attributable to protocol errors, resulting in aede not being able to get the topic, unable to parse the corresponding topic length, and causing the program to crash.
This is the part I made by downloading the source code of aedes and then modifying it so that the new version of aedes "aedes": "^0.48.0", "aedes-persistence-redis": "^9.1.0", to work together
function handleSubscribe(client, packet, restore, done) {
packet.subscriptions = replace(client, packet)// Subscribe while processing connection
packet.subscriptions = packet.subscriptions.length === 1 ? packet.subscriptions : _dedupe(packet.subscriptions)
client.broker._parallel(
new SubscribeState(client, packet, restore, done), // what will be this in the functions
doSubscribe, // function to call
packet.subscriptions, // first argument of the function
restore ? done : completeSubscribe // the function to be called when the parallel ends
)
function replace(client, packet) {
const {id} = client
const array = []
const {subscriptions} = packet
for (const item of subscriptions) {
if (item === 49) {
array.push({topic: `${id}/connect`, qos: 0})
} else {
array.push(item)
}
}
return array
}
}
It is normal for the device side to have jitter due to poor network connection, but as aedes, we should not interrupt the main process of the entire program for this reason, but throw the exception to the application layer for processing.
Sure, what I don't understand is why this happens, I mean I know where the exception is thrown: https://github.com/moscajs/aedes/blob/main/lib/utils.js#L6
Because fixing that code takes 10 seconds but I have a feel the error could just be hidden in that way, topic should never be null
@weiliudong Please explain this:
for (const item of subscriptions) {
if (item === 49) {
array.push({topic: `${id}/connect`, qos: 0})
} else {
array.push(item)
}
}
Why an item could be 49
??
@robertsLando yes
We use the C code written by this connected module to connect to aedes, you can use this module to reproduce https://mongoose.ws/tutorials/ you can look here
@robertsLando
@weiliudong Please explain this:
for (const item of subscriptions) {
if (item === 49) {
array.push({topic: ${id}/connect
, qos: 0})
} else {
array.push(item)
}
}
Why an item could be 49 ??
It seems that there is a problem with the action "connectActions" when connecting
What is the client.broker._series method used for?
Ok so that item is something fetched from redis subscriptions? I don't remember if the error goes away by downgrading aedes persistence redis or not. (be sure to clean up redis db when doingg a new test or some errors could be due to the previous versions)
What is the client.broker._series method used for?
It uses fast-series module:
Zero-overhead series function call for node.js
So the error could be in redis addSubscriptions
method: https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L113
Or subscriptionsByClient
(used to fetch subscriptions to restore once a client connects with clean flag set to false): https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L194
Could you try to debug this to see if something is wrong there?
remember
ok
Or
subscriptionsByClient
(used to fetch subscriptions to restore once a client connects with clean flag set to false): https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L194
We have the same issue upgrading from
"aedes": "0.46.3",
"aedes-persistence-redis": "8.0.1",
to
"aedes": "0.48.0",
"aedes-persistence-redis": "9.0.1",
In our case it happens only with clients with clean flag false. I'm unable to reproduce this on test env with clean redis. But it happens if redis database kept between updates.
Following simple test fails if very same redis is used, and cache not cleaned, aedes updated. Client is mqtt.js promisfied.
// start with aedes 0.46.3 and aedes-persistence-redis 9.0.1
let service = await connect(host, { clientId: 'c', username: 'u', password: 'p', clean: false, resubscribe: false });
await service.subscribe('device/id/+/#');
await service.end(true);
// update aedes to 0.48.0 and aedes-persistence-redis to 9.0.1 , but keep the Redis database
let service = await connect(host, { clientId: 'c', username: 'u', password: 'p', clean: false, resubscribe: false });
await service.subscribe('device/id/+/#');
await service.end(true);
// throws TypeError: Cannot read properties of undefined (reading 'length')
// downgrade aedes/aedes-persistence-redis or clean redis.
let service = await connect(host, { clientId: 'c', username: 'u', password: 'p', clean: false, resubscribe: false });
await service.subscribe('device/id/+/#');
await service.end(true);
// works fine
@reijosirila Pleaase check my previous comment: https://github.com/moscajs/aedes/issues/780#issuecomment-1299742571
I think the error is here: https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L194
It could have been introduced from: https://github.com/moscajs/aedes-persistence-redis/pull/96/files
In that version the clientId has been sanitized using encodeURIComponent
function like you can see here
That's why we also bumped a major for that. Could you do some tests and check if that's the reason?
Added some debug before and after update to function subscriptionsByClient
#### hgetall? clientSubKey? client:service-test
#### hgetall? subs? { 'device/id/+/#': '0' }
#### hgetall? toReturn? [ { topic: 'device/id/+/#', qos: 0 } ]
// after update
### hgetallBuffer clientSubKey(client.id)? client:service-test
### hgetallBuffer subs? { 'device/id/+/#': <Buffer 30> }
### hgetallBuffer toReturn? [ 48 ]
function returnSubsForClient
return type is different, if data written by previous module.
{
topic: string,
qos: int
}[]
vs
number[]
After returning from returnSubsForClient
previous type, everything works fine.
if data is written by new module, returnSubsForClient
works:
###hgetallBuffer clientSubKey(client.id)? client:service-test
### hgetallBuffer subs? {
'device/id/+/#': <Buffer 82 a5 74 6f 70 69 63 ad 64 65 76 69 63 65 2f 69 64 2f 2b 2f 23 a3 71 6f 73 00>
}
### hgetallBuffer toReturn? [ { topic: 'device/id/+/#', qos: 0 } ]
@reijosirila I have just published a patch that at least should prevent broker to crash.
About the issue, the problem is that previously the subs where not encoded with msgpack, with version 9 the encode has been added so when decode is called it is called on a clean object (that was not encoded previously) and that causes the issue. Could you submit a fix to aedes-persistence-redis? You should just try to skip the decode when the object is already decoded here
Yep. Just finished checking addSubscriptions
function and this is the case. I'll try to do PR.
After replacing with the latest version, the device is abnormally connected to the cluster. "aedes": "^0.48.0" "aedes-persistence-redis": "^9.0.1" "mqemitter-redis": "^5.0.0"
4|mglink_mqtt | TypeError: Cannot read properties of undefined (reading 'length') 4|mglink_mqtt | at validateTopic (/root/mglink_node/node_modules/aedes/lib/utils.js:6:21) 4|mglink_mqtt | at SubState.authorize (/root/mglink_node/node_modules/aedes/lib/handlers/subscribe.js:93:15) 4|mglink_mqtt | at Holder.work (/root/mglink_node/node_modules/fastfall/fall.js:105:23) 4|mglink_mqtt | at SubState.compiled (/root/mglink_node/node_modules/fastfall/fall.js:55:17) 4|mglink_mqtt | at SubscribeState.doSubscribe (/root/mglink_node/node_modules/aedes/lib/handlers/subscribe.js:89:16) 4|mglink_mqtt | at goResultsFunc (/root/mglink_node/node_modules/fastparallel/parallel.js:63:16) 4|mglink_mqtt | at Aedes.parallel [as _parallel] (/root/mglink_node/node_modules/fastparallel/parallel.js:33:9) 4|mglink_mqtt | at handleSubscribe (/root/mglink_node/node_modules/aedes/lib/handlers/subscribe.js:78:17) 4|mglink_mqtt | at ClientPacketStatus.restoreSubs (/root/mglink_node/node_modules/aedes/lib/handlers/connect.js:180:5) 4|mglink_mqtt | at makeCallTwo (/root/mglink_node/node_modules/fastseries/series.js:150:3)
help me