moscajs / aedes

Barebone MQTT broker that can run on any stream server, the node way
MIT License
1.78k stars 231 forks source link

[bug]Use the latest version #780

Open weiliudong opened 1 year ago

weiliudong commented 1 year ago

After replacing with the latest version, the device is abnormally connected to the cluster. "aedes": "^0.48.0" "aedes-persistence-redis": "^9.0.1" "mqemitter-redis": "^5.0.0"

4|mglink_mqtt | TypeError: Cannot read properties of undefined (reading 'length') 4|mglink_mqtt | at validateTopic (/root/mglink_node/node_modules/aedes/lib/utils.js:6:21) 4|mglink_mqtt | at SubState.authorize (/root/mglink_node/node_modules/aedes/lib/handlers/subscribe.js:93:15) 4|mglink_mqtt | at Holder.work (/root/mglink_node/node_modules/fastfall/fall.js:105:23) 4|mglink_mqtt | at SubState.compiled (/root/mglink_node/node_modules/fastfall/fall.js:55:17) 4|mglink_mqtt | at SubscribeState.doSubscribe (/root/mglink_node/node_modules/aedes/lib/handlers/subscribe.js:89:16) 4|mglink_mqtt | at goResultsFunc (/root/mglink_node/node_modules/fastparallel/parallel.js:63:16) 4|mglink_mqtt | at Aedes.parallel [as _parallel] (/root/mglink_node/node_modules/fastparallel/parallel.js:33:9) 4|mglink_mqtt | at handleSubscribe (/root/mglink_node/node_modules/aedes/lib/handlers/subscribe.js:78:17) 4|mglink_mqtt | at ClientPacketStatus.restoreSubs (/root/mglink_node/node_modules/aedes/lib/handlers/connect.js:180:5) 4|mglink_mqtt | at makeCallTwo (/root/mglink_node/node_modules/fastseries/series.js:150:3)

help me

weiliudong commented 1 year ago

server linux version : Linux iZwz9d6oh1bh7ilukhile6Z 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64 GNU/Linux

robertsLando commented 1 year ago

@weiliudong What's the previous version you was using? Do you also know what's the subscrbie action that causes this?

weiliudong commented 1 year ago

@robertsLando The previous version was "aedes": "^0.46.3" "aedes-persistence-redis": "^8.0.1" "mqemitter-redis": "^4.2.0"

robertsLando commented 1 year ago

Could you tell me what subscribe command throws tha?

weiliudong commented 1 year ago

@robertsLando It should be an exception when this method is called. function authorize (sub, done) { const err = validateTopic(sub.topic, 'SUBSCRIBE') if (err) { return done(err) } const client = this.client client.broker.authorizeSubscribe(client, sub, done) }

Compare the old version

This is a lot like theme permissions authorizeSubscribe The problem but i'm not sure

robertsLando commented 1 year ago

The problem is that for some reason the topic is null/undefined when it comes there, question is why and how it could be undefined. This is why I'm asking what kind of subscribe causes that problem

weiliudong commented 1 year ago

Received, here is to determine where the caller is, find and send it to you for analysis

weiliudong commented 1 year ago

@robertsLando I found the problem here, it is the problem of your module configuration "aedes": "^0.48.0", "aedes-persistence-redis": "^9.0.1", When used together, the above-mentioned problems occur directly;

when I replace it with "aedes": "^0.48.0", "aedes-persistence-redis": "^8.0.1", fully working fine

Conclusion: The new version of "aedes": "^0.48.0" does not fit the new version of "aedes-persistence-redis": "^9.0.1"

Currently "aedes": "^0.48.0" and "aedes-persistence-redis": "^8.0.1" will work.

This is the problem we are currently encountering with the upgrade module

Because it is a problem with aedes internal module adaptation, we can't confirm the specific problem for you, you have to investigate.

robertsLando commented 1 year ago

Could you give a try to

"aedes": "^0.48.0",
"aedes-persistence-redis": "^9.0.0",
robertsLando commented 1 year ago

@weiliudong does that fixed your problem?

rosenvivanov commented 1 year ago

Could you give a try to

"aedes": "^0.48.0",
"aedes-persistence-redis": "^9.0.0",

It's the same error. I've tried with:

/home/aedes-broker-tests/node_modules/standard-as-callback/built/index.js:6 throw e; ^

TypeError: Cannot read property 'length' of undefined at validateTopic (/home/aedes-broker-tests/node_modules/aedes/lib/utils.js:6:21) at SubState.authorize (/home/aedes-broker-tests/node_modules/aedes/lib/handlers/subscribe.js:93:15) at Holder.work (/home/aedes-broker-tests/node_modules/fastfall/fall.js:105:23) at SubState.compiled (/home/aedes-broker-tests/node_modules/fastfall/fall.js:55:17) at SubscribeState.doSubscribe (/home/aedes-broker-tests/node_modules/aedes/lib/handlers/subscribe.js:89:16) at goResultsFunc (/home/aedes-broker-tests/node_modules/fastparallel/parallel.js:63:16) at Aedes.parallel [as _parallel] (/home/aedes-broker-tests/node_modules/fastparallel/parallel.js:33:9) at handleSubscribe (/home/aedes-broker-tests/node_modules/aedes/lib/handlers/subscribe.js:78:17) at ClientPacketStatus.restoreSubs (/home/aedes-broker-tests/node_modules/aedes/lib/handlers/connect.js:180:5) at makeCallTwo (/home/aedes-broker-tests/node_modules/fastseries/series.js:150:3)

robertsLando commented 1 year ago

@rosenvivanov could you create a script to reproduce this error please?

rosenvivanov commented 1 year ago

@rosenvivanov could you create a script to reproduce this error please?

Actually, something mess related to the node folder. With deletion and clear npm install, now it's working. I'll test more and will let you know.

robertsLando commented 1 year ago

ok thanks!

rosenvivanov commented 1 year ago

ok thanks!

I've tested and I haven't managed to get the above error. We can consider that it has been something broken in the node modules.

But I noticed something other:

Event connectionError: null Error: read ECONNRESET at TCP.onStreamRead (internal/stream_base_commons.js:209:20) { errno: -104, code: 'ECONNRESET', syscall: 'read' }

This is printed by below event every 20-30 sec

aedes.on('connectionError', function (client, error) { console.log('Event connectionError: \x1b[31m' + (client ? client.id : 'undefined') + '\x1b[0m'); console.log(error); })

robertsLando commented 1 year ago

That error is emitted here

ECONNRESET error is emitted when:

the other side of the TCP conversation abruptly closed its end of the connection. This is most probably due to one or more application protocol errors.

Essentially one of your clients connecting to broker closes the connection before ending the connect, for some unknown reason

weiliudong commented 1 year ago

@robertsLando It is normal for the device side to have jitter due to poor network connection, but as aedes, we should not interrupt the main process of the entire program for this reason, but throw the exception to the application layer for processing.

The production environment has already deployed a lot of aedes, and they dare not upgrade to the new version.

For this length exception, most of them should be attributable to protocol errors, resulting in aede not being able to get the topic, unable to parse the corresponding topic length, and causing the program to crash.

weiliudong commented 1 year ago

This is the part I made by downloading the source code of aedes and then modifying it so that the new version of aedes "aedes": "^0.48.0", "aedes-persistence-redis": "^9.1.0", to work together

function handleSubscribe(client, packet, restore, done) {
    packet.subscriptions = replace(client, packet)// Subscribe while processing connection
    packet.subscriptions = packet.subscriptions.length === 1 ? packet.subscriptions : _dedupe(packet.subscriptions)
    client.broker._parallel(
        new SubscribeState(client, packet, restore, done), // what will be this in the functions
        doSubscribe, // function to call
        packet.subscriptions, // first argument of the function
        restore ? done : completeSubscribe // the function to be called when the parallel ends
    )

    function replace(client, packet) {
        const {id} = client
        const array = []
        const {subscriptions} = packet
        for (const item of subscriptions) {
            if (item === 49) {
                array.push({topic: `${id}/connect`, qos: 0})
            } else {
                array.push(item)
            }
        }
        return array
    }
}
robertsLando commented 1 year ago

It is normal for the device side to have jitter due to poor network connection, but as aedes, we should not interrupt the main process of the entire program for this reason, but throw the exception to the application layer for processing.

Sure, what I don't understand is why this happens, I mean I know where the exception is thrown: https://github.com/moscajs/aedes/blob/main/lib/utils.js#L6

Because fixing that code takes 10 seconds but I have a feel the error could just be hidden in that way, topic should never be null

robertsLando commented 1 year ago

@weiliudong Please explain this:

for (const item of subscriptions) {
            if (item === 49) {
                array.push({topic: `${id}/connect`, qos: 0})
            } else {
                array.push(item)
            }
        }

Why an item could be 49 ??

weiliudong commented 1 year ago

@robertsLando yes

weiliudong commented 1 year ago

We use the C code written by this connected module to connect to aedes, you can use this module to reproduce https://mongoose.ws/tutorials/ image you can look here

weiliudong commented 1 year ago

@robertsLando

@weiliudong Please explain this:

for (const item of subscriptions) { if (item === 49) { array.push({topic: ${id}/connect, qos: 0}) } else { array.push(item) } } Why an item could be 49 ?? image

image

It seems that there is a problem with the action "connectActions" when connecting

What is the client.broker._series method used for?

robertsLando commented 1 year ago

Ok so that item is something fetched from redis subscriptions? I don't remember if the error goes away by downgrading aedes persistence redis or not. (be sure to clean up redis db when doingg a new test or some errors could be due to the previous versions)

robertsLando commented 1 year ago

What is the client.broker._series method used for?

It uses fast-series module:

Zero-overhead series function call for node.js

robertsLando commented 1 year ago

So the error could be in redis addSubscriptions method: https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L113

Or subscriptionsByClient (used to fetch subscriptions to restore once a client connects with clean flag set to false): https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L194

Could you try to debug this to see if something is wrong there?

weiliudong commented 1 year ago

remember

ok

reijosirila commented 1 year ago

Or subscriptionsByClient (used to fetch subscriptions to restore once a client connects with clean flag set to false): https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L194

We have the same issue upgrading from

"aedes": "0.46.3",
"aedes-persistence-redis": "8.0.1",

to

"aedes": "0.48.0",
"aedes-persistence-redis": "9.0.1", 

In our case it happens only with clients with clean flag false. I'm unable to reproduce this on test env with clean redis. But it happens if redis database kept between updates.

Following simple test fails if very same redis is used, and cache not cleaned, aedes updated. Client is mqtt.js promisfied.

    // start with aedes 0.46.3 and aedes-persistence-redis 9.0.1
    let service = await connect(host, { clientId: 'c', username: 'u', password: 'p', clean: false, resubscribe: false });
    await service.subscribe('device/id/+/#'); 
    await service.end(true);
    // update aedes to 0.48.0 and aedes-persistence-redis to  9.0.1 , but keep the Redis database
    let service = await connect(host, { clientId: 'c', username: 'u', password: 'p', clean: false, resubscribe: false });
    await service.subscribe('device/id/+/#');  
    await service.end(true);
    // throws TypeError: Cannot read properties of undefined (reading 'length')
    // downgrade aedes/aedes-persistence-redis or clean redis.
    let service = await connect(host, { clientId: 'c', username: 'u', password: 'p', clean: false, resubscribe: false });
    await service.subscribe('device/id/+/#'); 
    await service.end(true);
    // works fine
robertsLando commented 1 year ago

@reijosirila Pleaase check my previous comment: https://github.com/moscajs/aedes/issues/780#issuecomment-1299742571

I think the error is here: https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L194

It could have been introduced from: https://github.com/moscajs/aedes-persistence-redis/pull/96/files

In that version the clientId has been sanitized using encodeURIComponent function like you can see here

That's why we also bumped a major for that. Could you do some tests and check if that's the reason?

reijosirila commented 1 year ago

Added some debug before and after update to function subscriptionsByClient

    #### hgetall? clientSubKey? client:service-test
    #### hgetall? subs? { 'device/id/+/#': '0' }
    #### hgetall? toReturn? [ { topic: 'device/id/+/#', qos: 0 } ]
// after update
    ### hgetallBuffer clientSubKey(client.id)? client:service-test
    ### hgetallBuffer subs? { 'device/id/+/#': <Buffer 30> }
    ### hgetallBuffer toReturn? [ 48 ]

function returnSubsForClient return type is different, if data written by previous module.

{
      topic: string,
      qos: int
}[]

vs

number[]

After returning from returnSubsForClient previous type, everything works fine. if data is written by new module, returnSubsForClient works:

    ###hgetallBuffer clientSubKey(client.id)? client:service-test
    ### hgetallBuffer subs? {
      'device/id/+/#': <Buffer 82 a5 74 6f 70 69 63 ad 64 65 76 69 63 65 2f 69 64 2f 2b 2f 23 a3 71 6f 73 00>
    }
    ### hgetallBuffer toReturn? [ { topic: 'device/id/+/#', qos: 0 } ]
robertsLando commented 1 year ago

@reijosirila I have just published a patch that at least should prevent broker to crash.

About the issue, the problem is that previously the subs where not encoded with msgpack, with version 9 the encode has been added so when decode is called it is called on a clean object (that was not encoded previously) and that causes the issue. Could you submit a fix to aedes-persistence-redis? You should just try to skip the decode when the object is already decoded here

reijosirila commented 1 year ago

Yep. Just finished checking addSubscriptions function and this is the case. I'll try to do PR.