nRF24 / RF24Mesh

OSI Layer 7 Mesh Networking for RF24Network & nrf24L01+ & nrf52x devices
http://nrf24.github.io/RF24Mesh
GNU General Public License v2.0
421 stars 154 forks source link

Potential mesh.checkConnection() improvement #249

Closed TMRh20 closed 3 weeks ago

TMRh20 commented 3 weeks ago

Right now mesh.checkConnection() is a bit intensive on the network and especially the master node, because each node will, in turn, check its connection which requires an address lookup + reception + response from the master node.

I am thinking about a change to the general checkConnection() function as follows:

template<class network_t, class radio_t>
bool ESBMesh<network_t, radio_t>::checkConnection()
{
    RF24NetworkHeader header;
    header.to_node = network.parent();
    header.type = NETWORK_PING;
    for(uint8_t i = 0; i < MESH_CONNECTION_CHECK_ATTEMPTS; i++){
      if(network.write(header, 0, 0)){
        return 1;
      }
    }
    return 0;

}

Essentially, each node would only be contacting their parent node directly with an auto-ack message. This way if say top nodes like 05 and 04 lose their connection for a short time, it could potentially take a bit longer for the mesh to partially fall apart in some cases, then reconverge. This might be a good thing, because other nodes should take up the top spots in a short period of time.

This will make it much easier for distant nodes to verify their connectivity as well.

This would leave it up to users to call getAddress(); if they want the old method of connection checking. The whole theory here is to make the mesh less dependent on the master node.

In any case, this is just up for discussion right now.

2bndy5 commented 3 weeks ago

This would definitely be acceptable.

Should it also safeguard from using checkConnection() when mesh_address == 04444? I'm guessing that such a condition would still return accurate info under the given proposal.

The current implementation has an added prevention about calling checkConnection() on the master node embedded into getAddress(). This prevention should probably be added the proposal here as well.

The docs would need to be clarified as well. Something like:

checkConnection() only pings the current node's parent to avoid network/bandwidth congestion. If validation should depend on a response from the master node, then getAddress() can be used instead:

if (mesh.getAddress(nodeID) < 0) {
   // not connected
}

Even though _node_id is currently a private member, user code should/can be aware of the node's ID since that is supposed to be managed by the network admin anyway.

TMRh20 commented 3 weeks ago

Should it also safeguard from using checkConnection() when mesh_address == 04444?

Yes, it should return immediately. Good call.

...added prevention about calling checkConnection() on the master node

Yes that is probably a good idea as well

I will do some more testing before putting in a PR, but I think this would be a good change.

2bndy5 commented 3 weeks ago

This way if say top nodes like 05 and 04 lose their connection for a short time, it could potentially take a bit longer for the mesh to partially fall apart in some cases, then reconverge

I especially like this bit. It seems much better suited to the name "mesh".

TMRh20 commented 3 weeks ago

Wondering if we should do a define or something, so the old behaviour is easier to get to?

Like #define RF24MESH_CONN_CHECK_TYPE and let advanced users select the old methodology if required. Its easy to add a few lines of code, but its easier to un-comment a single line.

2bndy5 commented 3 weeks ago

It is a good idea. Since it is a binary toggle, I think we could further eliminate the magic number used as a macro value

#define RF24MESH_CONN_CHECK_PARENT 1
#define RF24MESH_CONN_CHECK_MASTER 0
#define RF24MESH_CONN_CHECK_TYPE RF24MESH_CONN_CHECK_PARENT
// To use old behavior: 
// #define RF24MESH_CONN_CHECK_TYPE RF24MESH_CONN_CHECK_MASTER

Then in function body

#if RF24MESH_CONN_CHECK_TYPE == RF24MESH_CONN_CHECK_PARENT
// New behavior
#else 
// Old behavior 
#endif
TMRh20 commented 3 weeks ago

ConnTime2

The above chart shows how long a group of 10 nodes in my radio network have been connected to the MQTT server, as interference and re-organization of the mesh interferes with this and nodes will disconnect/reconnect from time to time.

All nodes are reporting in decimals (hours) except for nodes 13, 5 and 14 which are in minutes. Node 2, 8, 9, 11, 16 and 240 are all running the updated checkConnection() function.

As can be seen, the only node with a PA+LNA module, node 2, has been connected for 12.22 hours, with no loss of connectivity to the MQTT server. The next best is node 11 with 5.35hrs. This is on a very busy mesh with 17 nodes reporting in every few seconds.

The following is over a number of hours of operation showing the success rate of pinging a RPi5 close to the master node. PingGuage2

Its starting to look like a significant improvement in overall connection time and connectivity in general.