nRF24 / RF24Mesh

OSI Layer 7 Mesh Networking for RF24Network & nrf24L01+ & nrf52x devices
http://nrf24.github.io/RF24Mesh
GNU General Public License v2.0

Unlisted nodes #27

Closed Avamander closed 9 years ago

Avamander commented 9 years ago

When I reset my master node, some nodes do not reappear in the connected node list, but they still send packets to the master. This causes multiple issues. Could you please add a safeguard to block all nodes that are not registered to the mesh? Also, what would happen if two mesh networks get in range of each other?

TMRh20 commented 9 years ago

Apologies for the delayed reply, I have had limited access to the internet and my equipment.

When using RPi/Linux devices as a master node, the list of connected nodes will be automatically saved as nodes connect, and re-used if the master node is restarted.

With Arduino devices, this is left entirely up to the user to manage. Since the system is wireless, there is really no way for the master to 'block' nodes that are not registered. ie: If a non-registered node uses an address that is already taken, there will be communication problems.

I experimented with using the EEPROM to store information, but one problem there is that not all Arduino devices even have an EEPROM (like Arduino Due), so multiple solutions are required.

Possible Solutions:

  1. Use an SD card, and save the RF24Mesh::AddrList to file before shutting down or whenever a node is added/removed from the list.
  2. Use EEPROM similarly to the SD card. (EEPROM is rated for roughly 100,000 write/erase cycles per cell, so writes should be kept infrequent.)
  3. Hardcode a custom RF24Mesh or RF24Network response to be sent whenever an unregistered node sends a message. This would force the unregistered node as well as any registered nodes using the same address to renew their address.

The third option may work, but would require a bit of experimentation to be sure and to get it right. Any other thoughts/ideas/suggestions?
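Options 1 and 2 both come down to turning the address list into bytes and back. The following is a minimal host-side sketch of that idea, not library code: `AddrEntry`, `packAddrList`, `unpackAddrList` and `writeChanged` are hypothetical names, the byte layout is invented for illustration, and the `storage` vector stands in for an EEPROM or a file. `writeChanged` skips bytes that already hold the right value, which is one way to spare write cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of one address-list entry: a node ID and the
// RF24Network address assigned to it. Not the library's own type.
struct AddrEntry {
    uint8_t  nodeID;
    uint16_t address;
};

// Pack the list as [count][nodeID, addrLo, addrHi]... This layout is an
// assumption made here so the format does not depend on struct padding.
std::vector<uint8_t> packAddrList(const std::vector<AddrEntry>& list) {
    assert(list.size() <= 255);  // the count must fit in one byte
    std::vector<uint8_t> out;
    out.push_back(static_cast<uint8_t>(list.size()));
    for (const AddrEntry& e : list) {
        out.push_back(e.nodeID);
        out.push_back(static_cast<uint8_t>(e.address & 0xFF));
        out.push_back(static_cast<uint8_t>(e.address >> 8));
    }
    return out;
}

// Rebuild the list from stored bytes. Returns false if the data is
// truncated, for example blank or partially written storage.
bool unpackAddrList(const std::vector<uint8_t>& bytes, std::vector<AddrEntry>& list) {
    list.clear();
    if (bytes.empty()) return false;
    const std::size_t count = bytes[0];
    if (bytes.size() < 1 + count * 3) return false;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t p = 1 + i * 3;
        AddrEntry e;
        e.nodeID = bytes[p];
        e.address = static_cast<uint16_t>(bytes[p + 1] | (bytes[p + 2] << 8));
        list.push_back(e);
    }
    return true;
}

// Write only the bytes that differ from what is already stored.
// Returns the number of bytes actually written.
std::size_t writeChanged(std::vector<uint8_t>& storage, const std::vector<uint8_t>& fresh) {
    std::size_t writes = 0;
    if (storage.size() < fresh.size()) storage.resize(fresh.size(), 0xFF);
    for (std::size_t i = 0; i < fresh.size(); ++i) {
        if (storage[i] != fresh[i]) {
            storage[i] = fresh[i];
            ++writes;
        }
    }
    return writes;
}
```

On a real master the list would be packed whenever a node is added or removed, and unpacked once at startup before the mesh is started.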

Avamander commented 9 years ago

The third one seems the best idea, although my problem was caused by nodes that think they are registered. I solved it by adding a manual ACK that the master sends to the node that sent it a packet. If the RF24Mesh lookup fails, the ACK is not sent; the node notices this and reconnects itself to the mesh.
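The manual ACK scheme described here can be modelled without any radio hardware. This is only a sketch under assumptions: `Master`, `Node`, `handlePacket` and `sendAndVerify` are hypothetical names, the map stands in for the master's address list, and renewing is simplified to re-registering the same address (a real renewal may hand out a different one).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the master's address list:
// RF24Network address -> node ID.
struct Master {
    std::map<uint16_t, uint8_t> registered;

    // Returns true (an application-level ACK) only if the sender's
    // address is in the list; an unknown sender gets no reply.
    bool handlePacket(uint16_t fromAddress) const {
        return registered.count(fromAddress) != 0;
    }

    // Simulates a restart on a device that does not persist the list.
    void reset() { registered.clear(); }
};

struct Node {
    uint8_t  nodeID;
    uint16_t address;
    int      renewals;

    // Stand-in for renewing the address: register with the master again.
    void renew(Master& master) {
        master.registered[address] = nodeID;
        ++renewals;
    }

    // Send one packet. A missing ACK is treated as "not registered" and
    // triggers a renewal. Returns true if the packet was acknowledged.
    bool sendAndVerify(Master& master) {
        if (master.handlePacket(address)) return true;
        renew(master);
        return false;
    }
};
```

After `master.reset()`, the first `sendAndVerify` call returns false and re-registers the node, and the next call is acknowledged again.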

baranoveyy commented 9 years ago

@Avamander can you share the code you used to solve this problem? I am using an Arduino Due and had the same issue.

TMRh20 commented 9 years ago

Just FYI I've been thinking about this problem, although I haven't had much time to work on it, and I realized that option 3 is not really a good option. The problem is that if two nodes attempt to communicate normally, using the same address, it can interfere, preventing either node from working properly. Theoretically, it could cause a mess of communication problems, because of auto-ack.

Right now there are a number of problems with mesh assignment, renewal:

On RPi, the address assignments are saved in binary format in a file (dhcplist.txt)

  a: Nodes are not deleted from the list, unless they release their address
  b: With my configuration, all nodes are set to verify their connectivity every 30 seconds, via mesh.checkConnection();
  c: Simple refresh: Stop the master node for 30 seconds and delete dhcplist.txt. All nodes will renew their address, and the network will rebuild itself completely.

On Arduino, address assignments are not saved in any way

  a: The same method as above (c) can be used to refresh the network
  b: Saving addresses to EEPROM or SD would allow the same functionality as RPi

The main problem currently, is that nodes need to regularly check to ensure connectivity to the network, and there is no reliable way (multicast comes to mind) to notify lost nodes that they need to renew their addresses that I am aware of.

Overview

  a: Nodes should never be unlisted/lost unless the master loses track of their registration
  b: Only happens when master restarts (Arduino) or when dhcplist.txt is deleted (RPi)
  c: Once a node is lost, attempts to communicate normally may interfere with other nodes

Users can manage this in multiple ways, depending on configuration:

  a: Periodic polling ie: every 30 seconds (see above)
  b: Manually save the address list to a file or EEPROM, restore upon restart
  c: Secondary master - Send the address list to a 'backup' node with a 'static' address assignment
  d: Multicast - Have the master node broadcast a user-defined payload type that triggers address renewal (may cause problems?)
  e: ???

So far I don't really have a good solution figured out, but I manage it through periodic polling, as the intent of the mesh was for the nodes to constantly maintain connectivity.

Avamander commented 9 years ago

I have the same solution, I too poll the Master every minute. Would it be possible that you'd add this feature to the library so that it can be enabled when someone wishes to do so?

TMRh20 commented 9 years ago

The feature, as I use it, is contained in the library:

  if(millis()-mesh_timer > 30000){ //Every 30 seconds, test mesh connectivity
    mesh_timer = millis();
    if( ! mesh.checkConnection() ){
        Serial.println("*** RENEW ***");
        //refresh the network address        
        mesh.renewAddress();

     }else{

        Serial.println("*** MESH OK ***");
     }
  }

Then, if restarting the master node, I can just leave it offline for 30+ seconds and all nodes will refresh their addresses.
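A side note on the timer test used above: on boards where millis() is a 32-bit counter it rolls over after about 49.7 days, and the subtraction form keeps working across that rollover because unsigned arithmetic wraps. A small host-side check of that property, with `intervalElapsed` as a hypothetical helper name:

```cpp
#include <cstdint>

// True once more than intervalMs have passed since lastMs. Unsigned
// subtraction wraps modulo 2^32, so the result stays correct when the
// millisecond counter rolls over.
bool intervalElapsed(uint32_t nowMs, uint32_t lastMs, uint32_t intervalMs) {
    return static_cast<uint32_t>(nowMs - lastMs) > intervalMs;
}
```

For example, with the last check taken 256 ms before the rollover and the counter now reading 30000, the elapsed time is 30256 ms and the test fires as intended.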

catan85 commented 9 years ago

I tried that solution in the mesh nodes but when they check connection, they're always ok.. In the master node if I use mesh.getNodeID I always receive "-1" response... What can I do?

Avamander commented 9 years ago

You are doing it wrong. You need to make your own checking system that makes sure the node in addition to being connected is also registered.

Make your node send the master a packet, use node lookup on your master and try to send a reply, if it times out you should make the node reconnect.

catan85 commented 9 years ago

But that is what I do: the slave sends a packet and the master receives it, but the master can't find the slave's node ID in the list. If the master responds directly to the slave, without looking up the node ID, then it works correctly... Maybe I have to maintain the node ID list myself on the master?

TMRh20 commented 9 years ago

@catan85 If you test with the included RF24Mesh_Example.ino , all nodes should re-register themselves if the master node is left offline for at least 30 seconds. This is exactly the system I use with all of my nodes.

@Avamander

> You are doing it wrong. You need to make your own checking system that makes sure the node in addition to being connected is also registered.

Technically, the process of verifying registration will cause more problems than it solves if you are using a large(ish) number of nodes. (The problems begin when an unregistered node is using the same address as a registered node)

Following the simple system of having the nodes check the connection at a defined interval, and ensuring the master is offline for at least this length of time will ensure that all able nodes remain registered.
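The timing argument can be checked with a little arithmetic. The sketch below assumes each node checks at a fixed interval with an arbitrary phase, and that any check made while the master is offline fails and triggers a renewal. `checkFallsInOutage` is a hypothetical helper, and all times are in milliseconds.

```cpp
#include <cassert>
#include <cstdint>

// A node checks connectivity at times phase, phase + interval,
// phase + 2 * interval, and so on. Returns true if at least one check
// lands inside the outage window [outageStart, outageStart + outageLen).
bool checkFallsInOutage(uint32_t phase, uint32_t interval,
                        uint32_t outageStart, uint32_t outageLen) {
    assert(interval > 0 && phase < interval);
    // Find the first check at or after outageStart.
    uint32_t first = phase;
    if (outageStart > phase) {
        const uint32_t k = (outageStart - phase + interval - 1) / interval;  // ceiling division
        first = phase + k * interval;
    }
    return first < outageStart + outageLen;
}
```

With a 30000 ms interval, an outage of 30000 ms catches every phase, while a 15000 ms outage starting at 100000 ms misses a node whose checks fall at 90000 and 120000.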

catan85 commented 9 years ago

I checked the samples but I cannot find the re-registration after 30 seconds timer. I see only a mesh.renewAddress(); when the message sending fails. Anyway, I tried inserting the re-registration lines you wrote in this thread into the slave nodes. The problem is that the slave always returns "*** MESH OK ***" but, on the master, I cannot get the node ID from the list mesh.addrList[i]. The entire list is empty. Is that normal or not? Perhaps something is wrong with my library installation?

Maybe I can put the node ID of the slave in a packet, send it to the master, and then update the list manually on the master?

TMRh20 commented 9 years ago

> I checked the samples but I cannot find the re-registration after 30 seconds timer.

@catan85 Ahh, thanks for pointing that out, I'm looking at RF24Ethernet examples lol. The RF24Mesh example needs to be updated to match the following I guess:

  if(millis()-mesh_timer > 30000){ //Every 30 seconds, test mesh connectivity
    mesh_timer = millis();
    Serial.println("Mesh check");
    if( ! mesh.checkConnection() ){
        //refresh the network address       
        Serial.println("Renew"); 
        mesh.renewAddress();
     }
  }

If the entire list is empty, then your nodes are not registering properly to begin with. This can be tested by adding the following around mesh.begin():

if(!mesh.begin()){
  Serial.println("Failed to register with master node");
}

The same can be done with the renewAddress() function.

If not working at all, I usually recommend testing with the core RF24 examples to ensure your modules are working properly.

catan85 commented 9 years ago

Is it good to place a while loop in the sketch setup?

  void setup() {
    while(!mesh.begin()) {
      Serial.println("Failed to register with master node");
      delay(1000);
    }
  }

Thanks for your support!

TMRh20 commented 9 years ago

For now it might be, but I am planning on reverting the default timeout to something like 60 seconds instead of the current 3 seconds, and making it configurable via a define in RF24Mesh_config.h. Also, per a suggestion by @Avamander, I plan to add a timeout option to mesh.begin() eventually.
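Until such a timeout exists, one option is to bound the retry loop in the sketch itself so setup() cannot block forever. This is a generic, host-testable sketch rather than library code: `retryWithLimit` is a hypothetical helper, `attempt` stands in for a call such as mesh.begin(), and `wait` stands in for delay().

```cpp
#include <cstdint>
#include <functional>

// Call `attempt` up to maxAttempts times, waiting delayMs between tries.
// Returns true as soon as one attempt succeeds, false if all of them fail.
bool retryWithLimit(const std::function<bool()>& attempt,
                    const std::function<void(uint32_t)>& wait,
                    uint32_t delayMs, uint32_t maxAttempts) {
    for (uint32_t i = 0; i < maxAttempts; ++i) {
        if (attempt()) return true;
        if (i + 1 < maxAttempts) wait(delayMs);  // no wait after the last try
    }
    return false;
}
```

The caller decides what a false result means, for example sleeping for a while before trying to register again.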

catan85 commented 9 years ago

I finally solved my problem. When the slave sends a message to the master, the send always reports OK. If the master is reset, it loses the address list. From then on, when the master receives a message from a slave that is not in the list, it sends back a "reset" message. When the slave receives a "reset" message, it calls mesh.setNodeID(x) and then mesh.begin(). When the slave retries sending the normal message, it is disconnected and renews its address, after which communication works again. If you want, you can see my code here: node: http://pastie.org/10442044# master: http://pastie.org/10442047#

TMRh20 commented 9 years ago

@catan85 & @Avamander Thanks to both of you for your input on this. I just corrected a minor issue with unicast writes since I was looking over this issue again.

Just to reiterate, a 'reset' message is kind of a bad idea, because if another node happens to be assigned the address of the unregistered node, a number of problems will occur, including potential data corruption and communication issues between affected nodes.

My best suggestion if using this method would be to use multicast, which should reduce the issues, at least when the master is sending back a response:

void SendReset(int node)
{
    dbgSerial.println("Sending renew request");
    RF24NetworkHeader resheader(node, OCT); //Constructing a header
    resheader.type = 'R';
    // network.write(resheader, &c, sizeof(c));
    network.multicast(resheader,&node,sizeof(node),4);
    delay(3000);
}

// On each node: handle incoming payloads by header type
while (network.available()) {
    RF24NetworkHeader header;
    payload_t payload;
    //network.read(header, &payload, sizeof(payload));
    network.peek(header);
    switch(header.type){
      // Display the incoming millis() values from the sensor nodes
    case 'C':
        network.read(header, &payload, sizeof(payload));
        Serial.print("Received C");
        Serial.print(payload.device);
        Serial.print(" = ");
        Serial.println(payload.value);
        temp_val = payload.value;
        break;
    case 'R':
        uint16_t node; 
        network.read(header, &node, sizeof(node));
        if(node == mesh_address){
           Serial.print("Received R");
           mesh.renewAddress();
        }
        // mesh.setNodeID(nodeID);
        // mesh.begin();
        break;
    }
  }

This is still likely to cause problems if another node is assigned the same RF24Network address before the 'lost' node is sent a reset message, but it should cause fewer problems, because the reset from the master does not use acks.

Explanation:

If the master node receives a payload from an unregistered node, it sends an R type message containing the RF24Network address of the node in question. This message is multicast to all connected nodes, which check the included RF24Network address, and request a new one if they are listed.
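The matching step can be modelled on a host machine to see the caveat mentioned above. This is a sketch under assumptions: `MeshNode` and `multicastReset` are hypothetical names, and the loop pretends the multicast reaches every node, which an unacknowledged broadcast cannot guarantee.

```cpp
#include <cstdint>
#include <vector>

struct MeshNode {
    uint8_t  nodeID;
    uint16_t address;         // RF24Network address the node believes it holds
    bool     renewRequested;  // set when the node decides to renew
};

// Deliver an 'R' payload carrying lostAddress to every node, as a multicast
// would. Each node compares it with its own address and flags a renewal on
// a match. Returns how many nodes matched.
int multicastReset(std::vector<MeshNode>& nodes, uint16_t lostAddress) {
    int matched = 0;
    for (MeshNode& n : nodes) {
        if (n.address == lostAddress) {
            n.renewRequested = true;
            ++matched;
        }
    }
    return matched;
}
```

If a registered node and a lost node both hold address 05, a reset for 05 makes both of them renew, while nodes on other addresses ignore it.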

catan85 commented 9 years ago

Got it, I'll try this evening! Thanks for your suggestions!

catan85 commented 9 years ago

I tried, but I cannot compile the node source. It doesn't find the mesh_address declaration; it says that it is not declared in this scope, but in RF24Mesh.h I can see the mesh_address declaration under public.

catan85 commented 9 years ago

OK, I replaced it with mesh.mesh_address, but the slave never receives the message. Is it necessary to read in a different mode when the message is multicast?

Agam-aviconn commented 7 years ago

Hi, I am also facing the same issue with Arduino. When my master NRF resets, it loses its DHCP entries. Also, sometimes not all of the nodes are in the DHCP list. I know that the DHCP list is not saved on Arduino. And even if I save the list, a lot of mix-ups will occur regarding whether an address is occupied by another node. @TMRh20 has already commented one solution above.

> If the master node receives a payload from an unregistered node, it sends an R type message containing the RF24Network address of the node in question. This message is multicast to all connected nodes, which check the included RF24Network address, and request a new one if they are listed.

Can anyone share the implemented code for that?

In my case, the slave responds only when the master sends a query. So if my slave is not in the master's list, how will the master send a message to that slave? If I use the algorithm mentioned above by @TMRh20, how will my master send an 'R' type message to the slave if the slave is not in the master's list?

And in multicast mode, will all the nodes hear the message sent by the master? Kindly reply as I am stuck in product development.

Avamander commented 7 years ago

@Agam-aviconn

> In my case when master sends some query, only then the slave responds. So if my slave is not there in the list of master how will the master send message to that slave. If I use above algorithm as mentioned by TMRh20, then how will my master send 'R' type to slave if slave is not there in the list of master.

Interpret the absence of reply to the heartbeat as being disconnected and then deal with it accordingly.
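One way to act on this is to count consecutive heartbeats that go unanswered and reconnect after a threshold. A minimal sketch, assuming the node gets a true/false outcome for each heartbeat it sends; `HeartbeatTracker` and its threshold are hypothetical, not part of RF24Mesh.

```cpp
#include <cstdint>

struct HeartbeatTracker {
    uint8_t missed;     // consecutive heartbeats without a reply
    uint8_t maxMissed;  // threshold at which the node should reconnect

    // Record the outcome of one heartbeat. Returns true when the node
    // should reconnect; the counter is reset at that point.
    bool record(bool replyReceived) {
        if (replyReceived) {
            missed = 0;
            return false;
        }
        if (++missed < maxMissed) return false;
        missed = 0;
        return true;
    }
};
```

With a threshold of 3, two missed replies are tolerated and the third triggers a reconnect; any reply in between resets the count.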