nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License
7.64k stars 3.12k forks source link

UDP client problem #343

Closed eyaleb closed 8 years ago

eyaleb commented 9 years ago

I have a simple program to read temperature and report it. This is my IoT "Hello World" ...

All works well, reporting to mqtt is OK, or, as now, reporting to my own small server (in python) works too.

I recently decided to send the data over UDP to save some time (the program sleeps between reads so (activity) time is (battery) power). It does not work reliably.

conn = net.createConnection(net.UDP, 0)

conn:on("sent", function(conn)
        print ("sent")

        conn:close()

        print("sleep")
        node.dsleep(60*1000000)
end)

conn:connect(11883,"192.168.3.4")

conn:send ("store esp-test 1 Hello World")

print ("done")

Pretty straight forward. The problem is that the message does not arrive at the server. However, if I delay the sleep, say, by using

        tmr.alarm(1, 50, 0 function(conn)
            print("sleep")
            node.dsleep(60*1000000)
        end)

then it works. The '50' ms is enough time, but this is a bad hack. I hate it.

I did not find a way to do a clean UDP shutdown. There is no conn:on("disconnected. The "sent" seems to be called before the packet is actually sent. TCP, on the other hand, does have a proper disconnect and it will actually properly shutdown if I do not do it explicitly.

Any ideas anyone?

pastukhov commented 9 years ago

It is better to use timer, and you should close socket after send.

eyaleb commented 9 years ago

@pastukhov Can you elaborate?

pastukhov commented 9 years ago

The code:

tmr.alarm(0, 3000, 1, function()
    print("3 seconds pass")
end)

will print "3 seconds pass" every 3 seconds.

node.dsleep() causing node deep sleep and reboot mcu after wake up.

eyaleb commented 9 years ago

@pastukhov No, I am not asking what these function do, I already know.

You said "It is better to use timer"

You said "you should close socket after send".

In short, my question was very specific. My udp packet is lost if I sleep too soon. How do I know when it is safe to dsleep? I do not want to use a fixed time as it is not efficient and it can at times be too short.

cheers

pastukhov commented 9 years ago

There is no possibility to be sure with UDP. You can make some ack logic on the other side of your udp connection or use TCP.

eyaleb commented 9 years ago

It is not so. There is no handshaking and no ACK in UDP, meaning you do not know it the packet arrived at the receiver. No problem.

But my problem is that the message is not even transmitted. I only want to know that the packet got out (meaning it is not waiting in a buffer) and it is safe to shut down. Is there a "flush" call to the socket layer maybe?

pastukhov commented 9 years ago

At least this is working for me:

tmr.alarm(2, 1000, 1, function()
    conn = net.createConnection(net.UDP, 0)
    conn:connect(1234,"192.168.100.122")
    conn:send("1234")
    conn:close()
    conn = nil
    node.dsleep(5*1000000)
end)
cal101 commented 9 years ago

Moin,

the lua net module is a wrapper around the sdk callbacks. The lua sent callback is triggered because it is registered via sdk function espconn_regist_sentcb(pesp_conn, net_socket_sent);

So it looks to me as a SDK issue which can't be cleanly solved in lua alone.

Carsten

eyaleb commented 9 years ago

@pastukhov Nice try but no cigar.

First thing I needed to extend the initial alarm. It takes longer that 1 second to establish a wifi connection (which is needed for this code).

So this is what I run (u.lua):

tmr.alarm(2, 4000, 1, function()
    conn = net.createConnection(net.UDP, 0)
    conn:connect(11883,"192.168.3.4")
    local msg = "show esp-06 "..1
    print("sending "..msg)
    conn:send(msg)
    conn:close()
    conn = nil
    node.dsleep(1*1000000)
end)

[in real life the message uses a counter instead of '1']

It works intermittently if I just dofile("u.lua") directly after a restart.

I then made it loop by installing this init.lua:

local function checkMagic()
        local magic_pin = 1     -- gpio5

        gpio.mode (magic_pin, gpio.INPUT, gpio.PULLUP)
        if 0 == gpio.read (magic_pin) then
                print ("aborting by magic")
                return true
        end

        return false
end

if not checkMagic() then
        dofile("u.lua")
end

The "magic" is a simple way to abort the loop so that I do not need to re-flash each time. In reality it simply checks to see if gpio5 is grounded, and only if it is not it runs our sample program.

I see about three messages arriving out of every 10 "sent".

So my questions are:

Anyone else tested this - what did you get?

TIA

eyaleb commented 9 years ago

I am reviving this item since it was not attended to properly (and not fixed so far).

[BTW I am running today's master branch. The extra RAM is great and I do not notice any slowdown so far. I can read the ds18b20 ow device reliably.]

The problem is that shutting down the program will fail to empty the UDP buffer, meaning any unsent UDP packet is lost (never sent). This is not the same as a sent packet getting lost, something UDP allows.

Here is the simple code I use (which fails to send)

        local conn = net.createConnection(net.UDP, 0)
        conn:connect(savePort, saveServer)
        conn:send (format_message())
        conn:close()

        node.dsleep(sleep_time)

To make it work I change it to delay the sleep:

        local conn = net.createConnection(net.UDP, 0)
        conn:connect(savePort, saveServer)
        conn:send (format_message())
        conn:close()

        tmr.alarm(1, 50, 0, function()  -- UDP needs grace period
                tmr.stop(1)
                node.dsleep(sleep_time)
        end)

This is a bad way of handling the situation. 50ms is a made up number (not good).

conn:on("sent") triggers fast when nothing was actually sent.

I am missing some facility, for example

Without the change the program looks like working, but the server (receiver) never sees any of the packets. Introducing a delay makes more of them show up until at 50ms practically all arrive.

TIA

sq6emm commented 8 years ago

Hi,

I have exactly the same problem but with TCP connection. Have you found a solution for your problem?

eyaleb commented 8 years ago

@sq6emm, unfortunately no, I am waiting to see how the 1.x series performs. My kludge is to insert a 20-50ms delay before the close (as in the code shown above). The longer the delay the less packets are lost.

cheers

TerryE commented 8 years ago

@eyaleb @sq6emm This isn't a kludge. It's how the SDK works. The mistake here is in your understanding. Please read my Unofficial FAQ. To quote this:

Non-Lua processing (e.g. network functions) will usually only take place once the current Lua chunk has completed execution. So any network calls should be viewed at an asynchronous request. A common coding mistake is to assume that they are synchronous, that is if two socket:send() are on consecutive lines in a Lua programme, then the first has completed by the time the second is executed. This is wrong. Each socket:send() request simply queues the send operation for dispatch. Neither will start to process until the Lua code has return to is calling C function.

In your first example you are scheduling a sk:connect(), sk:send (), sk:close() and a node.dsleep() from the same task. The SDK does not guarantee that these will be scheduled in this order and quite frankly I am surprised the even the second works reliably. You should do the send from an on('connection') callback; ditto the close from an 'sent' one and the sleep from a 'disconnection' one.

Maybe I need to hoist my Unoffical FAQ to "official status".

This isn't really a bug with the firmware and it's more of a design feature of the SDK, so I hope that you don't mind my closing this issue from the firmware perspective.

eyaleb commented 8 years ago

@TerryE, I understand what you say but life is not this simple. For UDP connections some of the usual callbacks are not implemented, so there is no way to properly serialise actions. For example, there is no on:connect(). The on:send() returns immediately (before anything is actually sent) which does not help either. I did try all possible combinations.

My TCP code properly uses the available callbacks and it does not exhibit this problem.

This is why I ask for either proper callbacks or another method to track the progress on a sent packet.

Eyal

TerryE commented 8 years ago

@eyaleb. Well argued. UDP is stateless with unassured delivery. This does complicate the callback framework. I will take a look and report back.

ppmkm commented 8 years ago

@eyaleb I abosultely agree that there is a need for proper call back to serialize calls. Just want to share my quite reliably working kludge: sending of UDP packet within a DNS callback on the udp connection.

 udp = net.createConnection(net.UDP,0)
 udp:connect(port,ip)
 udp:dns("some_known_host",function(sck,ip)
        udp:send(json)
        udp.close(udp)
 end)

Piotr

eyaleb commented 8 years ago

This issue came up again on the esspressif bbs: http://bbs.espressif.com/viewtopic.php?f=7&t=1531

nickandrew commented 8 years ago

The same issues arise in general purpose operating systems (e.g. Linux) in which a file written, then closed, is possibly not written to disk and is sitting in cache memory or the filesystem layer is still writing data structures to disk or the data has been sent to the disk drive but is not yet committed onto the platter. This is not an easy problem to solve.

NodeMCU depends on the functionality provided by the SDK and if an appropriate callback is not available, nothing can be done except add a heuristic delay after sending and before sleeping.

If Espressif can add a callback, it seems to me that it should be something like "radio buffer empty", to signify that the WiFi radio has no more work to do.

eyaleb commented 8 years ago

if Espressif can add a callback

This is what is requested on their bbs.

nickandrew commented 8 years ago

@eyaleb There's nothing NodeMCU can do until the SDK adds support, if ever, so do we need to keep this open?

eyaleb commented 8 years ago

This is sad but yes, it can be closed.

I now have a 30ms (timer) wait before going to sleep, which is significant when my app completes in about 300ms.

nickandrew commented 8 years ago

Think of it this way, if and when a callback is added, it'll be a few ms before it fires anyway. Your delay is needed because the UDP packet doesn't go out the air immediately. So 30 ms is not all wasted.