stephane / libmodbus

A Modbus library for Linux, Mac OS, FreeBSD and Windows
http://libmodbus.org
GNU Lesser General Public License v2.1

Behaviour of MODBUS_ERROR_RECOVERY_LINK #734

Open ssgnh opened 8 months ago

ssgnh commented 8 months ago

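In Modbus TCP mode, the client's error recovery mode is set as follows: modbus_set_error_recovery(ctx, MODBUS_ERROR_RECOVERY_LINK | MODBUS_ERROR_RECOVERY_PROTOCOL); After the connection is established, the client sends requests to the server continuously. If the network cable is then unplugged, modbus_read_registers() returns -1, but the send call inside it still succeeds (the data goes into the TCP send buffer). This can go on for a few minutes. When the cable is plugged back in, the large amount of data in the send buffer floods the server and crashes it. Is there a good way to solve this?

For now I have removed the MODBUS_ERROR_RECOVERY_LINK flag and decide that the link is down when modbus_read_registers() has returned -1 three times, but that doesn't feel like a good approach.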

JayHerpin commented 5 months ago

I'm seeing the same issue, but if I leave it disconnected long enough, it never recovers.

JayHerpin commented 5 months ago

Much of this is brought over from here: https://github.com/BlackZork/mqmgateway/issues/39

Steps to reproduce:

Computer and device are both connected via a simple switch. Run the configuration below. Disconnect the device from the switch (or cut its power). Wait for some period of time then reconnect the ethernet or restore power.

At this point the connection doesn't recover.

What is observed:

image

Notice the packet trace on the right. You can see at time 1713450984.766486 I reconnected the device, but at this point the sliding window never progresses further...

What should happen:

After the configured response timeout is hit (up to 3 times?), the TCP connection should be closed and re-opened.


Note that while this is happening, the kernel's TCP send buffer grows continuously. I believe the best thing to do is to close the TCP connection once the device has been non-responsive for long enough, if for no other reason than to purge and control the send buffer.
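One way to bound how long unacknowledged data may sit in the send buffer is the Linux-specific TCP_USER_TIMEOUT socket option. It is not something proposed in this thread; the sketch below is only an illustration of the "close after the device has been non-responsive for long enough" idea, moved into the kernel. The helper name `set_tcp_user_timeout` is hypothetical, and how libmodbus's error recovery reacts to the resulting error would need to be tested.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper, Linux only.
 * Asks the kernel to abort the connection when transmitted data stays
 * unacknowledged for timeout_ms milliseconds.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

With libmodbus, the descriptor could be obtained with modbus_get_socket(ctx) after modbus_connect() succeeds. The option has to be set again after every reconnect, because each new socket starts with the default.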

JayHerpin commented 5 months ago

I would be happy to work on a fix for this. However, I can only test on a small subset of platforms, and I don't have any serial devices to test against. I am a bit concerned about breaking others' setups by changing the core modbus.c logic.

I am almost inclined to have the tcp_flush function close and reopen the connection itself, but that feels a little "out there".

ssgnh commented 5 months ago

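Because I send at a high rate, the kernel buffer overflows quickly after the network goes down, and then I get a Broken pipe error. I tested this: after the network is disconnected, the send call still succeeds, so reconnect-on-disconnect never kicks in. I took a different approach. Modbus is request/response. While the network is down, the request is still "sent" successfully (in practice it is handed to the network hardware), but the response never arrives. So when I fail to receive a response several times, I conclude that the network is down, close the socket, and then call connect. This is similar to a heartbeat, and it works well.

The threshold of three -1 returns is my own choice. The first time -1 is returned, I send the same frame again (to rule out the case where a frame was lost in transit and so went unanswered). In other words, if three identical frames go unanswered, I treat it as a communication failure.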
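The retry-then-reconnect workaround described above can be sketched as follows. This is a minimal illustration, not the commenter's actual code: `link_guard`, `guarded_request`, the callback types, and the `demo_*` stand-ins are all hypothetical names, and the callbacks are kept abstract so the logic does not depend on libmodbus.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types; not part of libmodbus. */
typedef int (*request_fn)(void *user);   /* -1 on failure, like modbus_read_registers() */
typedef int (*reconnect_fn)(void *user); /* close and reopen the link; 0 on success */

typedef struct {
    int max_attempts;       /* identical frames to try before giving up */
    request_fn request;
    reconnect_fn reconnect;
    void *user;             /* passed through to both callbacks */
} link_guard;

/* Sends one request, repeating the same frame when no reply arrives.
 * If every attempt fails, the link is treated as down and reconnect is
 * called once. Returns the request's result, or -1 if all attempts failed. */
int guarded_request(link_guard *g)
{
    assert(g != NULL && g->max_attempts > 0);
    for (int attempt = 0; attempt < g->max_attempts; attempt++) {
        int rc = g->request(g->user);
        if (rc != -1)
            return rc;
    }
    g->reconnect(g->user);
    return -1;
}

/* Demo stand-ins: a device that never answers and a reconnect that only counts. */
typedef struct { int requests; int reconnects; } demo_stats;

int demo_request_no_reply(void *user)
{
    ((demo_stats *)user)->requests++;
    return -1;
}

int demo_reconnect(void *user)
{
    ((demo_stats *)user)->reconnects++;
    return 0;
}
```

With libmodbus, the request callback would wrap modbus_read_registers(), and the reconnect callback would call modbus_close() followed by modbus_connect(), with MODBUS_ERROR_RECOVERY_LINK left unset as described in the first comment.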

ssgnh commented 1 month ago

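The test environment has changed since then, so I can no longer test this. However, after digging deeper I think I roughly understand the cause, which is as follows:

When acting as a client, libmodbus uses non-blocking mode for TCP. When it sends a frame to the slave and send succeeds, libmodbus considers the frame sent. From what I have read, that is not the case: in non-blocking mode, a successful send only means that the frame was copied into the underlying buffer. The network card has not necessarily transmitted it; the data may still be sitting in a buffer.

When the server (slave) disconnects abnormally, the client does not know. It keeps sending as usual, and send keeps succeeding. The frames are not actually transmitted at this point; they are buffered in the kernel (as a non-expert, I think of them as stored in the network card). Only after a long time does the buffer fill up (in my case each round is about 100 ms apart, with a dozen or so frames per round, and the buffer fills in a few minutes). Then a "resource unavailable" error is reported, possibly followed by "Broken pipe", and I have seen the program get terminated.

This causes a problem. After the server disconnects abnormally, the client keeps calling send, and send keeps succeeding; the data is buffered in the lower layers and never goes out. It takes quite a long time for MODBUS_ERROR_RECOVERY_LINK to take effect. If the connection to the server comes back before that, the large amount of data buffered below libmodbus is sent to the server all at once, and the server hangs.

MODBUS_ERROR_RECOVERY_LINK relies on send returning an error to detect a disconnection, but that is not very effective. My own approach is: if a request frame gets no reply, and this happens 3 times in total, treat the link as disconnected.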
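The buffering behaviour described above can be reproduced without a network. The sketch below is an illustration under stated assumptions: it uses a local AF_UNIX stream socket pair as a stand-in for a TCP connection whose peer has gone silent, so buffer sizes will differ from real TCP, and the function name `fill_send_buffer` is hypothetical. It shows send() succeeding repeatedly although nothing is ever read, until the kernel buffer is full and send() fails with EAGAIN (the "resource unavailable" error).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Writes fixed-size frames into a non-blocking socket whose peer never reads.
 * Returns the number of bytes the kernel accepted before send() failed,
 * or -1 on a setup error. *last_errno receives errno from the failing send(). */
long fill_send_buffer(size_t frame_size, int *last_errno)
{
    int sv[2];
    char frame[256];
    long accepted = 0;

    assert(last_errno != NULL);
    assert(frame_size > 0 && frame_size <= sizeof(frame));
    memset(frame, 0, sizeof(frame));

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Put the sending end into non-blocking mode. */
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Every successful send() only means "copied into the kernel buffer". */
    for (;;) {
        ssize_t n = send(sv[0], frame, frame_size, 0);
        if (n == -1) {
            *last_errno = errno; /* EAGAIN/EWOULDBLOCK once the buffer is full */
            break;
        }
        accepted += n;
    }

    close(sv[0]);
    close(sv[1]);
    return accepted;
}
```

Calling `fill_send_buffer(12, &err)` returns how many bytes were accepted with no reader at all, which is the amount of stale data that could be released toward the peer once it becomes reachable again.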


------------------ Original message ------------------ From: "stephane/libmodbus"; Sent: Wednesday, July 17, 2024, 5:49 PM; Subject: Re: [stephane/libmodbus] Behaviour of MODBUS_ERROR_RECOVERY_LINK (Issue #734)

Could you test again to see if it's related to 9fb0283?
