stephane / libmodbus

A Modbus library for Linux, Mac OS, FreeBSD and Windows
http://libmodbus.org
GNU Lesser General Public License v2.1

Program breaks after many hours #147

Closed Tangerino closed 10 years ago

Tangerino commented 10 years ago

Hello.

I've been chasing a problem in my code for a month. It breaks with no further message; I just get a glib corruption. The software runs on a 256MB Debian machine where valgrind is not supported, so I installed a Debian VM with as few resources as possible, just to mimic my box's architecture. It also breaks there, but now I finally got an error message from valgrind:

==23256== 1 errors in context 1 of 266:
==23256== Thread 17:
==23256== Invalid read of size 4
==23256== at 0x805FE56: check_confirmation (modbus.c:482)
==23256== by 0x80616A3: write_single (modbus.c:1143)
==23256== by 0x41FFFFFF: ???
==23256== Address 0x4268320b is not stack'd, malloc'd or (recently) free'd

Does anyone have a clue, a history of this, or anything else to help me out?

Carlos Tangerino carlos.tangerino@gmail.com

AlexMazalov commented 10 years ago

Hi Carlos! A proper fix for this problem: https://github.com/stephane/libmodbus/commit/2c132b040643e209c1978dfd660702f71aa76fce.

stephane commented 10 years ago

I'm not sure the fix pointed to by @AlexMaz is the right one, but the bug report is not detailed enough to tell. We could be sure if you logged your requests with modbus_set_debug(ctx, TRUE).

Tangerino commented 10 years ago

All right. I'll turn debug on and let you know. The problem is that the program runs for many hours and then crashes. Also, the log capacity of the box is not that large. I'll drop the output devices (write registers) and run the tests in parallel. Thanks in advance.

Carlos Tangerino carlos.tangerino@gmail.com
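
Since the box cannot store hours of debug output, one way to still capture the requests that precede a crash is to keep only the most recent frames in a fixed-size ring buffer and dump them when an error occurs. The code below is a minimal sketch under that assumption and is not part of libmodbus: `frame_log_t`, `frame_log_init`, `frame_log_push`, `frame_log_dump`, and `LOG_SLOTS` are hypothetical names, and `FRAME_MAX` is set to 260 bytes, the maximum Modbus TCP ADU size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS 8   /* hypothetical: number of most recent frames to keep */
#define FRAME_MAX 260 /* maximum Modbus TCP ADU size in bytes */

typedef struct {
    uint8_t data[LOG_SLOTS][FRAME_MAX];
    size_t len[LOG_SLOTS];
    size_t next;  /* slot that the next push overwrites */
    size_t count; /* number of valid slots, at most LOG_SLOTS */
} frame_log_t;

void frame_log_init(frame_log_t *log)
{
    assert(log != NULL);
    memset(log, 0, sizeof(*log));
}

/* Store one frame, overwriting the oldest one when the log is full. */
void frame_log_push(frame_log_t *log, const uint8_t *frame, size_t len)
{
    assert(log != NULL && frame != NULL);
    if (len > FRAME_MAX)
        len = FRAME_MAX; /* truncate rather than overflow the slot */
    memcpy(log->data[log->next], frame, len);
    log->len[log->next] = len;
    log->next = (log->next + 1) % LOG_SLOTS;
    if (log->count < LOG_SLOTS)
        log->count++;
}

/* Print the stored frames as hex, oldest first; returns the frame count. */
size_t frame_log_dump(const frame_log_t *log, FILE *out)
{
    assert(log != NULL && out != NULL);
    size_t start = (log->next + LOG_SLOTS - log->count) % LOG_SLOTS;
    for (size_t i = 0; i < log->count; i++) {
        size_t slot = (start + i) % LOG_SLOTS;
        for (size_t j = 0; j < log->len[slot]; j++)
            fprintf(out, "<%02X>", log->data[slot][j]);
        fputc('\n', out);
    }
    return log->count;
}
```

In an application, `frame_log_push` would be called with each request and response buffer, and `frame_log_dump(&log, stderr)` from the error path, so memory use stays constant however long the program runs.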


stephane commented 10 years ago

Which version do you run?

Tangerino commented 10 years ago

Hello Stephane.

I was using V3.03, then I updated to 3.1.0, but the problem still exists. Should I keep the small code change from my last e-mail? I then started looking around and also replaced my sqlite3 library with the latest one. The program has now been running for more than 24 hours without a single problem; even under valgrind, ZERO problems have been reported so far. Coincidence or not, I think I have isolated the problem. Thanks for your help.

QUESTION: What version should I use in my project? It is still under development and I can change anything.

See you later

Carlos Tangerino carlos.tangerino@gmail.com


stephane commented 10 years ago

I recommend that you use libmodbus v3.1.1, which is IMHO more robust and stable under Linux than v3.0.5.

I don't understand: which version are you using for your current run (the 24-hour one)?

Tangerino commented 10 years ago

Hi Stephane.

I'm using the latest version, without the fix I got before in the 'write_single' function. And then it happened again. Coincidence or not, after a few hours I got this message from valgrind:

==10062== Invalid read of size 4
==10062== at 0x8060062: check_confirmation (modbus.c:482)
==10062== by 0x80618AF: write_single (modbus.c:1143)
==10062== by 0xFF0000BF: ???
==10062== Address 0xe0 is not stack'd, malloc'd or (recently) free'd

It is hard to log everything because I have limited resources on the box. I changed the two lines of code in the hope that it will fix the problem. Do you agree it is a potential problem?

uint8_t req[MAX_MESSAGE_LENGTH];

req_length = ctx->backend->build_request_basis(ctx, function, addr, value, req);

rc = send_msg(ctx, req, req_length);
if (rc > 0) {
    /* Used by write_bit and write_register */
    uint8_t rsp[MAX_MESSAGE_LENGTH];

I hope that const int offset = ctx->backend->header_length; in the modbus_receive_confirmation function comes from a byte (from the protocol buffer), so that I'm safe. No conclusion so far; I'm just trying to track down the problem. My program runs for hours, but I do have an Indian-made device on the network that I don't trust at all; it is there, and I have to support it. Also, I think it is a good idea to constrain the variables read from the protocol to their maximum size: since they come from a byte, cast them to uint8_t rather than int.

Carlos Tangerino carlos.tangerino@gmail.com
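
The concern about an untrusted device can be illustrated with a defensive check that validates lengths before any byte of the response is read. The following is a minimal sketch, not libmodbus's actual check_confirmation logic: `confirm_write_single`, `MAX_HEADER_LEN`, and the return codes are hypothetical. It relies on the Modbus rule that a successful write-single response echoes the request's function code, address, and value, while an exception response sets the high bit of the function code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WRITE_SINGLE_PDU_LEN 5 /* function (1) + address (2) + value (2) */
#define MAX_HEADER_LEN 7       /* hypothetical bound: Modbus TCP MBAP header size */

/* Hypothetical helper. Returns 0 if rsp echoes req, -2 on a Modbus
 * exception response, -1 if a frame is too short or does not match. */
int confirm_write_single(const uint8_t *req, size_t req_len,
                         const uint8_t *rsp, size_t rsp_len,
                         size_t header_len)
{
    /* Reject an implausible header length before using it as an index. */
    if (req == NULL || rsp == NULL || header_len > MAX_HEADER_LEN)
        return -1;
    if (req_len < header_len + WRITE_SINGLE_PDU_LEN)
        return -1;
    /* An exception response carries a function code and an exception code. */
    if (rsp_len < header_len + 2)
        return -1;
    if (rsp[header_len] == (uint8_t)(req[header_len] | 0x80))
        return -2;
    if (rsp_len < header_len + WRITE_SINGLE_PDU_LEN)
        return -1;

    /* Invariant from the checks above: the compared bytes are in bounds. */
    assert(header_len + WRITE_SINGLE_PDU_LEN <= rsp_len);
    return memcmp(req + header_len, rsp + header_len,
                  WRITE_SINGLE_PDU_LEN) == 0 ? 0 : -1;
}
```

The point of the sketch is the ordering: every index into `rsp` is preceded by a comparison against `rsp_len`, so a short or malformed reply from a misbehaving device is rejected instead of being read past its end.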


stephane commented 10 years ago

The latest version (v3.1.1) includes the fix, so if the fix resolves your problem, we can close this ticket. Could you confirm, please?

Tangerino commented 10 years ago

Yes, please. And many thanks



stephane commented 10 years ago

Thank you for your feedback.