Closed glebm closed 3 years ago
Hi @glebm,
It's not 100% clear yet whether this is a libserialport bug or a CH340 driver bug. Obviously the fact it works with pyserial is interesting and raises the possibility that libserialport is doing something wrong.
On the other hand, the problem only shows up with this particular device & driver. And libserialport doesn't do anything chipset specific - it uses the exact same calls into the standard Windows serial API regardless of the attached device. The driver is responsible for everything that happens beyond that. So any device-specific difference in behaviour would normally point quite strongly towards a driver bug.
The difference between pyserial and libserialport here may just involve the two making slightly different sequences of API calls - both of them valid - where one set triggers the driver bug whilst the other doesn't.
Either way, I think what would be needed to investigate further is to somehow capture the sequence of system API calls and results that are occuring in each test case:
I'm not sure how best that can be done these days - some brief reading suggests the ProcMon tool doesn't capture serial port calls, and PortMon is obsolete. I guess one would need to use a general purpose system/dll call tracing tool - there seem to be a few available but I'm not familiar with any of them.
I tried using DrMemory's dstrace (latest GitHub release) but the output is huge (perhaps you'll have better luck following it, attached).
dstrace results on CH340:
Interestingly, running DrMemory on my very simple program reports some bugs, such as UNADDRESSABLE ACCESS
:
Dr. Memory version 1.11.17856 build 1 built on Nov 27 2018 15:50:14
Dr. Memory results for pid 12476: "serial_send.exe"
Application cmdline: ".\serial_send.exe .\serial_constants.h COM3"
Recorded 115 suppression(s) from default C:\Users\Gleb\Apps\DrMemory\bin64\suppress-default.txt
Error #1: UNADDRESSABLE ACCESS beyond heap bounds: reading 0x0000000001b11c0c-0x0000000001b11c10 4 byte(s)
# 0 replace_memmove [c:\projects\drmemory\drmemory\replace.c:763]
# 1 libserialport-0.dll!? +0x0 (0x000000006d68896d <libserialport-0.dll+0x896d>)
# 2 KERNELBASE.dll!DeviceIoControl +0x65 (0x00007ffdbcd92766 <KERNELBASE.dll+0x32766>)
# 3 libserialport-0.dll!? +0x0 (0x000000006d6814df <libserialport-0.dll+0x14df>)
# 4 atexit [C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:439]
# 5 __tmainCRTStartup [C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:339]
# 6 .l_start [C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:223]
# 7 KERNEL32.dll!BaseThreadInitThunk +0x13 (0x00007ffdbfbd3034 <KERNEL32.dll+0x13034>)
Note: @0:00:00.516 in thread 5712
Note: refers to 0 byte(s) beyond last valid byte in prior malloc
Note: prev lower malloc: 0x0000000001b11b80-0x0000000001b11c0c
Note: allocated here:
Note: # 0 replace_malloc [c:\projects\drmemory\common\alloc_replace.c:2577]
Note: # 1 KERNELBASE.dll!DeviceIoControl +0x65 (0x00007ffdbcd92766 <KERNELBASE.dll+0x32766>)
Note: # 2 KERNEL32.dll!DeviceIoControl +0x7f (0x00007ffdbfbd3d30 <KERNEL32.dll+0x13d30>)
Note: # 3 libserialport-0.dll!? +0x0 (0x000000006d68979b <libserialport-0.dll+0x979b>)
Note: # 4 libserialport-0.dll!? +0x0 (0x000000006d6814df <libserialport-0.dll+0x14df>)
Note: # 5 atexit [C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:439]
Note: # 6 __tmainCRTStartup [C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:339]
Note: # 7 .l_start [C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:223]
Note: # 8 KERNEL32.dll!BaseThreadInitThunk +0x13 (0x00007ffdbfbd3034 <KERNEL32.dll+0x13034>)
Note: instruction: mov (%rcx) -> %ecx
Error #2: POSSIBLE LEAK 79 direct bytes 0x0000000001b101e0-0x0000000001b1022f + 0 indirect bytes
# 0 replace_malloc [c:\projects\drmemory\common\alloc_replace.c:2577]
# 1 ntdll.dll!RtlRestoreLastWin32Error +0x3f (0x00007ffdc06783d0 <ntdll.dll+0x83d0>)
# 2 msvcrt.dll!getptd_noexit
# 3 ntdll.dll!RtlRestoreLastWin32Error +0x3f (0x00007ffdc06783d0 <ntdll.dll+0x83d0>)
# 4 pre_cpp_init [C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:173]
# 5 msvcrt.dll!initterm
# 6 __tmainCRTStartup [C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:275]
# 7 .l_start [C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:223]
# 8 KERNEL32.dll!BaseThreadInitThunk +0x13 (0x00007ffdbfbd3034 <KERNEL32.dll+0x13034>)
===========================================================================
FINAL SUMMARY:
DUPLICATE ERROR COUNTS:
SUPPRESSIONS USED:
ERRORS FOUND:
1 unique, 1 total unaddressable access(es)
0 unique, 0 total invalid heap argument(s)
0 unique, 0 total GDI usage error(s)
0 unique, 0 total handle leak(s)
0 unique, 0 total warning(s)
0 unique, 0 total, 0 byte(s) of leak(s)
1 unique, 1 total, 79 byte(s) of possible leak(s)
ERRORS IGNORED:
11 potential error(s) (suspected false positives)
(details: C:\Users\Gleb\Apps\DrMemory\drmemory\logs\DrMemory-serial_send.exe.12476.000\potential_errors.txt)
3 potential leak(s) (suspected false positives)
(details: C:\Users\Gleb\Apps\DrMemory\drmemory\logs\DrMemory-serial_send.exe.12476.000\potential_errors.txt)
41 unique, 46 total, 7987 byte(s) of still-reachable allocation(s)
(re-run with "-show_reachable" for details)
Details: C:\Users\Gleb\Apps\DrMemory\drmemory\logs\DrMemory-serial_send.exe.12476.000\results.txt
This is pyserial's blocking read/write implementation:
Is it substantially different from libserialport's one?
The unadressable access is likely unrelated, because the arduino/ fork does not have it, yet still doesn't work
There is a known bug in wc_to_utf8 on Windows which was patched in the arduino fork, see https://sigrok.org/bugzilla/show_bug.cgi?id=1031. But there were some issues with their PR for those fixes (#23) that never got resolved, so the fixes haven't gone in. I expect that's what you're seeing. It would be really good to get that fixed properly!
I can't really see much useful from the call logs since there's no context or expansion of the arguments for each call.
Looking at the pyserial vs libserialport code however, the most obvious difference is that pyserial is calling ClearCommError before every single read. That could well be hiding problems.
@martinling Note that the outstanding PR (https://github.com/sigrokproject/libserialport/pull/1) for wc_to_utf8
does not fix this particular unaddressable access, so it's probably from some other arduino change, or perhaps https://github.com/sigrokproject/libserialport/pull/1 doesn't really fix the issue.
Looking at the pyserial vs libserialport code however, the most obvious difference is that pyserial is calling ClearCommError before every single read. That could well be hiding problems.
It does work though (the transmitted data is correct). I wonder why it's calling ClearCommError
. There is also win32.ResetEvent(self._overlapped_read.hEvent)
in there that's not in libserialport, could it be relevant?
Adding this below the timeout block around line 1014 of serialport.c
didn't work for me:
ResetEvent(port->read_ovl.hEvent);
DWORD errors;
COMSTAT comstat;
if (ClearCommError(port->hdl, &errors, &comstat) == 0)
RETURN_FAIL("ClearCommError() failed");
Also, simply opening something like Termie or Termite is enough to put the device into a state where it can be used with libserialport
.
On further inspection, you can also "kick" the port into working simply with this script:
#! /usr/bin/env python3
import serial
PORT = "COM3" # your port here
print("Messing with port " + PORT)
with serial.Serial(PORT, 9600) as ser:
print("Opened")
print("closed")
So whatever pyserial does in open and close is enough.
Digging in a bit more, in pyserial's open()
, if you modify it in serialwin32.py
(line 78):
return // Return here - the port isn't "fixed"
self._reconfigure_port()
return // But, if you return here, libserialport will subsequently work
So pyserial's _reconfigure_port()
does something important that libserialport does not in sp_open()
/set_config()
.
You can also add a return immediately from close()
to prove it is the open()
, not the close()
, that is important.
Turn out I was just missing the bits
setting after sp_open:
error = sp_open(port, SP_MODE_READ_WRITE);
if (error == SP_OK)
{
// Parity, baud, flowcontrol
sp_set_bits(port, 8); // Added this
}
On Linux, in set_config()
called during sp_open()
, when the device hasn't been opened yet, config->bits
is 8, on Windows it's 7 (this value comes from the get_config()
call)
Setting it formally, after sp_open()
returns, but before using the port allowed it to work.
@johnbeard Is this with CH-340?
I already set the flags explicitly like this (RETURN_IF_SP_ERROR
is a macro from my code, not the library):
#define SERIAL_PROTOCOL_BAUD_RATE 115200
#define SERIAL_PROTOCOL_BITS 8
RETURN_IF_SP_ERROR(sp_open(port, SP_MODE_READ_WRITE));
RETURN_IF_SP_ERROR(sp_set_baudrate(port, SERIAL_PROTOCOL_BAUD_RATE));
RETURN_IF_SP_ERROR(sp_set_bits(port, SERIAL_PROTOCOL_BITS));
RETURN_IF_SP_ERROR(sp_set_parity(port, SP_PARITY_EVEN));
RETURN_IF_SP_ERROR(sp_set_flowcontrol(port, SP_FLOWCONTROL_NONE));
@glebm It is a CH-340 (in the form of a cheap and nasty USB-serial cable). I suspect this might not be the same issue, but just has a similar symptom (works on Linux, but reads 0 on Windows sometimes).
Hi, can you please retry the current git HEAD of git://sigrok.org/libserialport and see if this is still reproducible? We've fixed a bunch of issue which may or may not have caused this.
Independently of that I've also tried to reproduce a problem on CH430/CH341, but so far everything seems to work as expected for me both on Linux and Windows, but certainly more in-depth testing is needed.
If possible, can you provide a minimal test-case Arduino project + libserialport C program (which doesn't depend on any other hardware than a CH340/CH341-based Arduino board or USB-to-serial board)? Then we could try to reproduce using that exact code on different OSes and libserialport versions/patches.
Just looking back over this issue and I noticed that @glebm's code sets other necessary parameters but doesn't call sp_set_stopbits
. Could the issue be that the Windows CH340 driver has something other than the common 1 stop bit as default? We already know it has 7 data bits as default from @johnbeard's comment above, so it wouldn't be too surprising. This would explain why using the port with another program first avoids the issue.
The API documentation has been extended a bit to clarify which settings are essential to set when opening a port: https://www.sigrok.org/api/libserialport/unstable/a00022.html#details
It looks like unset configuration parameters may also be the issue in https://github.com/iforce2d/thrustTester/issues/2 - see my comment there.
I'm closing this issue since I think it's been resolved - @glebm, I think your original problem may have been with you not calling sp_set_stopbits()
. If you can still reproduce this with all configuration parameters set, please reopen.
@martinling I don't have the hardware anymore (it's been ~2 years), thanks for looking into it
Devices with CH340 work just fine with libserialport on Linux, but not on Windows.
Python's pyserial package works for both Windows and Linux, leading me to believe the issue is with libserialport.
The issue doesn't happen with all chips, e.g. atmega16u2 works fine, but CH340 and CH341 don't.
This issue has been encountered by others, for example: https://github.com/iforce2d/thrustTester#arduino.
Environment:
Reproduction:
All reads and writes are blocking, with large timeout (3 seconds). The last read returns 0 (timeout) on Windows even though it shouldn't.
The exact same code on Linux works. pyserial implementation on Windows also works.
I didn't get very far with investigating this:
gdb
produces a segfault in a Windows syscall when opening the port but it doesn't result in a crash (not sure why). Nothing else that I could find but I know next to nothing about these things./cc @iforce2d @jamesljlster @facchinm