Fix sp_nonblocking_write writing only one byte, and fails.

martinling / libserialport

Unofficial personal repository for libserialport - see git://sigrok.org/libserialport for official repo

http://sigrok.org/wiki/Libserialport

GNU Lesser General Public License v3.0

65 stars 34 forks source link

Fix sp_nonblocking_write writing only one byte, and fails. #21

Closed tomay3000 closed 7 years ago

tomay3000 commented 7 years ago

This is a reimplementation of sp_nonblocking_write with minor changes, where it only writes one byte to the serial port and returns. This reimplementation drops the use of the HasOverlappedIoCompleted macro and replace it with GetOverlappedResult which is the best choice according to the MSDN documentation.

GetOverlappedResult bWait [in] If this parameter is TRUE, and the Internal member of the lpOverlapped structure is STATUS_PENDING, the function does not return until the operation has been completed. If this parameter is FALSE and the operation is still pending, the function returns FALSE and the GetLastError function returns ERROR_IO_INCOMPLETE.

HasOverlappedIoCompleted Remarks Do not call this macro unless the call to GetLastError returns ERROR_IO_PENDING, indicating that the overlapped I/O has started.

and also instead of doing a loop for a byte-by-byte nonblocking write which is causing the failiure of the function, a bulk write according to the requested count number of bytes is done once.

martinling commented 7 years ago

Hi Tomay,

Can you explain what problem/failure you were having with the previous implementation?

The usage of HasOverlappedIoCompleted is correct in this case, because it is only called where the port->writing flag is set. This flag is only set when a previous overlapped write has been started on the port and has returned ERROR_IO_PENDING. So the precondition for the macro is met.

There are a couple of problems with your rewritten version:

It's not safe to start an overlapped write for the whole buffer and then return. The caller of sp_nonblocking_write is allowed to free or reuse the buffer after calling, but WriteFile requires that the buffer be left alone until the write operation has finished. The OS may need to read from it at any time during the operation.
Starting an overlapped write for a block of data will generally always succeed; it doesn't tell us anything about when the write buffer for the port has filled up. This is the reason for the byte-by-byte operation: writes will succeed immediately until the buffer is full, then will start pending. Basically it's trying to emulate the behaviour of write with O_NONBLOCK on Unix: it writes bytes into the kernel's buffer for that file descriptor, stopping when that buffer is full.

This function was the trickiest one to design in libserialport, because of the fundamental differences between Windows asynchronous I/O and Unix nonblocking I/O. We had to choose which model to base the libserialport API on, and emulate it where the OS used the other. If we'd gone with the async model it would have required major ugly work on the Unix side, including spawning threads to complete requests, which wasn't a side effect we wanted from a simple serial library. It was a lot simpler, if ugly, to emulate nonblocking I/O on top of async in the Windows case.

tomay3000 commented 7 years ago

Hi Martin Ling, I have tried your version with more than 10 times, and it always writes only the first byte and returns, and one consideration that should be taken with your version is that it should be used in a loop to guarantee that all bytes were written.

My version is very simple and uses the kernels threads to complete the operation in the background and returns immediately so you can do other work, and in any case if the user wants to cancel the pending I/O operation, he can use CancelIoEx (starting from Windows Vista) to cancel the pending operation.

If a user tries to reuse an OVERLAPPED structure for Read/Write operations while an overlapped operation with that same OVERLAPPED is still running in kernels thread, so: GetOverlappedResultwill return FALSE and GetLastError will return ERROR_IO_INCOMPLETE because the previous operation is still running. (as you did in restart_wait)

Since ReadFile and WriteFile uses different OVERLAPPED structures which are read_ovl and write_ovl receptively, so Read and Write operations are independents.

Also, I have seen many people complaining about sp_nonblocking_write is writing only the first byte and returns.

as for the UNIX version, it is just a code trimming and formatting.

Thank you for your understanding.

martinling commented 7 years ago

Hi Tomay,

The problem with your version is not with the reuse of the OVERLAPPED structure, it's with the reuse of the buffer that the data is coming from.

Consider this program that sends the letters A to Z, using a nonblocking write so that it can do some other work when the port isn't ready for more data yet.

int send_a_to_z() {
    char buffer[26];
    for (int i = 0; i < 26; i++)
        buffer[i] = 'A' + i;
    int bytes_sent = 0;
    while (bytes_sent < 26) {
        int result = sp_nonblocking_write(port, buffer + bytes_sent, 26 - bytes_sent);
        if (result < 0)
            return ERROR;
        else if (result == 0)
            /* Buffer is full - do something else while we wait. */
            do_something_else();
        else
            bytes_sent += result;
    }
    return OK;
}

Now, think what can happen if this program is run using your implementation. An overlapped write request for 26 bytes will be started, and sp_nonblocking_write() will return 26, to indicate that all bytes were accepted. Now bytes_sent is updated to 26, the loop exits, and the function returns OK.

But the buffer was a temporary variable allocated on the stack - so as soon as the send_a_to_z function exits, that memory may be reused by whatever code runs next. By the time the kernel reads the rest of the characters, they may be different.

tomay3000 commented 7 years ago

Hi Martin Ling, acourding to this MDSN links:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365747(v=vs.85).aspx

lpBuffer [in] A pointer to the buffer containing the data to be written to the file or device. This buffer must remain valid for the duration of the write operation. The caller must not use this buffer until the write operation is completed.

and https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx

lpBuffer [out] A pointer to the buffer that receives the data read from a file or device. This buffer must remain valid for the duration of the read operation. The caller must not use this buffer until the read operation is completed.

here MSDN says that it is your responsibility to maintain the buffer valid during the operation.

You are right about what you said, but you forgot that it affects both WriteFile and also ReadFile. and sp_nonblocking_read does not implement your logic.

In my implementation, it is the user responsibility to keep the buffer valid for the duration of the operation. and there are options and considerations to do this: --- make the buffer variable in the global scope. (and you can see in the source code that all write operations will fail if a previous write is already running (incomplete)). --- create the buffer in the heap and free it after the write operation competes. --- call sp_drain right after sp_nonblocking_write (which in turns calls FlushFileBuffers). --- call sp_wait with SP_EVENT_TX_READY mask right after sp_nonblocking_write. --- reimplement CreateFile with FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags (not tested, I will test it later). it is described here: https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx

Thank you for your understanding.

martinling commented 7 years ago

Hi Tomay,

The implementation can't change what the user's responsibilities are; this is a library with a published stable API. If we change how an existing function behaves, we can break existing programs that are using the library.

If we'd designed the function that way from the start and stated that buffer requirement in the documentation, then your implementation would be OK on Windows. But we wouldn't be able to implement the same behaviour on Unix-based systems, which don't have the same asynchronous I/O mechanism (there's POSIX AIO, but it isn't implemented on many systems at all, and I don't think any OS implements it for serial ports). The only way to do the write in the background would be to spawn a new thread to do it, which is getting way too complex and intrusive for a simple serial port library.

The same issues apply if we were to add your version as a new function - I agree it's a useful behaviour to have, but anything that we provide in the libserialport API needs to be implementable on all operating systems.

There's no issue with the sp_nonblocking_read() implementation by the way, because in that case, we use SetCommTimeouts() to set values which make the read return immediately, matching the behaviour of read() with O_NONBLOCK on Unix.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa363190(v=vs.85).aspx

A value of MAXDWORD, combined with zero values for both the ReadTotalTimeoutConstant and ReadTotalTimeoutMultiplier members, specifies that the read operation is to return immediately with the bytes that have already been received, even if no bytes have been received.

We still have to pass an OVERLAPPED structure for the read, because that's a requirement when the handle has been opened with FILE_FLAG_OVERLAPPED:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx

If hFile is opened with FILE_FLAG_OVERLAPPED, the lpOverlapped parameter must point to a valid and unique OVERLAPPED structure

The operation completes immediately though, because of timeout settings, so it's not still running when sp_nonblocking_read() returns. This is guaranteed by the GetOverlappedResult() call right after the ReadFile() - note that an error would be thrown if this returned zero (operation not complete), but that cannot happen because of the timeout setting.

martinling commented 7 years ago

By the way, I agree that having sp_nonblocking_write() always only accept 1 byte isn't great, and I'd be interested to know which serial device / driver and which version of Windows you're seeing that on. I've not heard of that behaviour before.

But any changes we make to fix it have to fit within the documented API, and behave properly across all devices on all versions of Windows.

Accepting only one byte at a time isn't efficient, but it's within the documented behaviour of the function:

The number of bytes returned may be any number from zero to the maximum that was requested.

Of course, the byte-by-byte loop inside the function isn't efficient either, but the only alternatives I can see would involve libserialport maintaining its own write buffer. The size of that buffer would be an arbitrary number, and it would never be "right" for all use cases; what we really want is just to return when the kernel's buffer is full, which is what the byte-by-byte loop tries to achieve.

tomay3000 commented 7 years ago

Hello, yes, libserialport maintaining its own write buffer would be a good alternative, also its read buffer is necessary, because I have a theory and I have to do more tests.

I think that the .net SerialPort class of Microsoft uses its own read and write buffers set to 4096 and 2048 respectively and its API calls SetupComm with these values. see: https://msdn.microsoft.com/en-us/library/system.io.ports.serialport.readbuffersize(v=vs.110).aspx and https://msdn.microsoft.com/en-us/library/system.io.ports.serialport.writebuffersize(v=vs.110).aspx

the libserialport API could get the actual size of the read and write buffers and then allocates the read and write buffers on the heap in sp_open and frees' em in sp_close.

my theory concerns the read: consider that you have 1 million (1000000) bytes waiting to be read and then you called sp_nonblocking_read, after this call, the read buffer that has to be passed to ReadFile can easily be altered while the read operation is completing asinchronously, then we are in a similar case as the sp_nonblocking_write, because in this case ReadFile will return FALSE and GetLastError will return ERROR_IO_PENDING because 1 million bytes is large enough to make the read operation completes asynchronously in the background thread. similarly sp_blocking_write's write buffer can also be alterable in another thread that can be created by the user.

By the way I am running Windows XP SP3 and I am implementing libserialport for "AT COMMAND" communications like HyperTerminal and Putty.

tomay3000 commented 7 years ago

Hello, I did some tests an came up with the followings: I have tested sp_nonblocking_read's ReadFile. it always returns immediately even if there are 1 million bytes waiting to be read (in my case it always reads 46510 bytes and returns immediately).

I have dug into the Qt QSerialPort class source code and found that they are using Read and Write internal buffers (for the windows version) for this purpose:

QByteArray readChunkBuffer;
QByteArray writeChunkBuffer;

The solution would then involve defining read and write buffers:

void *ReadBuffer = NULL;
void *WriteBuffer = NULL;

allocating them with the same size of the buffer given in the parameters of the sp[non]blocking(read|write) before every call to ReadFile and WriteFile:

realloc(ReadBuffer, sizeof(buff));
realloc(WriteBuffer, sizeof(buff));

Finally these buffers should be freed within sp_close:

free(ReadBuffer);
free(WriteBuffer);

What do you think!

martinling commented 7 years ago

Hi Tomay,

What is the point of allocating a new read buffer? When sp_[non]blocking_read is called, the caller provides their own buffer.

I'm not sure if you've fully understood how the API is supposed to work. Can you explain why you needed to use a nonblocking write for your program?

tomay3000 commented 7 years ago

Hi Martin Ling, Actually the read buffer is not necessary, just forget about it.

my problem is when I tried to use your sp_nonblocking_write version to write, lets say: "AT+CIMI\r", it always writes only just "A" (one byte) and returns, the problem with it is in its loop:

while (written < count) {
        /* Copy first byte of user buffer. */
        port->pending_byte = *ptr++;

        /* Start asynchronous write. */
        if (WriteFile(port->hdl, &port->pending_byte, 1, NULL, &port->write_ovl) == 0) {
            if (GetLastError() == ERROR_IO_PENDING) {
                if (HasOverlappedIoCompleted(&port->write_ovl)) {
                    DEBUG("Asynchronous write completed immediately");
                    port->writing = 0;
                    written++;
                    continue;
                } else {
                    DEBUG("Asynchronous write running");
                    port->writing = 1;
                    RETURN_INT(++written);
                }
            } else {
                /* Actual failure of some kind. */
                RETURN_FAIL("WriteFile() failed");
            }
        } else {
            DEBUG("Single byte written immediately");
            written++;
        }
    }

the loop always does only one iteration and returns because HasOverlappedIoCompleted returns FALSE in the first iteration, which leads to the function to return with one byte written in each time.

I have debugged sp_nonblocking_write to diagnose what was the problem, and found that if I step in this line: if (HasOverlappedIoCompleted(&port->write_ovl)) { then wait lets say for 1 second, and then continue debugging (meaning giving time to the write to complete asynchronously), then in this case HasOverlappedIoCompleted returns TRUE allowing the loop to continue writing the next byte, and so on. But in release builds as said before: HasOverlappedIoCompleted returns FALSE in the first iteration and the function will return writing only the first byte. Because of this problem I have decided to write my implementation, which is working great for my application except for the write buffer we discussed before.

I hope I gave you an idea of what is the problem of sp_nonblocking_write. Thank you for your understanding.

martinling commented 7 years ago

Hi Tomay,

I understand your problem with sp_nonblocking_write. My question was why do you need to use the nonblocking version to send a short AT command? What else is your program doing at the same time?

tomay3000 commented 7 years ago

Hi Martin Ling, At the same time it is just the main thread to do UI drawings, and I don't want the (sp_blocking_write version you are suggesting to be used instead of the sp_nonblocking_write one which I prefer) to block the main GUI thread for at most 1 second. So also using sp_blocking_write in a separate thread will be just a workaround not a final solution. And may be in the future I will need to send like 1 million bytes in the main thread, in this case I will have to use sp_nonblocking_write.

another solution is rewrite sp_nonblocking_write like sp_blocking_write and setting WriteTotalTimeoutConstant = 1 (1 millisecond to return immediately) and GetOverlappedResult with bWait = FALSE as I did except for WriteTotalTimeoutConstant

martinling commented 7 years ago

Hi Tomay,

OK, using nonblocking write makes sense if you are running a GUI event loop in the same thread. I just wanted to check, because often people use the nonblocking version when they don't really need it.

I think the solution to this is going to have to involve libserialport keeping its own write buffer for nonblocking writes. The question is how big to make that buffer.

It needs to be large enough to be efficient when dealing with large data and fast ports. It shouldn't be too large, because that could lead to weird behaviour (e.g. sending a block of data on a slow port, it reports complete but then spends many seconds actually sending the data out of the port).

So the best solution may be a buffer size that depends on the current baud rate of the port.

I don't want to be doing a malloc or realloc on every write operation, because that can cause its own problems:

So how about this:

When the port is opened, sp_open checks the highest baud rate supported by the port. and allocates a write buffer big enough to hold 50ms of data at the highest baud rate supported by the port.
When sp_nonblocking_write is called, it looks at the current baud rate, and accepts up to 50ms of data at that baud rate. It copies the data from the caller's buffer to the libserialport buffer before starting the asynchronous write.

tomay3000 commented 7 years ago

Hi Martin Ling, another solution is as follows: WriteFileEx with a completion routine would be used instead of the WriteFile funtion, and the purpose of the completion routine is to call free(WriteBuffer); https://msdn.microsoft.com/en-us/library/windows/desktop/aa365748(v=vs.85).aspx and the call to malloc would be safe. the reallocation will be managed by the system on the heap depending the available memory.

EDIT: WriteFileEx is very complicated and won't work.

martinling commented 7 years ago

Hi Tomay,

WriteFileEx is not that complicated but it relies on the calling thread getting into an alertable wait state to call the completion routine. From a library routine we can't guarantee when that will happen, which makes things awkward.

In any case, I just don't want to be using malloc where it can be avoided. Code that's using sp_nonblocking_write will be calling it frequently, and allocating & freeing lots of small blocks is often bad for performance. It also makes it hard for any user of the library to write code that will gracefully handle running out of memory.

I had a look at implementing the solution I suggested above. One issue is that there doesn't seem to be a way to reliably enumerate the baudrates supported by a port. What we can do instead though is as follows:

When sp_open is called, read the current baudrate, and allocate a write buffer of an appropriate length.
When sp_set_baudrate or sp_set_config is called to change the baudrate, block until any previous overlapped write has completed, and then realloc the buffer with a new appropriate length.

That should achieve the same result.

tomay3000 commented 7 years ago

Hi Martin Ling, I came up with a final solution, I already Implemented it and it is working like a charm. according to the documentation of sp_nonblocking_write:

The number of bytes returned may be any number from zero to the maximum that was requested.

So, there is no need to define and use a write buffer at all for the operation, because this is a NON-BLOCKING Write, (not ASYNCHRONOUS Write). Asynchronous Write is not implemented at all in libserialport. So, the trick for the solution is to make sp_nonblocking_write blocks for 1 millisecond (WriteTotalTimeoutConstant = 1) and make GetOverlappedResult block for that second (bWait = TRUE), at this point the write operation should have been completed, and the buffed should be free to go out of scope. voila! problem solved.

I also made many changes to the rest of all read and write operations [ONLY WINDOWS VERSION] (code trimming, some bug fixes and maximum read and write performance rewrite).

Take a look at my fork to see what the changes are.

Thank you ;)

martinling commented 7 years ago

Making sp_nonblocking_write block for 1ms is not a solution. The whole point of a nonblocking routine is that it must not block at all.

If blocking for 1ms is acceptable for your program, then you can just use sp_blocking_write with a timeout of 1ms. That would have exactly the same effect as your 1ms implementation of sp_nonblocking_write.

Please don't submit pull requests with multiple unrelated changes in one big commit. If you're going to suggest changes to libserialport, they need to be broken down into one commit per change, each with a commit message that explains the reason for the change.

It's hard to review this properly with all the changes bundled together but even just at a glance I can see lots of problems with your changes:

Setting the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags disables the kernel's buffering for the port. Why? This can only decrease performance. There's no need for this.
The changes to sp_blocking_write don't make sense: you're calling GetOverlappedResult without knowing whether there was actually a pending operation using that OVERLAPPED structure, and in fact there can't be, since you've made sp_nonblocking_write block until its overlapped write is complete.
You've made sp_nonblocking_read block for 1ms as well, which is just pure timewasting since the Windows API already supports an actual nonblocking read.
You've replaced the loop in sp_blocking_read_next with a single ReadFile call, creating a bug. (Hint: the bug will take about 49 days to occur.)

I'm going to implement the changes I proposed so that sp_nonblocking_write will accept a reasonable number of bytes at a time. All of the rest of this seems wrong or unnecessary.

tomay3000 commented 7 years ago

Hi Martin Ling, OK, just make the changes you proposed, so others like me won't get the same problem as I. I am not going to submit any pull requests at all (may be because I have used the -u switch of git), I have made the changes on my own fork for more tests.

Q: where did I came up with these changes !? A: I have spent a lot of time debugging, reversing, and testing, all used Windows APIs.

--- any assembly instructions of the kernel's functions are actually taking time to execute (in microseconds) - I know it is a lot less than 1 millisecond -. But If you try to calculate the time elapsed for the sp_nonblocking_read function (which the system will take care of its non-blocking mechanism), you will find that there is actually time elapsed behind the scenes. this time is very close the 1ms timeout of the sp_nonblocking_write I used. --- I have rewritten sp_nonblocking_read just to reflect the new mechanism of the sp_nonblocking_write.

From the MSDN documentation: https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx

A write-through request via FILE_FLAG_WRITE_THROUGH also causes NTFS to flush any metadata changes, such as a time stamp update or a rename operation, that result from processing the request. For this reason, the FILE_FLAG_WRITE_THROUGH flag is often used with the FILE_FLAG_NO_BUFFERING flag as a replacement for calling the FlushFileBuffers function after each write, which can cause unnecessary performance penalties. Using these flags together avoids those penalties. For general information about the caching of files and metadata, see File Caching. When FILE_FLAG_NO_BUFFERING is combined with FILE_FLAG_OVERLAPPED, the flags give maximum asynchronous performance, because the I/O does not rely on the synchronous operations of the memory manager. However, some I/O operations take more time, because data is not being held in the cache. Also, the file metadata may still be cached (for example, when creating an empty file). To ensure that the metadata is flushed to disk, use the FlushFileBuffers function.

The flags FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING are independent and may be combined.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx

If hFile was opened with FILE_FLAG_OVERLAPPED, the following conditions are in effect: The lpOverlapped parameter must point to a valid and unique OVERLAPPED structure, otherwise the function can incorrectly report that the read operation is complete. The lpNumberOfBytesRead parameter should be set to NULL. Use the GetOverlappedResult function to get the actual number of bytes read. If the hFile parameter is associated with an I/O completion port, you can also get the number of bytes read by calling the GetQueuedCompletionStatus function.

So, here GetOverlappedResult function is used to get the actual number of bytes read / written. it is just a code reformat. Just read the comments to understand:

/* Do read. */
    if (ReadFile(port->hdl, buf, count, NULL, &port->read_ovl) == 0) // Control is made first.
        if (GetLastError() != ERROR_IO_PENDING) // ReadFile failed with a serious error other than ERROR_IO_PENDING
            RETURN_FAIL("ReadFile() failed"); // So, no need to continue, and just return
    // Here ReadFile may have returned TRUE or FALSE. and if it has returned FALSE, so
    // GetLastError() is certainly ERROR_IO_PENDING.
    // In either case, GetOverlappedResult will return the number of bytes transferred.
    DEBUG("Waiting for read to complete");
    if (GetOverlappedResult(port->hdl, &port->read_ovl, &bytes_read, TRUE) == 0)
        RETURN_FAIL("GetOverlappedResult() failed");
    // Reversing GetOverlappedResult, it calls WaitForSingleObject and stores its return value in
    // read_ovl.Internal
    if (port->read_ovl.Internal == WAIT_TIMEOUT)
        DEBUG("Read timed out");

For sp_blocking_read_next https://msdn.microsoft.com/en-us/library/windows/desktop/aa363190(v=vs.85).aspx

Remarks If an application sets ReadIntervalTimeout and ReadTotalTimeoutMultiplier to MAXDWORD and sets ReadTotalTimeoutConstant to a value greater than zero and less than MAXDWORD, one of the following occurs when the ReadFile function is called: If there are any bytes in the input buffer, ReadFile returns immediately with the bytes in the buffer. If there are no bytes in the input buffer, ReadFile waits until a byte arrives and then returns immediately. If no bytes arrive within the time specified by ReadTotalTimeoutConstant, ReadFile times out.

the system will handle it for you, so the while loop is a waste and not necessary at all.

Finally: I agree that I should have written sp_nonblocking_read and sp_nonblocking_write like follows:

SP_API enum sp_return sp_nonblocking_read(struct sp_port *port, void *buf, size_t count)
{
#ifdef _WIN32
    return sp_blocking_read_next(port, buf, 1);
#endif
...
}

SP_API enum sp_return sp_nonblocking_write(struct sp_port *port, const void *buf, size_t count)
{
#ifdef _WIN32
    return sp_blocking_write(port, buf, 1);
#endif
...
}

Thank you for your understanding.