openframeworks / openFrameworks

openFrameworks is a community-developed cross platform toolkit for creative coding in C++.
http://openframeworks.cc
Other
9.96k stars 2.55k forks source link

ofxTCPManager broken on Windows #4556

Closed traumedy closed 8 years ago

traumedy commented 8 years ago

ofxTCPManager::Receive() calls recv() which returns -1, WSAGetLastError() returns WSAEWOULDBLOCK which is correct because the socket is non-blocking, but it is still returned as an error so for instance when called from ofxTCPClient::receiveRawBytes():

    messageSize = TCPClient.Receive(receiveBuffer, numBytes);
    //  0 is not an error... -1 is
    if ( messageSize == SOCKET_ERROR ){
        close();

There are related issues from other places it is called, for instance ofxTCPClient::receiveRaw():

    messageSize = TCPClient.Receive(tmpBuff, TCP_MAX_MSG_SIZE);
    if(messageSize==0){
        close();
    }else if(messageSize<TCP_MAX_MSG_SIZE) {
        // null terminate!!
        tmpBuff[messageSize] = 0;

tmpBuff[-1] would be overwritten in this case.

Also by the time ofxNetworkCheckError() is entered WSAGetLastError() returns 0, I think the error value has been reset by some successful call to a Windows API in between (I read in some places that WSAGetLastError() just wraps GetLastError()).

ofTheo commented 8 years ago

thanks for the bug report @traumedy ! we're aiming to do a 0.9.1 release soon to address small bugs/regressions in 0.9.0

Could you submit a PR with fixes for the issues you mentioned? It would be great to double check any changes before you submit it.

thanks!!

traumedy commented 8 years ago

I'm guessing this is broken for Linux and OSX too from the looks of it.

I can do the leg work, but it might be more than a simple fix or two. Some minor design stuff might should be rearranged regarding error handling.

On Tuesday, November 10, 2015, Theodore Watson notifications@github.com wrote:

thanks for the bug report @traumedy https://github.com/traumedy ! we're aiming to do a 0.9.1 release soon to address small bugs/regressions in 0.9.0

Could you submit a PR with fixes for the issues you mentioned? It would be great to double check any changes before you submit it.

thanks!!

— Reply to this email directly or view it on GitHub https://github.com/openframeworks/openFrameworks/issues/4556#issuecomment-155631395 .

JOSH BUCHBINDER | Interactive Developer | +1 925 351 8049 | OBSCURA http://www.obscuradigital.com/ | SF http://www.obscuradigital.com/contact/ +1 415 227 9979 | http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/NYC http://www.obscuradigital.com/contact/ | http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/Stockholm http://www.obscuradigital.com/contact/ | Proprietary & Confidential

ofTheo commented 8 years ago

great thanks! @traumedy

If you end up making a minimal test example that replicates it on windows, I can try and replicate on OS X.

traumedy commented 8 years ago

No offense to whoever coded this, it at least worked in 0.8, not sure who revamped it but they couldn't have tested it and thought it worked.

Part of the problem with ofxNetworkCheckError() is that it's actually a macro:

#define ofxNetworkCheckError() ofxNetworkCheckErrno(__FILE__,ofToString(__LINE__))

Turns out that WSAGetLastError() actually just maps to GetLastError, so something in ofToString() (didn't dig in) must call some Windows API that resets the error value, the same is probably true for Linux type OS's, any call to any function that sets errno will have cleared the socket error by the time it gets to ofxNetworkCheckErrno().

Anyway, it is unnecessary and inefficient to convert these built in macros to std::string's so changing:

#define ofxNetworkCheckError() ofxNetworkCheckErrno(__FILE__,ofToString(__LINE__))
inline int ofxNetworkCheckErrno(const string & file, const string & line){
to
#define ofxNetworkCheckError() ofxNetworkCheckErrno(__FILE__, __LINE__)
inline int ofxNetworkCheckErrno(const char* file, int line) {

.. seems to fix that issue. Now going through and checking all calls and handling of return values, checking for EWOULDBLOCK etc.

Should have something by Monday I hope. What's your target date for 0.9.1 release?

traumedy commented 8 years ago

Ug this is more involved than I realized. At least in the latest documentation, only ofxUDPManager::SetTimeoutReceive() and ofxUDPManager::SetTimeoutSend() are documented, but the code actually exposes ofxTCPManager::SetTimeoutConnect() ofxTCPManager::SetTimeoutSend() ofxTCPManager::SetTimeoutReceive() ofxTCPManager::SetTimeoutAccept() etc... and some but not all of the functions will return SOCKET_ERROR or SOCKET_TIMEOUT but little of the code ever checks for SOCKET_TIMEOUT and it is unclear what should be done in the case of a timeout as the methods return bool and it might have closed the socket if there was an error or simply returned without sending/receiving some or all of the data, and there is no way for the caller to know which.

As implemented, all of the send* methods in for instance ofxTCPClient return bool, but there is no way of knowing if the operation timed out due to network lag or if the connection was closed. It is inconsistent which methods even close the connection on error etc.

Also the timeouts seem to be in seconds which is a weird level of granularity for networking operations.

Any advice as to how I should proceed? I'm two knuckles deep in this code right now and have a pretty good grasp of what's going on and a good picture in my mind of how it should be. I have made probably more changes than you would like, but I'll try to write some unit testing cross platform oF projects too so we can all feel confident my major changes work on all platforms (assuming other people will test on non-Windows platforms.)

SO:
. What about all these undocumented timeout methods for ofxTCPManager (Set/GetTimeout*())? . How should timeouts vs actual errors be handled, keeping in mind that sends can block too if the network buffers are full? . Are you comfortable with me making these major changes, even given that it doesn't seem to work cross-platform in its current state? . Do you like movies about gladiators?

traumedy commented 8 years ago

There is also this undocumented method:

void ofxTCPManager::CleanUp() {
    #ifdef TARGET_WIN32
        WSACleanup();
    #endif
  m_bWinsockInit = false;
}

I don't see it called from anywhere. And shutting down WinSock could make any other addon or DLL etc that is using WinSock stop working unexpectedly. At the least m_bWinsockInit should only be declared and used if TARGET_WIN32 is defined?

ofTheo commented 8 years ago

@traumedy thanks for looking through this! one of the issues is the TCPManager / UDPManager code is really old - from before the first release of openFrameworks, so its probably the jankiest code to go through ( sorry )!

The undocumented get/set timeout functions look like they were recently added and didn't include documentation. ( https://github.com/openframeworks/openFrameworks/commit/826a99ce3ecdeead68ce9f6e6cff416e4486f85e )

I think internally there is a feeling that we should really clean up ofxNetwork for the next major release or replace its internals with a api that is much more robust.

I think what might be the best approach right now would be to open issues for different problems/bugs found with the addon. Then we can do bitesized PRs for each issue.

If there are any really major changes it would be great if you could describe them.

I think tackling the obvious bugs first and then we can look at the more structural / larger api issues.

Thanks!! Theo

traumedy commented 8 years ago

Ok gotcha, granular commits,mod course. So should I just file an issue for each change and include the PR?

On Sunday, November 15, 2015, Theodore Watson notifications@github.com wrote:

@traumedy https://github.com/traumedy thanks for looking through this! one of the issues is the TCPManager / UDPManager code is really old - from before the first release of openFrameworks, so its probably the jankiest code to go through ( sorry )!

The undocumented get/set timeout functions look like they were recently added and didn't include documentation. ( 826a99c https://github.com/openframeworks/openFrameworks/commit/826a99ce3ecdeead68ce9f6e6cff416e4486f85e )

I think internally there is a feeling that we should really clean up ofxNetwork for the next major release or replace its internals with a api that is much more robust.

I think what might be the best approach right now would be to open issues for different problems/bugs found with the addon. Then we can do bitesized PRs for each issue.

If there are any really major changes it would be great if you could describe them.

I think tackling the obvious bugs first and then we can look at the more structural / larger api issues.

Thanks!! Theo

— Reply to this email directly or view it on GitHub https://github.com/openframeworks/openFrameworks/issues/4556#issuecomment-156893000 .

JOSH BUCHBINDER | Interactive Developer | +1 925 351 8049 | OBSCURA http://www.obscuradigital.com/ | SF http://www.obscuradigital.com/contact/ +1 415 227 9979 | http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/NYC http://www.obscuradigital.com/contact/ | http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/ http://www.obscuradigital.com/contact/Stockholm http://www.obscuradigital.com/contact/ | Proprietary & Confidential

ofTheo commented 8 years ago

@traumedy - that would be awesome! makes it much easier to review and catch any potential issues!

ofTheo commented 8 years ago

@traumedy - going to close this issue, but if there are other critical fixes needed for our bugfix release 0.9.1, please open new issues and PR against them!

Thanks!! Theo