sccn / liblsl

C++ lsl library for multi-modal time-synched data transmission over the local network
Other
108 stars 63 forks source link

StreamInfo ctor does not always return a handle on Win 64 with C# #83

Open koehil opened 3 years ago

koehil commented 3 years ago

Many days I have been looking at a strange problem, of which I suspect it has to do something with threading. The setup is this: lsllib 1.13 64 bit, Windows 10, source code in C# on .NET3.5. with Visual Studio 2008 In our Windows Forms application, an LSL outlet is to be created on a background thread with Single Threaded Apartment (STA). We use this method (simplified for the purpose of this description):

private void DoConnect()
{
    // we keep a global reference to the outlet object
    if (mOutlet != null)
    {
        // we have added the IDisposable interface to the C# API
        mOutlet.Dispose();
        mOutlet = null;
    }

    // TEST MARK 1

    // Note: we have renamed StreamInfo to LSLStreamInfo in the LSL API
    liblsl.LSLStreamInfo outStreamInfo = new liblsl.LSLStreamInfo(
        mUniqueStreamName,
        mStreamType,
        mChannelCount,
        mSampleRate,
        liblsl.channel_format_t.cf_double64,
        mSourceID);

    // TEST MARK 2

    // Sometimes creating the outlet fails. It appeared, that 'outStreamInfo' did
    // not have a valid Handle. Therefore test if the object handle is valid.
    if (outStreamInfo.Handle != IntPtr.Zero)
    {
        // internal helper to construct the XML header
        LSLHelpers.SignalInfoToXMLElement(outStreamInfo.desc(), mMyChannels);
        Trace.WriteLine("LSL: create new outlet:\n" + outStreamInfo.as_xml());
        mOutlet = new liblsl.StreamOutlet(outStreamInfo);

        // Success!             
    }
    else
    {
        Trace.WriteLine("LSL Outlet: PROBLEM: Handle StreamInfo null");

        // Fail... :-(
    }
}

The first call to DoConnect() for the lifetime of the application runs okay. On some computers, the second call fails. The third time always fails. What happens, is that the LSL StreamInfo constructor returns without handle. In the LSL framework, an exception must have occurred, so that no object handle is returned. Strange enough, looking at the LSL source code, the StreamInfo ctor parameters should not lead to an exception.

But why is the object handle null? Why does it work once or twice, and after that no StreamInfo can be created anymore...?

I have tried a number of things. Firstly, I have put above code and references in a simple console app. From the main thread, but also from a background thread I try to create multiple outlets. No problem. That works!

Then I have tried to call 'DoConnect()' in the above code in the intended application from a ThreadPool thread. Now something strange happens: if I try to call DoConnect() from 4 ThreadPool threads that are called shortly after eachother (I mean within microseconds) I get 4 outlets! If I try to add another some time later in the same application instance, the call fails.

In the debugger, something unexpected happens as well: if I put a breakpoint directly after TEST MARK 1, and from there I step to the next line (after TEST MARK 2), I always have a handle in the StreamInfo (so outlet creation succeeds). However, if I break on TEST MARK 1 and then from there I continue normal program execution, often no handle is returned.

I assume there is some threading problem, but I cannot find any hints to what is faul there.

Maybe one of you guys can help! Keep up with the good work!

tstenner commented 3 years ago

Another strange threading problem I can't reproduce locally…

From what I've understood, creating multiple streaminfo objects in a row fails, so if I were to reduce the example even further, something like Enumerable.Range(0, 10).Select(i => new liblsl.StreamInfo($"stream{i}", "test")).Where(i => i.handle() != IntPtr.Zero).Count(); should return 1?

koehil commented 3 years ago

No, it would return 2....

The main application is 15 years in development, so we are using conservative C# syntax here, but translated to (referring to the above code): for (int i = 0; i < 10; i++) { DoConnect(); } I get 2 outlets and 8 fails....

If it were just 1 outlet.... but 2 succeeding times the StreamInfo returns a handle, the next 8 iterations it does not.

I don't know if this is relevant, but the thread from which the above is called, also calls many other things in the software and there are also thread synchronisation calls on the execution path (ManualResetEvent). Still, it is very strange that 2 out of 10 calls work in the iteration above.

koehil commented 3 years ago

To have more fun with threading issues, I call in the above construct:

for (int i = 0; i < 10; i++) { System.Threading.ThreadPool.QueueUserWorkItem(DoConnect); }

Now, the StreamInfos are constructed from the ThreadPool. Surprisingly, I often get 4 outlets, 5 fails and then (several times, but not always) the application crashes without any .NET trace. This means, unmanaged memory has become corrupted, and Windows then cicks out the app. But I also saw 5 outlets and 5 fails without crash.

tstenner commented 3 years ago
using System;
using System.Linq;
using System.Threading;
using LSL;

namespace ConsoleApplication1
{
    class Program
    {
        static void DoConnect(object state)
        {
            var info = new liblsl.StreamInfo("stream", "test");
            if (info.handle() != IntPtr.Zero)
                Console.Out.WriteLine(new liblsl.StreamOutlet(info).ToString());
            else Console.Out.WriteLine("Uh-oh");
            Thread.Sleep(50);
        }

        static void Main(string[] args)
        {
            Console.Out.WriteLine(liblsl.library_version());
            liblsl.StreamInfo info = new liblsl.StreamInfo("MetaTester","EEG",8,100,liblsl.channel_format_t.cf_float32,"myuid323457");
            Console.Out.WriteLine(Enumerable.Range(0, 10).Select(i => new liblsl.StreamInfo($"stream{i}", "test")).Where(i => i.handle() != IntPtr.Zero).Count());

            for (int i = 0; i < 10; i++) { System.Threading.ThreadPool.QueueUserWorkItem(DoConnect); }
            Thread.Sleep(5000);
        }
    }
}

On my PC, this prints 114 10 and the default startup log with 10x LSL.liblsl+StreamOutlet interspersed. I've uploaded the win64 DLL with debug information to the Github releases.

koehil commented 3 years ago

Thanks for your quick responses!

In a console application I can't break the code. Not with background threads, nor with ThreadPool theads. In our Windows application however, I cannot get the same code to work. And that is consistent on multiple computers.

The Debug dll would be great, but I guess it has some dependencies that I don't have, so that it won't load on my system?

tstenner commented 3 years ago

Even with [STAThread] static void Main(string[] args) and <Outputtype>WinExe</Outputtype> I get 10 happy streaminfos (.Net Core 3.1).

The debug DLLs shouldn't require anything (at most the MSVC runtime with debug information, but I think not even that), but I think the threading issues are no match for additional debug symbols.

koehil commented 3 years ago

I've got it working! But with a very unexpected modification...

It appears, that the LSLStreamInfo fails to instantiate, because the channel format specifier is not correct. I have been testing on an own build of the 1.13 library, and changed stream_info_impl ctor like this:

stream_info_impl::stream_info_impl(const string &name, const string &type, int channel_count, double nominal_srate, lsl_channel_format_t channel_format, const string &source_id):
    name_(name), type_(type), channel_count_(channel_count), nominal_srate_(nominal_srate), channel_format_(channel_format), source_id_(source_id), version_(api_config::get_instance()->use_protocol_version()),
    v4data_port_(0), v4service_port_(0), v6data_port_(0), v6service_port_(0), created_at_(0) {
    /* these checks do what they need to do - just ignore for now
    if (name.empty())
        throw std::invalid_argument("The name of a stream must be non-empty.");
    if (channel_count < 0)
        throw std::invalid_argument("The channel_count of a stream must be nonnegative.");
    if (nominal_srate < 0)
        throw std::invalid_argument("The nominal sampling rate of a stream must be nonnegative.");
    */
    if (channel_format < 0 || channel_format > 7)
    {
        //throw std::invalid_argument("The stream info was created with an unknown channel format.");
        channel_format_ = lsl_channel_format_t::cft_double64;
    }

    // initialize XML document
    write_xml(doc_);
}

Now, the strange thing is: because the same constructor is called 10 times in a row with unchanged parameters (like our tests above), it is not expected that the format specifier changes. It is just an integer specified with a constant (enum). Still, in our special case, the specifier is accepted 2 times and not accepted 8 times in the iteration.

If I remove that check completely, I get a memory violation exception in .NET. Apparantly (but I was not yet able to inspect what value) the channel format specifier becomes a weird value after a few calls.

Things I have been looking at before I found the above, but could not say anything conclusive about, are:

So, I seem to have fixed my problem, but still don't know what is going wrong....

koehil commented 3 years ago

sorry, wasn't meant to close...

koehil commented 3 years ago

Hi, I think I have found the real problem here. There is a parameter type mismatch in the Windows API.

It is the lsl_channel_format_t-enum type that is used as a parameter in e.g. lsl_create_streaminfo(...) that causes a byte length uncertainty and it looks like this causes calling mismatches that may corrupt the stack.

lsl_create_streaminfo(...) is declared like this in your library:

extern LIBLSL_C_API lsl_streaminfo lsl_create_streaminfo(const char *name, const char *type, int32_t channel_count, double nominal_srate, lsl_channel_format_t channel_format, const char *source_id);

where LIBLSL_C_API in fact is: __declspec(dllimport) on Windows.

The lsl_channel_format_t-enum is defined without any decoration telling the size of the values. In that case I would assume values to be of type int (32-bit signed integer). However, in the C# API (LSL.cs), the enum representing its sister in C++ does have a decoration and specifies the value to be of type byte (8-bit unsigned integer). Oops... that looks very dangerous.

I have tried to change the type of the C# version of the enum to int (32-bit signed integer), but this still causes corrupted API calls between the .NET-runtime and the LSL library.

What did work, was changing the type for the channel_format parameter in the LSL-library to a clear, basic value type, and I arbitrarily choose unsigned char (called byte in our dayly lifes). So the API declaration becomes:

extern LIBLSL_C_API lsl_streaminfo lsl_create_streaminfo(const char *name, const char *type, int32_t channel_count, double nominal_srate, unsigned char channel_format, const char *source_id); Now, I didn't have time and means to dive deeper into what exactly went wrong, and why it runs okay in many occasions, but I would have two suggestions, since I have seen API definition problems before in my programming career:

Well.... in fact I have just requested to change the API of liblsl.... I guess that doesn't produce many happy faces.... Still, I hope you can pick this up somehow...

All the best

tstenner commented 3 years ago

Wow, that's both a bad and incredible bug hunting skills. I'll check what size the channel format parameters was supposed to be in the previous releases and either declare the size of the enum explicitly (for C++11 and later) or revert it to a sized int (C).

tstenner commented 3 years ago

FWIW the C standard requires an enum to be big enough to hold the biggest value, but other than that the compiler can do what it wants. My compiler (clang) and as far as I know MSVC make it 4 bytes long, so if previous binaries also assume this I'd just change the parameter to an uint32_t.

koehil commented 3 years ago

Or keep the definition and add a new one for seemless migration...

koehil commented 3 years ago

Thanks for picking this up so quickly.

I did find another, very old issue, that may be related, although it was about Python. I just mention it here; maybe it is also totally unrelated - in that case please ignore... https://github.com/sccn/lsl_archived/issues/142