Open koehil opened 3 years ago
Another strange threading problem I can't reproduce locally…
From what I've understood, creating multiple streaminfo objects in a row fails, so if I were to reduce the example even further, something like Enumerable.Range(0, 10).Select(i => new liblsl.StreamInfo($"stream{i}", "test")).Where(i => i.handle() != IntPtr.Zero).Count();
should return 1?
No, it would return 2....
The main application is 15 years in development, so we are using conservative C# syntax here, but translated to (referring to the above code):
for (int i = 0; i < 10; i++) { DoConnect(); }
I get 2 outlets and 8 fails....
If it were just 1 outlet.... but 2 succeeding times the StreamInfo returns a handle, the next 8 iterations it does not.
I don't know if this is relevant, but the thread from which the above is called, also calls many other things in the software and there are also thread synchronisation calls on the execution path (ManualResetEvent). Still, it is very strange that 2 out of 10 calls work in the iteration above.
To have more fun with threading issues, I call in the above construct:
for (int i = 0; i < 10; i++) { System.Threading.ThreadPool.QueueUserWorkItem(DoConnect); }
Now, the StreamInfos are constructed from the ThreadPool. Surprisingly, I often get 4 outlets, 5 fails and then (several times, but not always) the application crashes without any .NET trace. This means, unmanaged memory has become corrupted, and Windows then cicks out the app. But I also saw 5 outlets and 5 fails without crash.
using System;
using System.Linq;
using System.Threading;
using LSL;
namespace ConsoleApplication1
{
class Program
{
static void DoConnect(object state)
{
var info = new liblsl.StreamInfo("stream", "test");
if (info.handle() != IntPtr.Zero)
Console.Out.WriteLine(new liblsl.StreamOutlet(info).ToString());
else Console.Out.WriteLine("Uh-oh");
Thread.Sleep(50);
}
static void Main(string[] args)
{
Console.Out.WriteLine(liblsl.library_version());
liblsl.StreamInfo info = new liblsl.StreamInfo("MetaTester","EEG",8,100,liblsl.channel_format_t.cf_float32,"myuid323457");
Console.Out.WriteLine(Enumerable.Range(0, 10).Select(i => new liblsl.StreamInfo($"stream{i}", "test")).Where(i => i.handle() != IntPtr.Zero).Count());
for (int i = 0; i < 10; i++) { System.Threading.ThreadPool.QueueUserWorkItem(DoConnect); }
Thread.Sleep(5000);
}
}
}
On my PC, this prints 114 10
and the default startup log with 10x LSL.liblsl+StreamOutlet
interspersed. I've uploaded the win64 DLL with debug information to the Github releases.
Thanks for your quick responses!
In a console application I can't break the code. Not with background threads, nor with ThreadPool theads. In our Windows application however, I cannot get the same code to work. And that is consistent on multiple computers.
The Debug dll would be great, but I guess it has some dependencies that I don't have, so that it won't load on my system?
Even with [STAThread] static void Main(string[] args)
and <Outputtype>WinExe</Outputtype>
I get 10 happy streaminfos (.Net Core 3.1).
The debug DLLs shouldn't require anything (at most the MSVC runtime with debug information, but I think not even that), but I think the threading issues are no match for additional debug symbols.
I've got it working! But with a very unexpected modification...
It appears, that the LSLStreamInfo fails to instantiate, because the channel format specifier is not correct. I have been testing on an own build of the 1.13 library, and changed stream_info_impl
ctor like this:
stream_info_impl::stream_info_impl(const string &name, const string &type, int channel_count, double nominal_srate, lsl_channel_format_t channel_format, const string &source_id):
name_(name), type_(type), channel_count_(channel_count), nominal_srate_(nominal_srate), channel_format_(channel_format), source_id_(source_id), version_(api_config::get_instance()->use_protocol_version()),
v4data_port_(0), v4service_port_(0), v6data_port_(0), v6service_port_(0), created_at_(0) {
/* these checks do what they need to do - just ignore for now
if (name.empty())
throw std::invalid_argument("The name of a stream must be non-empty.");
if (channel_count < 0)
throw std::invalid_argument("The channel_count of a stream must be nonnegative.");
if (nominal_srate < 0)
throw std::invalid_argument("The nominal sampling rate of a stream must be nonnegative.");
*/
if (channel_format < 0 || channel_format > 7)
{
//throw std::invalid_argument("The stream info was created with an unknown channel format.");
channel_format_ = lsl_channel_format_t::cft_double64;
}
// initialize XML document
write_xml(doc_);
}
Now, the strange thing is: because the same constructor is called 10 times in a row with unchanged parameters (like our tests above), it is not expected that the format specifier changes. It is just an integer specified with a constant (enum). Still, in our special case, the specifier is accepted 2 times and not accepted 8 times in the iteration.
If I remove that check completely, I get a memory violation exception in .NET. Apparantly (but I was not yet able to inspect what value) the channel format specifier becomes a weird value after a few calls.
Things I have been looking at before I found the above, but could not say anything conclusive about, are:
api_config
class; but I lack the knowledge to predict what happens with such a singleton object in unmanaged memory, if the pointer of an unmanged object (stream_info_impl
) is only kept in managed memory.So, I seem to have fixed my problem, but still don't know what is going wrong....
sorry, wasn't meant to close...
Hi, I think I have found the real problem here. There is a parameter type mismatch in the Windows API.
It is the lsl_channel_format_t
-enum type that is used as a parameter in e.g. lsl_create_streaminfo(...)
that causes a byte length uncertainty and it looks like this causes calling mismatches that may corrupt the stack.
lsl_create_streaminfo(...)
is declared like this in your library:
extern LIBLSL_C_API lsl_streaminfo lsl_create_streaminfo(const char *name, const char *type, int32_t channel_count, double nominal_srate, lsl_channel_format_t channel_format, const char *source_id);
where LIBLSL_C_API
in fact is: __declspec(dllimport)
on Windows.
The lsl_channel_format_t
-enum is defined without any decoration telling the size of the values. In that case I would assume values to be of type int
(32-bit signed integer). However, in the C# API (LSL.cs), the enum representing its sister in C++ does have a decoration and specifies the value to be of type byte
(8-bit unsigned integer). Oops... that looks very dangerous.
I have tried to change the type of the C# version of the enum to int
(32-bit signed integer), but this still causes corrupted API calls between the .NET-runtime and the LSL library.
What did work, was changing the type for the channel_format
parameter in the LSL-library to a clear, basic value type, and I arbitrarily choose unsigned char
(called byte in our dayly lifes). So the API declaration becomes:
extern LIBLSL_C_API lsl_streaminfo lsl_create_streaminfo(const char *name, const char *type, int32_t channel_count, double nominal_srate, unsigned char channel_format, const char *source_id);
Now, I didn't have time and means to dive deeper into what exactly went wrong, and why it runs okay in many occasions, but I would have two suggestions, since I have seen API definition problems before in my programming career:
int
and Int32
in some context mean the same (like in C#), then choose Int32
, because that is the most unambiguous. Better not use an enum-type, because the type size is less obvious.__stdcall
rather than __cdecl
, but I couldn't find a reference confirming this. Note that they are fundamentally different when it comes to handling the stack, so caller and callee must use the same method.Well.... in fact I have just requested to change the API of liblsl.... I guess that doesn't produce many happy faces.... Still, I hope you can pick this up somehow...
All the best
Wow, that's both a bad and incredible bug hunting skills. I'll check what size the channel format parameters was supposed to be in the previous releases and either declare the size of the enum explicitly (for C++11 and later) or revert it to a sized int (C).
FWIW the C standard requires an enum to be big enough to hold the biggest value, but other than that the compiler can do what it wants. My compiler (clang) and as far as I know MSVC make it 4 bytes long, so if previous binaries also assume this I'd just change the parameter to an uint32_t
.
Or keep the definition and add a new one for seemless migration...
Thanks for picking this up so quickly.
I did find another, very old issue, that may be related, although it was about Python. I just mention it here; maybe it is also totally unrelated - in that case please ignore... https://github.com/sccn/lsl_archived/issues/142
Many days I have been looking at a strange problem, of which I suspect it has to do something with threading. The setup is this: lsllib 1.13 64 bit, Windows 10, source code in C# on .NET3.5. with Visual Studio 2008 In our Windows Forms application, an LSL outlet is to be created on a background thread with Single Threaded Apartment (STA). We use this method (simplified for the purpose of this description):
The first call to DoConnect() for the lifetime of the application runs okay. On some computers, the second call fails. The third time always fails. What happens, is that the LSL StreamInfo constructor returns without handle. In the LSL framework, an exception must have occurred, so that no object handle is returned. Strange enough, looking at the LSL source code, the StreamInfo ctor parameters should not lead to an exception.
But why is the object handle null? Why does it work once or twice, and after that no StreamInfo can be created anymore...?
I have tried a number of things. Firstly, I have put above code and references in a simple console app. From the main thread, but also from a background thread I try to create multiple outlets. No problem. That works!
Then I have tried to call 'DoConnect()' in the above code in the intended application from a ThreadPool thread. Now something strange happens: if I try to call DoConnect() from 4 ThreadPool threads that are called shortly after eachother (I mean within microseconds) I get 4 outlets! If I try to add another some time later in the same application instance, the call fails.
In the debugger, something unexpected happens as well: if I put a breakpoint directly after TEST MARK 1, and from there I step to the next line (after TEST MARK 2), I always have a handle in the StreamInfo (so outlet creation succeeds). However, if I break on TEST MARK 1 and then from there I continue normal program execution, often no handle is returned.
I assume there is some threading problem, but I cannot find any hints to what is faul there.
Maybe one of you guys can help! Keep up with the good work!