labstreaminglayer / pylsl

Python bindings (pylsl) for liblsl
MIT License
142 stars 58 forks source link

Support for pushing arbitrary byte sequences in cf_string channels #54

Open papr opened 2 years ago

papr commented 2 years ago

We would like to transmit arbitrary byte sequences using pylsl and cf_string channels. Unfortunately, the current pylsl implementation does not support that. Technically, it should work just fine though.

liblsl states the following about the cft_string channel format:

/** For variable-length ASCII strings or data blobs, such as video frames, complex event
      descriptions, etc. */
cft_string = 3,

For the user's convenience, pylsl cf_string outlets accept Python str objects and encode them on the fly. Similarly, the cf_string inlets decode incoming byte strings to str objects.

While this is helpful 99% of the time, it makes sending/receiving arbitrary byte strings impossible, e.g. when one wants to transmit encoded video frame data. Decoding our bytestring to utf-8 first and then have pylsl encode it again is unfortunately not a viable workaround as not all byte sequences can be decoded to utf-8.

Proposed solution

We suggest adding a boolean flag to the corresponding pull_* and push_* function that indicates wether the data should be encoded/decoded, defaulting to True. This should fit conceptionally, given that it is the users responsibility to interpret the data.

Dear pylsl maintainers, what do you think about this proposal?

/cc @ClaraKuper

cboulay commented 2 years ago

Just thinking out loud...

We could add another pylsl-only channel_format for cf_bytestring. This would require an additional mapping from cf_bytestring to the same lib.pull and lib.push functions as cf_string, but it wouldn't pass the cf_string check to trigger the encode/decode.

How does that sound?

I prefer doing something like this instead of adding a kwarg that is only relevant in << 1% of cases and might be confusing if not documented exactly correctly.

papr commented 2 years ago

@cboulay I think this is a very good alternative. 👍🏻 @clarakuper and I will work on an implementation proposal.

cboulay commented 2 years ago

I heard from @ClaraKuper in Slack that you are now encountering a problem with your char-arrays being interpreted as null-terminated strings, effectively cutting your data whenever a 0 appears. This is indeed inconsistent with the doctoring.

There's another push_ method in lsl's C-API that can handle your use-case. Unfortunately it is not exposed in pylsl, probably because the method requires an additional argument so it would be hard to make the call transparent to the user.

https://github.com/sccn/liblsl/blob/05f7e16525930058c2e00681b10da9031fa7b90e/src/lsl_outlet_c.cpp#L161-L173

Since we are providing the length of the char array, it uses the 'from sequence' std::string constructor (5 here), and it shouldn't do a null termination.

So, as above, if you use a new cf_bytestring type, you can modify the function maps to have that entry redirect to lib.lsl_push_sample_buftp. I don't know how to best handle the extra argument; it might require passing a ..., *args.

cboulay commented 2 years ago

Even with the above change, I suspect you'll still encounter some difficulties on the receiving end. For example, xdf importers may attempt to coerce the blob into a string and will do the null termination... I haven't looked too closely at that.