websockets 0 terminated on send but newline terminated on receive

simonvanderveldt commented 4 years ago

Whilst working with the websockets for testing some setups for #1067 I noticed that the messages sent by norns through the websocket seem to be 0 (zero) terminated but if I want to execute some code (like norns.script.load()) it only works when I send newline terminated messages.

I'm not sure if there's a reason for this?

If not, can we pick one approach an use it for both directions?

csboling commented 4 years ago

@simonvanderveldt can you give more details on how you're seeing this? My test script:

import asyncio
import websockets

ip = 'norns.local'
port = '5555'

async def tx():
    uri = 'ws://{}:{}/'.format(ip, port)
    print('connecting...', end='')
    async with websockets.connect(uri, subprotocols=['bus.sp.nanomsg.org']) as websocket:
        print('ok.')
        while True:
            msg = input()
            await websocket.send(msg + '\n')
            response = await websocket.recv()
            print(repr(response))

asyncio.get_event_loop().run_until_complete(tx())

and output:

$ python norns_ws.py
connecting...ok.
print(2)
'2\n'
print(2)
'<ok>\n\n'
print(2)
'2\n<ok>\n\n'

tehn commented 4 years ago

@simonvanderveldt is this still a problem? trying to clean up the issue list.

catfact commented 4 years ago

i'm not convinced this is really a problem. it's not that the socket RX terminates on newline; it's that the lua interpreter evaluates a code chunk on newline, which seems like correct behavior to me. (what else would it do?)

simonvanderveldt commented 4 years ago

@simonvanderveldt is this still a problem? trying to clean up the issue list.

@tehn I don't think anything was changed/fixed so probably yes. Because this didn't work I'm no longer running a single process that both receives and sends data so it's no longer relevant for me.

i'm not convinced this is really a problem. it's not that the socket RX terminates on newline; it's that the lua interpreter evaluates a code chunk on newline, which seems like correct behavior to me. (what else would it do?)

@catfact If the lines that are being parsed by lua need to be newline terminated we could make the data that's sent from norns also be newline terminated? I don't know why it's zero terminated at the moment?

catfact commented 4 years ago

ok, i guess my followup is - when are you seeing messages without newline? AFAIK, the websocket does not add actually newlines in either direction. what we see from matron is usually lua print() output, and this includes newlines (as witnessed in snippet above.) if there are debug prints they are usually fprintf(stderr, "...\n");.

simonvanderveldt commented 4 years ago

Well, (it's been a long time ago) basically I got everything back without newlines.

My main question is why are we using different ways of doing this for incoming and outgoing data?

catfact commented 4 years ago

ok, you mean this: https://github.com/monome/norns/blob/main/ws-wrapper/src/main.c#L68

we terminated TX strings with a zero (following a newline) because (IIRC) it was necessary for some (most?) websocket clients to recognize them.

do we have an issue with handling zero-terminated strings on RX?

simonvanderveldt commented 4 years ago

do we have an issue with handling zero-terminated strings on RX?

Following on from your answer before I think it would mean some custom handling to trigger parsing of a line in Lua? Since that now happens "for free" on /n.

catfact commented 4 years ago

sorry, i was unclear both in my expression and in my thinking.

what i meant is: the lua interpreter requires \n (or ;) to terminate a statement. i believe, but have not verified, that a null-terminated string (e.g. ending in \n\0) received by the websocket will be passed in full to the interpreter, and it will evaluate everything up to and including the \n, and the \0 will simply be ignored in that packet.

(in other words, the system is intended to handle strings in both directions that contain \n\0. if we lose the null-termination on TX, i believe we will see a resurfacing of old problems.)

is this not the case? did you run into an issue trying to use null-terminated strings bidirectionally?

if so, i guess we should add logic to the WS wrapper RX loop, to strip \0 if needed. (i imagine that the symptom would be something like: the null is considered as the first byte in the following packet, leading to discarded input. but visual review of the code does not lead me to expect this.)

catfact commented 4 years ago

i guess from this:

if I want to execute some code (like norns.script.load()) it only works when I send newline terminated messages.

that you tried norns.script.load()\n\0 and it did not work?

ngwese commented 4 years ago

I don’t believe output from matron to ws-wrapper to client should have a new line injected by ws-wrapper if that is what is being proposed.

Short of the thought that terminating the outgoing string with \0 might be a quirk specific to nanomsg I don’t honestly understand what is wrong with the current behavior. I don’t recall off the top of my head having to deal with the null terminating byte in the maiden (web) code. The web code is just accumulating the output until a new line is reached then injecting that into the REPL area as another line of output.

simonvanderveldt commented 4 years ago

I don’t recall off the top of my head having to deal with the null terminating byte in the maiden (web) code. The web code is just accumulating the output until a new line is reached then injecting that into the REPL area as another line of output.

Then there must be some magic happening somewhere ;) since this results in lines ending up on a single line for me.

You can easily check this using websocat and loading a script

$ websocat --protocol bus.sp.nanomsg.org ws://norns.localdomain:5555 
# script load: /home/we/dust/code/awake/awake.lua
# cleanup # script clear
including /home/we/dust/code/awake/lib/halfsecond.lua
including /home/we/dust/code/awake/lib/beatclock-crow.lua
pset >> write: /home/we/dust/data/system.pset
# script run
loading engine: PolyPerc
>> reading PMAP /home/we/dust/data/awake/awake.pmap
m.read: /home/we/dust/data/awake/awake.pmap not read.
Engine.register_commands; count: 7
___ engine commands ___ amp     f cutoff        f gain      f hz        f pan       f pw        f
release     f
___ polls ___ amp_in_l
amp_in_r amp_out_l amp_out_r
cpu_avg cpu_peak pitch_in_l
pitch_in_r
# script init
starting halfsecond
pset >> read: /home/we/dust/data/awake/awake-01.pset
pset :: /home/we/dust/data/awake/awake-01.pset not read.

vs

$ websocat -0 --protocol bus.sp.nanomsg.org ws://norns.localdomain:5555
# script load: /home/we/dust/code/awake/awake.lua
# cleanup
# script clear
including /home/we/dust/code/awake/lib/halfsecond.lua
including /home/we/dust/code/awake/lib/beatclock-crow.lua
pset >> write: /home/we/dust/data/system.pset
# script run
loading engine: PolyPerc
>> reading PMAP /home/we/dust/data/awake/awake.pmap
m.read: /home/we/dust/data/awake/awake.pmap not read.
Engine.register_commands; count: 7
___ engine commands ___
amp     f
cutoff      f
gain        f
hz      f
pan     f
pw      f
release     f
___ polls ___
amp_in_l
amp_in_r
amp_out_l
amp_out_r
cpu_avg
cpu_peak
pitch_in_l
pitch_in_r
# script init
starting halfsecond
pset >> read: /home/we/dust/data/awake/awake-01.pset
pset :: /home/we/dust/data/awake/awake-01.pset not read.

simonvanderveldt commented 4 years ago

that you tried norns.script.load()\n\0 and it did not work?

I think I only tested norns.script.load()\0 but I can test \n\0. But tbh that still seems odd to me, it would be technically zero terminated I guess but with the caveat that whatever it is you're sending also still needs to be newline terminated to actually work/trigger evaluation.

catfact commented 4 years ago

well, whatever you're sending to lua needs to have a newline or a semicolon to terminate a statement. that's not our decision, it's the semantics of the lua language. print('hello')\n\0 should work, and so should print('hello');print('again');\0.

we are intending to use null-terminated strings in both directions. clunky old C programs need this, and more modern environments can generally handle it without trouble. we should also be able to handle incoming strings that are not \0-terminated, just in case this is a painful requirement for a client.

if RX on null-terminated strings is really busted, then that's a bug - let's fix it and move on.

i will test this myself right now but am having some hurdles installing websocat and its dependencies on this particular machine.

catfact commented 4 years ago

for completeness: it should also work to send these separately: print('hello'\0 );\0

catfact commented 4 years ago

ok, i got websocat working and took a deeper dive. here are some notes on how things work now. happy to change things as long as are all on the same page regarding what needs to be supported...

1. websocat

(all invocations of websocat also use -protocol bus.sp.nanomsg.org)

with no argument (newline-terminated mode, i guess), i see the same behavior as simon:
- statements sent to norns are evaluated correctly
- output from norns: newlines are consumed by websocat because it thinks they are terminators i guess. though not ideal, this does seem like expected behavior.
with websocat -0 or websocat --null-terminated,
- output from norns looks correct
- sending statements to norns doesn't work at all. this seems wrong.

i'm not sure what is going on with the failure to send in null-terminated mode, so digging some more:

2. ws-wrapper/main.c

we are not doing anything special with RX buffers, just passing them along. i tried adding these lines to the RX loop:

void *loop_rx(void *p) {
    (void)p;
    int nb;
    while (1) {
        char *buf = NULL;
        nb = nn_recv(sock_ws, &buf, NN_MSG, 0);

        ///---------------
        // yell whenever a buffer is received
        fprintf(stderr, "rx\n");
        // attempt to strip terminating null if present
        if (nb > 0 && buf[nb-1] == '\0') {
            fprintf(stderr, "found null terminator\n");
            nb = nb - 1;
        }
       //----------------

        if (write(pipe_rx[PIPE_WRITE], buf, nb) < 0) {
            fprintf(stderr, "write to pipe failed\n");
        }
        nn_freemsg(buf);
    }
}

weirdly, nanomsg doesn't fire the "rx" message at all with websocat -0. this seems really bizarre. tempted to roll something lower-level because i don't know exactly what websocat is doing under the hood.

moving on, for informational purposes:

3. matron/src/input.c

ok, this is admittedly a little odd. we are actually explicitly looking for newlines from stdin: https://github.com/monome/norns/blob/main/matron/src/input.c#L27

and we are also actually explicitly skipping null bytes: https://github.com/monome/norns/blob/main/matron/src/input.c#L34-L35

and then adding a null byte to ensure a legal C string before passing the input chunk through the event loop to the LVM: https://github.com/monome/norns/blob/main/matron/src/input.c#L52-L57

this is all kinda weird-looking, but it does seem to accomplish the intended job: handle input as if someone were typing it into stdin:

whether a chunk is complete or not, we don't want to constantly attempt to re-evaluate the RX buffer on every received character, so we do need to treat lua statement delimiters specially (i think)
i do think we should handle semicolons as newlines.
i wonder if skipping the null bytes is really correct.

in any case, the weird part is in 2. i'm missing something about how nanomsg works maybe.

need to move on to other tasks but maybe wiser heads will see what i am missing.

catfact commented 4 years ago

oh, right: tried various other things in 2, like just replacing each \0 with \n. the weird thing is that nn_recv just seems to block forever with websocat -0 --protocol bus.sp.nanomsg.org, so our handling of the buffer contents is a moot point.

simonvanderveldt commented 4 years ago

well, whatever you're sending to lua needs to have a newline or a semicolon to terminate a statement. that's not our decision, it's the semantics of the lua language. print('hello')\n\0 should work, and so should print('hello');print('again');\0.

A small question about this: We define the interface, right? So we can say the norns interface requires zero terminated strings as input and when we parse those we pass them to Lua in a way that Lua expects it. Strictly speaking we aren't tied to what Lua requires, that's just an implementation detail.

Now if we are saying we're exposing regular Lua over this interface (I'm not sure we've ever said that? We made our own REPL/tools for livecoding which is what users would use but I guess we never properly defined the underlying interface itself?) it's of course a different story.

Regarding point 2: You're saying that ws-wrapper doesn't show any incoming messages when using websocat's null terminated mode? I can check if I see the same behavior.

catfact commented 4 years ago

i guess we never stated this explicitly, but yeah: i intended to make the input to matron work just like the input to the canonical lua program, because that seems like the least surprising behavior. (incomplete statements are fine; broken string literals are not; newline triggers evaluation. allowing semicolon to trigger evaluation would actually be a departure, probably bad idea on 2nd thought.)

point 2: yeah, that's what i'm seeing. currently rolling a lower-level test client to see if i can pin it down further. i would like to be able to just say: we want null-terminated strings in both directions.

another option is to remove the null terminator in TX, and say we don't support any client program that needs C strings. i'm ok with this if it is generally considered more useful. (would need some modifications to maiden-repl probably.)

catfact commented 4 years ago

oh, i also tried setting NN_WS_MSG_TYPE_BINARY instead of NN_WS_MSG_TYPE_TEXT, in case frames with null bytes were being rejected as non-text. no dice. (in fact, no observable change in behavior: weird.)

part of me is tempted to try switching from nanomsg to boost::beast or something.

simonvanderveldt commented 4 years ago

point 2: yeah, that's what i'm seeing. currently rolling a lower-level test client to see if i can pin it down further. i would like to be able to just say: we want null-terminated strings in both directions.

I'm seeing the same. But I am seeing the null terminated messages when sending zero terminated strings via Python, so might be a websocat issue?

i guess we never stated this explicitly, but yeah: i intended to make the input to matron work just like the input to the canonical lua program, because that seems like the least surprising behavior. (incomplete statements are fine; broken string literals are not; newline triggers evaluation. allowing semicolon to trigger evaluation would actually be a departure, probably bad idea on 2nd thought.)

another option is to remove the null terminator in TX, and say we don't support any client program that needs C strings. i'm ok with this if it is generally considered more useful. (would need some modifications to maiden-repl probably.)

Both input and output newline terminated would make it more like a (Lua) REPL I think? That's generally how REPLs work (although there are of course nuances to this as well, for example io.write("foo") doesn't output with a newline :x)

monome / norns