record --to-file having trouble with unicode?

gitonthescene commented 8 months ago

I'll try to debug further, but when I try to use lsp-devtools record --to-file to record a session connecting my LSP server to a client file which contains unicode I get a bunch of errors and the record session seems to effectively bail. The server itself seems to keep running. I've attached a script session of attempting to record, the output and the file. It does use my lsp-server but I suspect that's not relevant.

The file simply contains (∊). Note that the output (test.json) doesn't record past the initialized message though I interacted with it further before shutting it down.

I'm happy to poke around further but it might be easiest to for you to try to reproduce with whatever test server you use than to explain mine. test.json test.script.txt unitest.k.txt

alcarney commented 8 months ago

The issue looks like the record command is struggling to process the messages being captured by the agent.

Failed to handle notification "message/text": MessageText(text='Content-Length: 
162\r\n\r\n{"jsonrpc":"2.0","method":"textDocument/semanticTokens/full","params":{"textDocument":{"uri":"file:///Users/XXXX/K/ngnk/ng
nk-lsp/unitest.k"}},"id":27}\n', timestamp=1709523564.392576, session='df629c07-61ae-43a1-817d-d3ce5d44830b', source='client')
Traceback (most recent call last):
    ...
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 192)

What seems suspicious is the newline at the end of the MessageText payload, I would normally expect the text to end at the final } character...

Content-Length: 162\r\n\r\n{"jsonrpc":"2.0","method":"textDocument/semanticTokens/full","params":{"textDocument":{"uri":"file:///Users/XXXX/K/ngnk/ngnk-lsp/unitest.k"}},"id":27}\n

However that on its own isn't enough to trigger the error... but if I add some extra text I can reproduce it

>>> json.loads('{"jsonrpc":"2.0","method":"textDocument/semanticTokens/full","params":{"textDocument":{"uri":"file:///Users/XXXX/K/ngnk/ngnk-lsp/unitest.k"}},"id":27}\nContent-Length: ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 151)

I don't know exactly where the issue is coming from, but I have a hunch that the Content-Length header is being calculated incorrectly and is causing the message parsing code to read too many characters from the stream... :thinking:

>>> len('∊')
1
>>> len('∊'.encode())
3

alcarney commented 8 months ago

(Thinking aloud here)

The issue is probably here

    logger.info(
        "%s",
        message.decode("utf8"),
        extra={"source": source},
    )

The message is decoded from bytes to a utf8 string which "shortens" a character like ∊ which invalidates the Content-Length header associated with the message... which is never corrected when the captured message is converted to a MessageText notification

        message = MessageText(
            text=record.args[0],  # type: ignore
            session=self.session,
            timestamp=record.created,
            source=record.__dict__["source"],
        )

What I can't quite explain to myself yet is why the error in your test.script.txt file is for a semantic tokens request (which as far as I know will not contain any unicode) and isn't for a message like textDocument/didOpen which would contain the contents of your file.... :thinking:

gitonthescene commented 8 months ago

I think it's probably because it's a textDocument/semanticTokens/full request which sends the whole document along.

alcarney commented 6 months ago

The fix for this should now be available in v0.2.3 of lsp-devtools. It was a fairly major change compared with how things worked previously, so let me know if you run into any issues :)

gitonthescene commented 6 months ago

Awesome! Will try shortly.

swyddfa / lsp-devtools

record --to-file having trouble with unicode? #157