runtimeverification / haskell-backend

The symbolic execution engine powering the K Framework
BSD 3-Clause "New" or "Revised" License
208 stars 41 forks source link

`kore-rpc` returns error response on request with Unicode character #3563

Open tothtamas28 opened 1 year ago

tothtamas28 commented 1 year ago

For a request that contains Unicode characters within a string (either escaped or not), kore-rpc returns error -32602 Invalid params.

Steps to reproduce

string-rewrite.k request.json

$ kompile string-rewrite.k --backend haskell
$ kore-rpc string-rewrite-kompiled/definition.kore --module STRING-REWRITE --server-port 3000
$ netcat -q 0 localhost 3000 < request.json
{"jsonrpc":"2.0","id":1,"error":{"code":-32602,"data":{"state":{"format":"KORE","version":1,"term":{"tag":"App","name":"Lbl'-LT-'generatedTop'-GT-'","sorts":[],"args":[{"tag":"App","name":"Lbl'-LT-'k'-GT-'","sorts":[],"args":[{"tag":"App","name":"kseq","sorts":[],"args":[{"tag":"App","name":"inj","sorts":[{"tag":"SortApp","name":"SortString","args":[]},{"tag":"SortApp","name":"SortKItem","args":[]}],"args":[{"tag":"DV","sort":{"tag":"SortApp","name":"SortString","args":[]},"value":"🙂"}]},{"tag":"App","name":"dotk","sorts":[],"args":[]}]}]},{"tag":"App","name":"Lbl'-LT-'generatedCounter'-GT-'","sorts":[],"args":[{"tag":"DV","sort":{"tag":"SortApp","name":"SortInt","args":[]},"value":"0"}]},{"tag":"App","name":"Lbl'-LT-'s'-GT-'","sorts":[],"args":[{"tag":"DV","sort":{"tag":"SortApp","name":"SortString","args":[]},"value":""}]}]}}},"message":"Invalid params"}}

The error is due to

{
  "tag": "DV",
  "sort": {
    "tag": "SortApp",
    "name": "SortString",
    "args": []
  },
  "value": "\ud83d\ude42"
}

on lines 40-47 of request.json. If the value is changed to

"value": "🙂"

the error is still there. Note that both strings are well-formed JSON strings (standard, see section 2.5).

If the value it's changed to

"value": "hello"

the request works as expected:

{"jsonrpc":"2.0","id":1,"result":{"reason":"stuck","depth":1,"state":{"term":{"format":"KORE","version":1,"term":{"tag":"App","name":"Lbl'-LT-'generatedTop'-GT-'","sorts":[],"args":[{"tag":"App","name":"Lbl'-LT-'k'-GT-'","sorts":[],"args":[{"tag":"App","name":"dotk","sorts":[],"args":[]}]},{"tag":"App","name":"Lbl'-LT-'generatedCounter'-GT-'","sorts":[],"args":[{"tag":"DV","sort":{"tag":"SortApp","name":"SortInt","args":[]},"value":"0"}]},{"tag":"App","name":"Lbl'-LT-'s'-GT-'","sorts":[],"args":[{"tag":"DV","sort":{"tag":"SortApp","name":"SortString","args":[]},"value":"hello"}]}]}}}}}
ana-pantilie commented 1 year ago

@tothtamas28 is there any good reason you'd need support for "🙂"? We restricted strings to latin1 characters because of the bytearray encoding issues.

tothtamas28 commented 1 year ago

is there any good reason you'd need support for "🙂"

Textual KORE supports arbitrary Unicode characters in string literals. When designing the KORE JSON representation, we decided not to double-encode, thus both

{
  "tag": "DV",
  "sort": {
    "tag": "SortApp",
    "name": "SortString",
    "args": []
  },
  "value": "\ud83d\ude42"
}

and

{
  "tag": "DV",
  "sort": {
    "tag": "SortApp",
    "name": "SortString",
    "args": []
  },
  "value": "🙂"
}

are valid KORE JSON. The tool scripts/generate-json-examples.sh also generates such input.

To prevent encoding issues between components, I wrote a few integration tests, and some of them are failing because of this: https://github.com/runtimeverification/pyk/pull/333

We restricted strings to latin1 characters

You mean "🙂" should be represented as "\ud83d\ude42"?

because of the bytearray encoding issues

Can you point me to the issue?

ana-pantilie commented 1 year ago

I believe https://github.com/runtimeverification/hs-backend-booster/issues/92 is the issue. @goodlyrottenapple or @jberthold can probably provide better context than I can.

ana-pantilie commented 1 year ago

It has been proven to be very difficult to support ByteArrays as well as Unicode. It also seems that the LLVM backend does not support Unicode, and since the booster depends on frequent communication with the LLVM backend it would be very tricky to implement Unicode just for the Haskell backends.

Unless this is high priority I don't think we have the resources to tackle this issue right now, at least on the Haskell backend side.

ana-pantilie commented 1 year ago

Related to https://github.com/runtimeverification/k/issues/3344