snare / voltron

A hacky debugger UI for hackers
MIT License

UnicodeDecodeError with command `frame variable` #207

Open j-delaney opened 7 years ago

j-delaney commented 7 years ago

### Summary

In some cases, running `frame variable` from Voltron produces the following stack trace:

Traceback (most recent call last):
  File "<python>/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "<python>/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "<python>/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "<python>/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "<python>/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "<python>/site-packages/voltron/core.py", line 86, in api_post
    res = self.server.handle_request(request.data.decode('UTF-8'))
  File "<python>/site-packages/voltron/core.py", line 241, in handle_request
    res = self.dispatch_request(req)
  File "<python>/site-packages/voltron/core.py", line 297, in dispatch_request
    log.debug("Response: {}".format(str(res)))
  File "<python>/site-packages/voltron/api.py", line 193, in __str__
    return self.to_json()
  File "<python>/site-packages/voltron/api.py", line 234, in to_json
    return json.dumps(self.to_dict())
  File "<python>/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "<python>/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "<python>/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf7 in position 147: invalid start byte

The line that raises the error is:

return json.dumps(self.to_dict())

I did a little digging, and it looks like the issue might be that the default encoding for json.dumps is "utf-8", while in this particular case the output would need to be decoded as "iso-8859-1". As a temporary workaround, I changed the offending line to:

return json.dumps(self.to_dict(), ensure_ascii=False)
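To illustrate the encoding hypothesis, here is a minimal sketch written for Python 3 (the report itself is on Python 2.7, where `str` is a byte string). The variable names are made up for illustration, and this is not Voltron's code. It assumes the debugger output arrives as raw bytes, and shows that byte 0xf7 is rejected by the UTF-8 decoder but maps to a character under ISO-8859-1, so decoding explicitly before serializing avoids the error:

```python
import json

# Raw debugger output as bytes (illustrative); 0xf7 is the value in foo.arr[0].
raw_output = b'(s) foo = (arr = "\xf7")'

# UTF-8 cannot decode this byte, so decoding raises UnicodeDecodeError.
try:
    raw_output.decode("utf-8")
    utf8_error_position = None
except UnicodeDecodeError as exc:
    utf8_error_position = exc.start

# ISO-8859-1 maps every byte to a code point, so this decode never raises.
text_output = raw_output.decode("iso-8859-1")

payload = json.dumps(
    {"status": "success", "data": {"output": text_output}, "type": "response"}
)
print(utf8_error_position)
print(payload)
```

Decoding as ISO-8859-1 always succeeds, but it is only a guess about what the bytes mean; if the memory holds something other than Latin-1 text, the resulting characters will be arbitrary.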

### Steps to Reproduce

1. Create a file `char.cpp` with the following contents:

    struct s { char arr[2]; };

    int main() { struct s foo = {{(char)247, '\0'}}; return 0; }

2. Compile with `clang++ --debug char.cpp` to generate `a.out`
3. Start up the debugger with `lldb a.out`
4. Set a breakpoint right before the program finishes with `breakpoint set -l 5`
5. Launch the program with `process launch`
6. In a new window, view the frame variables with Voltron by running `voltron view command "frame variable"`. This will cause the error.

### Expected results
Step 6 should not error.

### Environment

    clang++: Apple LLVM version 8.1.0 (clang-802.0.42)
    python: Python 2.7.10 :: Anaconda custom (x86_64)
    voltron: 0.1.7
    voltron-web: 0.1.1
    lldb: lldb-370.0.42
    os: macOS 10.12.5

j-delaney commented 7 years ago

For an example that's easier to experiment with, you can also reproduce the error in the Python REPL:

$ python
>>> import json
>>> dict = {'status': 'success', 'data': {'output': '(s) foo = (arr = "\xf7)'}, 'type': 'response'}
>>> json.dumps(dict)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf7 in position 18: invalid start byte

mollstam commented 6 years ago

I ran into this same issue and used your workaround as well. Thanks! :tada:

The following command output can't be properly decoded:

dispatch_request -- Response: {"status": "success", "data": {"output": "(i32) a = 0\n(const char *) path = 0x0000000000024580 \"UH\\x89H\u0005\""}, "type": "response"}

which causes UnicodeDecodeError: 'utf8' codec can't decode byte 0xe5 in position 60: invalid continuation byte.

The code I'm debugging looks like this:

{
    int a = 0;
    const char* path = "foo";
}

and if I set a breakpoint on the first line, `frame variable` in lldb will output

(i32) a = 0
(const char *) path = 0x0000000000024580 "UH\x89H"  

because path hasn't been initialized yet and still points at garbage memory, which Voltron then has Python try to decode as UTF-8.
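Since uninitialized memory can contain any byte sequence, one possible way to keep the response serializable is to escape undecodable bytes instead of failing. The following is only a sketch under stated assumptions: `to_text` is a hypothetical helper (not part of Voltron), it targets Python 3.5+ (where `backslashreplace` works for decoding), and the sample bytes are modelled on the output quoted above.

```python
import json


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: recursively decode bytes, escaping invalid ones."""
    if isinstance(value, bytes):
        # backslashreplace keeps undecodable bytes visible as \xNN escapes
        # instead of raising UnicodeDecodeError.
        return value.decode(encoding, errors="backslashreplace")
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_text(item, encoding) for item in value]
    return value


# Illustrative response with raw bytes in the debugger output field.
response = {
    "status": "success",
    "data": {"output": b'(const char *) path = 0x0000000000024580 "UH\x89H"'},
    "type": "response",
}

serialized = json.dumps(to_text(response))
print(serialized)
```

Compared with `ensure_ascii=False`, this keeps the JSON output pure ASCII and shows the offending bytes in a form similar to what lldb itself prints, at the cost of no longer round-tripping the original bytes exactly.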