meejah / txtorcon

Twisted-based asynchronous Tor control protocol implementation. Includes unit-tests, examples, state-tracking code and configuration abstraction.
http://fjblvrw2jrxnhtg67qpbzi45r7ofojaoo3orzykesly2j3c2m3htapid.onion/
MIT License

txtorcon > 0.19.0 is returning unicode instead of strings when calling TorProtocol.get_info() #232

Open hellais opened 7 years ago

hellais commented 7 years ago

Since 0.19.0, get_info commands return unicode values instead of strings. Other people may also be surprised by this, so at the very least a note should be added to the changelog to mention it, or ideally (if there is not a good reason for the change) the old behavior of returning strings should be restored.

Here is a minimal way to reproduce the issue:

import txtorcon
from twisted.internet import defer, task

def updates(prog, tag, summary):
    print "%d%%: %s" % (prog, summary)

@defer.inlineCallbacks
def main(reactor):
    config = txtorcon.TorConfig()
    proto = yield txtorcon.launch_tor(config, reactor, progress_updates=updates)
    s = txtorcon.TorState(proto.tor_protocol)
    state = yield s.post_bootstrap
    result = yield state.protocol.get_info("address")
    # str on 0.18.0, unicode on >= 0.19.0 (Python 2)
    assert isinstance(result.values()[0], str)

task.react(main)

The above snippet succeeds in txtorcon 0.18.0 but fails in txtorcon >= 0.19.0.

meejah commented 7 years ago

I'm not sure what we can/should do here?

In Python3, str is basically the same as unicode in Python2: a string with some encoding. We're just assuming everything is UTF8 because Tor doesn't declare encodings for anything, but does allow high-ASCII values in "strings" that come back ...
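To make the bytes-versus-text distinction concrete, here is a minimal Python 3 sketch. It is not taken from txtorcon's source: the sample bytes and the `decode_reply_value` helper are hypothetical, and the UTF-8 default is simply the assumption described above.

```python
# Hypothetical raw bytes, standing in for a value read off the control port.
raw = b"caf\xc3\xa9"  # "café" encoded as UTF-8; includes bytes above 0x7F


def decode_reply_value(data, encoding="utf-8"):
    """Decode raw bytes to text. UTF-8 is an assumption, since Tor declares no encoding."""
    return data.decode(encoding)


text = decode_reply_value(raw)
print(type(raw).__name__, len(raw))    # bytes 5
print(type(text).__name__, len(text))  # str 4
```

The two lengths differ because the last character occupies two bytes in UTF-8, which is why raw bytes and decoded text cannot be treated as interchangeable.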

meejah commented 7 years ago

The reason I think py3-str or unicode is the right answer for GETINFO (and most other things in tor-control-protocol) is because they're "things that get shown to users, or typed in by users" so they should be "a string with an encoding", not just raw bytes.
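For callers that still need byte strings, as in the assertion from the original report, one option is to encode at the boundary of their own code. The sketch below is only an illustration under that assumption: `values_as_bytes` is a hypothetical helper, not part of txtorcon, and the sample dict merely imitates the shape of a get_info result.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def values_as_bytes(info, encoding="utf-8"):
    """Return a copy of a get_info-style dict with text values encoded to bytes."""
    return {
        key: value.encode(encoding) if isinstance(value, text_type) else value
        for key, value in info.items()
    }


# Hypothetical reply; the address comes from a documentation-only IP range.
reply = {u"address": u"203.0.113.7"}
as_bytes = values_as_bytes(reply)
print(as_bytes[u"address"] == b"203.0.113.7")  # True
```

Values that are already bytes pass through unchanged, so the helper behaves the same whether the library returns text or byte strings.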