vdloo / consul-kv

Python 3 client for the Consul key/value store
MIT License
12 stars 11 forks source link

Crashes on getting non-text non-UTF8 values #30

Open ulidtko opened 1 month ago

ulidtko commented 1 month ago

Consul KV does not guarantee that values are text at all. (Even less so, in any particular encoding, e.g. UTF-8)

Steps to reproduce

~> export CONSUL_HTTP_ADDR=...
~>
~> printf '\x00\x01\x02\x03\x04\x05\xde' > nontext.data
~> curl -X PUT $CONSUL_HTTP_ADDR/v1/kv/nontext-test --data-binary @nontext.data
true⏎
~> curl -X GET $CONSUL_HTTP_ADDR/v1/kv/nontext-test
[{"LockIndex":0,"Key":"nontext-test","Flags":0,"Value":"AAECAwQF3g==","CreateIndex":13723496,"ModifyIndex":13724362}]⏎
~>
~> echo 'AAECAwQF3g==' | base64 -d | hexdump -C
00000000  00 01 02 03 04 05 de                              |.......|
00000007

:arrow_up: using curl with Consul's HTTP API directly, we've written the binary data byte array [0, 1, 2, 3, 4, 5, 222] to a key /nontext-test, and read it back. This is as expected :green_circle: :ok_hand:

Notice it's not text data! And that's completely fine, Consul KV supports that.

Yet, the python library crashes :boom: if we try to read it:

>>> import consul_kv, os
>>> 
>>> c = consul_kv.Connection(endpoint=os.getenv('CONSUL_HTTP_ADDR') + '/v1')
>>> 
>>> c.get('nontext-test')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/tmp/venv/lib/python3.10/site-packages/consul_kv/__init__.py", line 94, in get
    return get_kv(
  File "/tmp/venv/lib/python3.10/site-packages/consul_kv/api.py", line 135, in get_kv_raw
    return postprocessor(result)
  File "/tmp/venv/lib/python3.10/site-packages/consul_kv/api.py", line 148, in <lambda>
    lambda x: {
  File "/tmp/venv/lib/python3.10/site-packages/consul_kv/api.py", line 151, in <dictcomp>
    r['Key']: b64decode(r['Value']).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 6: unexpected end of data

This code is wrong: https://github.com/vdloo/consul-kv/blob/7beb2c1c84f87ff81bb010ebe49b939d639eccfe/consul_kv/api.py#L151

because it explicitly assumes that the b64-encoded data is always utf-8 text — while Consul provides no such guarantee.

Docs: https://developer.hashicorp.com/consul/api-docs/kv

[Response] Value is a base64-encoded blob of data.

vdloo commented 3 weeks ago

hi @ulidtko this project is kind of abandoned. I don't think anybody is using it at this point in time, so if you want to fix this my suggestion would be to fork

ulidtko commented 3 weeks ago

Hey @vdloo, yes I could see that, thanks for responding :smiley:

I don't think anybody is using it at this point in time

Oh boy, how you're wrong here :sweat_smile: I won't publically tell much, but it is very much used, in production systems. We hit this bug there.

Not sure how you'd like to proceed; I'd be completely happy if you decide on adding an EOL notice to README and archiving the repo. Then I could tell teams to stop using it in production.

Fixing the bug, or recommending another consul api wrapper library, are also fine options. Just please, deprecate the package if you don't maintain the code :pray:

vdloo commented 3 weeks ago

I am surprised to hear that, but I am glad it ended up being useful for somebody. Because your issue was well described I have created this pull request here which adds a decode_utf8 parameter to the get methods: https://github.com/vdloo/consul-kv/pull/31. If that works for you I'll merge it.

The code-base is small so if this is an important project for you I would say your best bet is still to fork the code and perhaps maintain it yourself. Alternatives can be found here: https://developer.hashicorp.com/consul/api-docs/libraries-and-sdks.