python-zk / kazoo

Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper.
https://kazoo.readthedocs.io
Apache License 2.0

Committing large transaction results in ConnectionLoss #630

Closed: joshuaboniface closed this issue 3 years ago

joshuaboniface commented 3 years ago

Expected Behavior

Committing a large transaction should succeed, or at least provide a descriptive error (max size?)

Actual Behavior

Committing a large transaction fails with a kazoo.exceptions.ConnectionLoss:

  File "/var/home/joshua/Projects/pvc/api-daemon/pvcapid/helper.py", line 151, in restore_cluster
    results = zk_transaction.commit()
  File "/usr/lib/python3/dist-packages/kazoo/client.py", line 1563, in commit
    return self.commit_async().get()
  File "/usr/lib/python3/dist-packages/kazoo/handlers/utils.py", line 79, in get
    raise self._exception
kazoo.exceptions.ConnectionLoss

Snippet to Reproduce the Problem

Consider an example Python dict (loaded from JSON) like the following:

mydict = {
    "key1": "Some quite long string",
    "key2": "Some quite long string",
    [etc.]
}

The total number of keys is roughly 3200 or more, and the on-disk JSON representation of this is more than 928935 bytes but no more than 932093 bytes. I was tweaking this in fairly coarse steps using 3 layers of keys, but the counts are accurate.
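
A small sketch of how such a dataset might be generated for testing. The path shape and the 254-character value are assumptions modeled on the output shown further down, not the reporter's exact script:

```python
import json

# Hypothetical generator: a flat dict of znode-path-like keys, each holding
# a 254-character string, roughly matching the sizes described above.
def build_dataset(n_keys, value="x" * 254):
    return {
        "/key2/child%d/subchild%d" % (i // 10, i % 10): value
        for i in range(n_keys)
    }

mydict = build_dataset(3237)
print(len(mydict), len(json.dumps(mydict)))  # key count and JSON size in bytes
```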

Now, we craft the following code:

import kazoo.client

zk_conn = kazoo.client.KazooClient(hosts='localhost')
zk_conn.start()
transaction = zk_conn.transaction()
for key in mydict:
    print(key)
    data = mydict[key]
    # keys must be absolute znode paths, e.g. '/key2/child93/subchild7'
    transaction.create(key, str(data).encode('utf8'))
print('committing')
transaction.commit()
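
Kazoo doesn't expose the serialized size of a pending transaction, but you can compute a rough lower bound yourself before committing. This helper is a sketch, not part of the kazoo API: the name is hypothetical, the 1 MB threshold matches ZooKeeper's stock jute.maxbuffer, and the estimate ignores per-operation framing (ACLs, flags, record headers), so the real request is somewhat larger:

```python
MAX_REQUEST_BYTES = 1024 * 1024  # ZooKeeper's default jute.maxbuffer

def estimate_commit_size(items):
    """Rough lower bound on a multi-op create payload:
    path bytes plus data bytes, ignoring protocol framing."""
    return sum(len(path.encode("utf8")) + len(data) for path, data in items)

# items: the (path, data) pairs you would queue via transaction.create()
items = [("/key2/child0/subchild0", b"x" * 254)]
size = estimate_commit_size(items)
if size > MAX_REQUEST_BYTES:
    raise ValueError("payload ~%d bytes may exceed jute.maxbuffer" % size)
```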

With 3237 identical keys, each containing 254 characters of text, totaling 928935 bytes in JSON representation, the commit succeeds:

/key2/child93/subchild7
3235
/key2/child93/subchild8
3236
/key2/child93/subchild9
3237
committing

With 3248 identical keys, each containing the same text, totaling 932093 bytes in JSON, the commit fails with a ConnectionLoss:

/key2/child94/subchild7
3246                   
/key2/child94/subchild8
3247                   
/key2/child94/subchild9
3248                   
committing             
Connection dropped: socket connection broken
Transition to CONNECTING
[2020-11-24 02:07:24,845] ERROR in app: Exception on /api/v1/restore [POST]
Traceback (most recent call last):
[...]
  File "/usr/lib/python3/dist-packages/kazoo/client.py", line 1563, in commit
    return self.commit_async().get()
  File "/usr/lib/python3/dist-packages/kazoo/handlers/utils.py", line 79, in get
    raise self._exception
kazoo.exceptions.ConnectionLoss

My original test which triggered this used a significantly larger dataset, about 5.2MB of JSON representation; the example above hits the limit with far less.

I suspect we're hitting some sort of single-transaction limit on the client side, given that nothing appears in the server's debug log, but I cannot be certain. If this is a hard limit, I didn't see it in the documentation; if not, it seems like a bug.

Logs with logging in DEBUG mode

No logs are present on the Zookeeper server side when this fails; the ConnectionLoss seems to indicate the request never makes it to the server properly.

Specifications

joshuaboniface commented 3 years ago

I did some more digging and it seems I'm hitting a Zookeeper limit rather than anything wrong in the code here. The final commit data is just over 1,000,000 bytes with my bad example and just under with my good example, which makes me suspect this is a limit imposed by Zookeeper, not a bug in Kazoo. Keeping this open until I know for certain so I can document it for future reference.

StephenSorriaux commented 3 years ago

Hello,

Might this be related to the jute.maxBuffer param for Zookeeper, which would be 4MB server-side in your version?

I guess this issue may help.

joshuaboniface commented 3 years ago

Good find, thanks! Whatever limit I'm hitting sits at a request size of just over 1024*1024 (bytes, I presume), so I'm wondering if this got defaulted to 1MB somehow in the Debian package build. I'll try experimenting with this option. Worst case, since this only seems to happen (at least with my sample set) for create transaction tasks and not set_data tasks, I could work around it there, but I'd prefer the whole thing to be atomic.

For reference, before anyone thinks I'm too crazy: the purpose here is to back up and restore a Zookeeper-using program's data. The restore is where the trouble lies, since I'm trying to take the whole backup and restore it in a single atomic action. I don't necessarily have to do this, and could do the writes one at a time if needed, but I like the idea of it being transactional a little more.
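
If strict atomicity can be relaxed, one workaround is to split the restore into several smaller transactions, each kept safely under the request-size limit. A sketch under stated assumptions (the helper name and the 512 KB margin are made up; the size estimate ignores protocol framing, so leave headroom below jute.maxbuffer):

```python
def chunk_by_size(items, max_bytes=512 * 1024):
    """Split (path, data) pairs into batches whose rough payload size
    stays under max_bytes."""
    batches, current, size = [], [], 0
    for path, data in items:
        entry = len(path.encode("utf8")) + len(data)
        if current and size + entry > max_bytes:
            batches.append(current)  # flush the batch before it overflows
            current, size = [], 0
        current.append((path, data))
        size += entry
    if current:
        batches.append(current)
    return batches

# Each batch would then be committed as its own kazoo transaction,
# trading one atomic restore for several smaller atomic steps:
# for batch in chunk_by_size(list(mydict.items())):
#     t = zk_conn.transaction()
#     for path, data in batch:
#         t.create(path, str(data).encode("utf8"))
#     t.commit()
```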

joshuaboniface commented 3 years ago

It does seem like setting that worked. Double-checking the docs, the default was actually 0xfffff, i.e. just under 1024*1024 bytes, so it matched up exactly. Increasing it to 0xffffff (just under 16*1024*1024) seems to have done the trick. Weird that it counts all of this against a single-znode limit, despite me creating many znodes, but I'm not going to question it too deeply - in my usecase, 5MB seems like a reasonable amount of data in a restore, but exceeding 16MB is unlikely. Thanks!

TL;DR for anyone finding this: Set -Djute.maxbuffer=0xffffff as a Java option to the Zookeeper process. This is considered unsafe but I assume fine if you're not actually trying to store 16MB of data in a single znode. Ref: http://zookeeper-user.578899.n2.nabble.com/Configuring-jute-maxbuffer-on-ZooKeeper-Nodes-td7583886.html
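
For reference, one common place to set this is the ZooKeeper environment file read at startup. This is an assumption about your packaging; the file name and variable vary between distributions:

```shell
# conf/zookeeper-env.sh (or wherever your packaging sets JVM flags);
# raises the server's request/response buffer limit to just under 16 MiB.
export SERVER_JVMFLAGS="-Djute.maxbuffer=0xffffff ${SERVER_JVMFLAGS}"
```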

ceache commented 3 years ago

The reason is that ZooKeeper stores in the znode both its data AND its metadata (type, owner, permissions, and the list of child znodes).

I think this accounts for the difference from 1MB you were seeing.

Cheers


joshuaboniface commented 3 years ago

@ceache Interesting, I've never hit the issue when creating the znodes in the first place - it was only this specific situation where I was restoring what is effectively a large set of keys (and all their children) in a single shot. Referencing the context, a whole cluster restore: creating each individual entry, even a few children deep, never seemed to cause an issue, just this big restore, and only when creating them as new znodes. Could the limit have been on the / root znode, and just never hit because I was adding children slowly rather than all at once?