project-receptor / python-receptor

Project Receptor is a flexible multi-service relayer with remote execution and orchestration capabilities linking controllers with executors across a mesh of nodes.

Switch memory backed buffers to file backed buffers #130

Closed matburt closed 4 years ago

matburt commented 4 years ago

This represents work toward #121

This PR replaces #84

ansible-zuul[bot] commented 4 years ago

Build failed.

ansible-zuul[bot] commented 4 years ago

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

ansible-zuul[bot] commented 4 years ago

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.

Warning: Failed to create check run ansible/check: 403 Resource not accessible by integration

elyezer commented 4 years ago

I was trying to run these changes locally and I am hitting the following error. The example uses the ping command, but it also happens when running in either controller or node mode.

$ poetry run receptor \
    -d /tmp/ping-a ping \
    --peer=receptor://127.0.0.1:9999 \
    --delay 0 --count 10 node-a
/home/elyezer/code/receptor/receptor/receptor/connection/sock.py:30: RuntimeWarning: coroutine 'BridgeQueue.__aiter__' was never awaited
  async for chunk in q:
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
ERROR 2020-03-06 14:42:23,757  sock sock.connect
Traceback (most recent call last):
  File "/home/elyezer/code/receptor/receptor/receptor/connection/sock.py", line 44, in connect
    await worker.client(t)
  File "/home/elyezer/code/receptor/receptor/receptor/connection/base.py", line 195, in client
    await self.hello()
  File "/home/elyezer/code/receptor/receptor/receptor/connection/base.py", line 173, in hello
    await self.conn.send(BridgeQueue.one(msg))
  File "/home/elyezer/code/receptor/receptor/receptor/connection/sock.py", line 30, in send
    async for chunk in q:
TypeError: 'async for' received an object from __aiter__ that does not implement __anext__: coroutine
Connection failed. Exiting.

jhjaggars commented 4 years ago

That's because the protocol for __aiter__ changed between 3.6 and 3.7 and I goofed it up. This should be fixed in the latest commit.
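
For readers hitting the same TypeError: on Python 3.7 and later, `__aiter__` has to be a plain method that returns the asynchronous iterator itself. If it is declared with `async def`, it returns a coroutine instead, which matches the error in the traceback above. The sketch below is only an illustration of that rule; `ChunkQueue` and `collect` are hypothetical names, not the actual BridgeQueue code from this PR.

```python
import asyncio


class ChunkQueue:
    """Hypothetical stand-in for a queue of byte chunks (not the real BridgeQueue)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    # Plain `def`, not `async def`: the method must return the async
    # iterator directly. An `async def __aiter__` would return a coroutine,
    # which `async for` rejects on Python 3.7+.
    def __aiter__(self):
        return self

    async def __anext__(self):
        if not self._chunks:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # give other tasks a chance to run
        return self._chunks.pop(0)


async def collect(q):
    # Drain the queue with `async for`, the same construct used in sock.py.
    return [chunk async for chunk in q]


result = asyncio.run(collect(ChunkQueue([b"hello", b"world"])))
print(result)
```

Returning `self` works here because the class also defines `__anext__`; an async generator method would be another way to satisfy the same protocol.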

ghjm commented 4 years ago

I'm getting the following exception on starting up a node with this branch:

Task exception was never retrieved
future: <Task finished coro=<Receptor.watch_expire() done, defined at /home/graham/git/receptor/receptor/receptor.py:78> exception=TypeError('an integer is required (got type str)')>
Traceback (most recent call last):
  File "/home/graham/git/receptor/receptor/receptor.py", line 82, in watch_expire
    buffer = self.buffer_mgr[connection["id"]]
  File "/home/graham/git/receptor/receptor/buffers/file.py", line 139, in __missing__
    self[key] = DurableBuffer(self.path, key, self.loop)
  File "/home/graham/git/receptor/receptor/buffers/file.py", line 30, in __init__
    for item in self._read_manifest():
  File "/home/graham/git/receptor/receptor/buffers/file.py", line 64, in _read_manifest
    return json.load(fp)
  File "/usr/lib64/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib64/python3.7/json/__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "/usr/lib64/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/home/graham/git/receptor/receptor/serde.py", line 19, in decode_date
    return datetime.datetime.fromtimestamp(o["value"])
TypeError: an integer is required (got type str)

ghjm commented 4 years ago

Looks like this changes the required worker format, so receptor-http and receptor-sleep (as I have them) are not working:

$ receptor send --directive receptor_http:execute foo '{"url": "http://localhost:9000", "method": "GET"}'
'bytes' object has no attribute 'raw_payload'
ERROR 2020-03-10 17:18:53,306  entrypoints EOF was an error result
ERROR: 'bytes' object has no attribute 'raw_payload'
$ receptor send --directive receptor_sleep:execute foo ''
'NoneType' object has no attribute 'readall'
ERROR 2020-03-10 17:19:33,029  entrypoints EOF was an error result
ERROR: 'NoneType' object has no attribute 'readall'
$ receptor send --directive receptor_sleep:execute foo '{}'
'bytes' object has no attribute 'raw_payload'
ERROR 2020-03-10 17:19:42,695  entrypoints EOF was an error result
ERROR: 'bytes' object has no attribute 'raw_payload'

So the only thing I can do with this is ping, but ping isn't working, apparently because I broke it. See https://github.com/project-receptor/receptor/pull/168.

matburt commented 4 years ago

"I'm getting the following exception on starting up a node with this branch"

@ghjm you'll need to clear your data directory; it's picking up old manifest information from before the timestamp change.
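
As an aside, the traceback shows `decode_date` passing the stored value straight to `datetime.datetime.fromtimestamp`, which only accepts a number. A decoder that tolerates both forms might look like the sketch below. This is not the code in this PR: the `"_type"` marker key and the idea that old manifests held ISO 8601 strings are assumptions made purely for illustration, and clearing the data directory remains the fix suggested above.

```python
import datetime
import json


def decode_date(o):
    """JSON object hook that accepts numeric or string timestamps.

    The "_type"/"value" layout is an assumed manifest format.
    """
    if o.get("_type") != "datetime.datetime":
        return o
    value = o["value"]
    if isinstance(value, (int, float)):
        # Newer manifests (assumed): seconds since the epoch.
        return datetime.datetime.fromtimestamp(value)
    # Older manifests (assumed): an ISO 8601 string.
    return datetime.datetime.fromisoformat(value)


new_style = json.loads(
    '{"_type": "datetime.datetime", "value": 1583505743.757}',
    object_hook=decode_date,
)
old_style = json.loads(
    '{"_type": "datetime.datetime", "value": "2020-03-06T14:42:23"}',
    object_hook=decode_date,
)
```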

ghjm commented 4 years ago

I am seeing performance problems with ping. On the current devel branch I get:

$ time receptor ping foo --delay 0 --count 1000 > /dev/null

real    0m3.106s
user    0m0.941s
sys     0m2.704s

On this branch, I get:

$ time receptor ping foo --delay 0 --count 1000 > /dev/null

real    2m8.724s
user    0m0.840s
sys     0m1.838s

jhjaggars commented 4 years ago

This is fixed in https://github.com/project-receptor/receptor/pull/130/commits/aa1188bf454f3c494b4695e9f3fd9795196464e0
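
For scale, the wall-clock figures reported before the fix work out to roughly 3 ms per ping on devel versus roughly 129 ms per ping on this branch, about a 41x slowdown. Since user plus sys time on the branch is under 3 seconds, most of that time appears to be spent waiting rather than computing. The short script below just redoes that arithmetic from the `time` output quoted earlier; `parse_real` is a hypothetical helper written for this note.

```python
import re


def parse_real(value):
    """Convert a shell `time` duration such as '2m8.724s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


PINGS = 1000  # both runs used --count 1000

devel = parse_real("0m3.106s")    # "real" time on the devel branch
branch = parse_real("2m8.724s")   # "real" time on this branch

per_ping_devel_ms = devel / PINGS * 1000
per_ping_branch_ms = branch / PINGS * 1000
slowdown = branch / devel

print(f"devel:    {per_ping_devel_ms:.1f} ms per ping")
print(f"branch:   {per_ping_branch_ms:.1f} ms per ping")
print(f"slowdown: {slowdown:.0f}x")
```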