martius-lab / cluster_utils

https://cluster-utils.readthedocs.io/stable/

Use asyncio instead of pyuv #108

Closed: luator closed 4 months ago

luator commented 5 months ago

Python's asyncio provides everything we need to set up a simple UDP server. Use it instead of pyuv to get rid of an unnecessary third-party dependency.

This also has the benefit that we avoid a direct dependency on a git repository, which is blocking us from publishing cluster_utils to PyPI.
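For illustration, a UDP server built only on asyncio could look roughly like the sketch below. This is not the code from this PR. `MessageCollector` and `run_udp_demo` are hypothetical names, and the message payload is made up. The sketch only shows the standard-library pieces involved: `asyncio.DatagramProtocol` and `loop.create_datagram_endpoint`.

```python
import asyncio


class MessageCollector(asyncio.DatagramProtocol):
    """Hypothetical protocol that stores every datagram it receives."""

    def __init__(self):
        self.messages = []

    def datagram_received(self, data, addr):
        # Called by the event loop once per incoming UDP packet.
        self.messages.append(data)


async def run_udp_demo():
    loop = asyncio.get_running_loop()
    collector = MessageCollector()

    # Port 0 lets the OS pick a free port; the real port is read back below.
    server_transport, _ = await loop.create_datagram_endpoint(
        lambda: collector, local_addr=("127.0.0.1", 0)
    )
    host, port = server_transport.get_extra_info("sockname")[:2]

    # A client endpoint that sends a single (made-up) status message.
    client_transport, _ = await loop.create_datagram_endpoint(
        asyncio.DatagramProtocol, remote_addr=(host, port)
    )
    client_transport.sendto(b"job 1 finished")

    # UDP gives no delivery confirmation, so wait briefly for the packet.
    for _ in range(100):
        if collector.messages:
            break
        await asyncio.sleep(0.01)

    client_transport.close()
    server_transport.close()
    return collector.messages


if __name__ == "__main__":
    print(asyncio.run(run_udp_demo()))
```

Note that the sender gets no acknowledgement here: if the packet were dropped, the loop would simply time out with an empty list.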

Two additional minor changes:

Fixes #80

I'll do a bit more testing but so far it looks good, so I think it's already ready for review.

luator commented 5 months ago

I tested by running some of the examples. It seems to work well, but it is hard to tell for sure. Do you know of a way to test this more systematically?

I also think that UDP is not ideal. To my understanding, it can happen that packets are lost and we wouldn't know. I don't know how big of a problem this actually is in practice, though. Anyway, if it's possible to switch easily, I'd prefer some more reliable protocol. I'll try to understand what changes would be required to switch to TCP.

File-based communication might be tricky with concurrency, so I'd only try this if there is a reliable library which handles it (sqlite might be an option, though I'm not sure how well it performs under a lot of concurrent write access). It also has the disadvantage that the server would need to poll for updates, but that's maybe a minor issue.
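To make the file-based idea concrete, here is a rough sketch using the standard-library `sqlite3` module. Everything in it is an assumption for illustration: the `messages` table, its columns, and the helper names `open_db`, `send_message`, `poll_messages` and `run_sqlite_demo` are hypothetical and not part of cluster_utils. It shows workers appending rows and the server polling for rows it has not seen yet.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    # timeout: how long to wait for a lock held by another writer.
    conn = sqlite3.connect(path, timeout=30)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, job_id INTEGER, body TEXT)"
    )
    conn.commit()
    return conn


def send_message(path, job_id, body):
    """Worker side: append one message in its own transaction."""
    conn = open_db(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO messages (job_id, body) VALUES (?, ?)",
                (job_id, body),
            )
    finally:
        conn.close()


def poll_messages(conn, last_seen_id):
    """Server side: fetch all rows newer than the last one processed."""
    rows = conn.execute(
        "SELECT id, job_id, body FROM messages WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_last_id = rows[-1][0] if rows else last_seen_id
    return rows, new_last_id


def run_sqlite_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "messages.db")
        server_conn = open_db(path)
        send_message(path, 1, "started")
        send_message(path, 2, "finished")
        first, last_id = poll_messages(server_conn, 0)
        second, last_id = poll_messages(server_conn, last_id)
        server_conn.close()
    return [(job_id, body) for _, job_id, body in first], second
```

One caveat for a cluster setting: SQLite relies on file locking, which the SQLite documentation warns can be unreliable on network filesystems such as NFS, so this would need checking before relying on it.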

luator commented 5 months ago

I changed the implementation to use TCP now (based on the example from the asyncio documentation). It works locally at least; I will test on the cluster now.
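For reference, the general shape of a TCP server with asyncio streams is sketched below. This is a minimal sketch in the style of the asyncio documentation, not the code from this PR. The newline-delimited message format, the `ack` reply, and the name `run_tcp_demo` are assumptions made for the example.

```python
import asyncio


async def run_tcp_demo():
    received = []

    async def handle_client(reader, writer):
        # One message per line; readline() returns b"" once the client
        # closes the connection.
        while True:
            line = await reader.readline()
            if not line:
                break
            received.append(line.rstrip(b"\n"))
            # Unlike UDP, the sender can be told the message arrived.
            writer.write(b"ack\n")
            await writer.drain()
        writer.close()
        await writer.wait_closed()

    # Port 0 lets the OS pick a free port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]

    async with server:
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(b"job 1 finished\n")
        await writer.drain()
        ack = await reader.readline()
        writer.close()
        await writer.wait_closed()

    return received, ack


if __name__ == "__main__":
    print(asyncio.run(run_tcp_demo()))
```

The main difference from the UDP variant is that each client holds a connection, so lost messages surface as errors on the sending side instead of disappearing silently.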

luator commented 5 months ago

Galvani seems to be overloaded at the moment (>800 jobs pending), so I can't test there, but I did some runs with the example scripts on the MPI cluster and everything seems to work.

luator commented 5 months ago

I finally managed to test on Galvani as well. A run with 1000 jobs passed without failures.

luator commented 4 months ago

Note: The last pushes only rebased on master and swapped the last two commits (so I can more easily switch between TCP and UDP for testing), so there are no actual code changes to review.

luator commented 4 months ago

I dropped the last commit to go back to UDP as discussed above.