Closed ubgk closed 1 month ago
It likely follows from https://github.com/upkie/vulp/releases/tag/v2.2.0 where we switched from posix_ipc
to the Python standard library.
I'm surprised your log only attempts once to wait for the spine: in the current PID balancer, it should try 10 times. Maybe try increasing to retries=10
?
I think the problem is inconsistent handling of the POSIX naming standard on different platforms (and the sloppy handling of it by Python).
On Linux, shm_open('foo', ...)
, shm_open('\foo', ...)
and shm_open('\\foo', ...)
will all be resolved to the same object. (Is it because /dev/shm/foo
== /dev/shm//foo
?)
It is, however, not the case on macOS, the object names should be an exact match. (Similarly, there is no /dev/shm/foo
on macOS).
It seems that the standard SharedMemory
module prepends the name with a slash: link. So when SpineInterface
tries to open /vulp
, it is in fact trying to open //vulp
(which is OK on Linux, but not on macOS).
The simplest solution I can think of is to get rid of all leading slashes in our Python files. What do you think?
I think the case makes itself for adding integration tests. 😅
Should I add try_pid_balancer.sh
to the CI of upkie
?
Also, unit tests :wink: Would it be hard to make one for this particular case?
An integration test sounds good too :+1: We could add an option to try_pid_balancer.sh
(forwarded to the Bullet spine) to stop after a fixed number of iterations, so that the test can run in an asynchronous and deterministic way. Can you propose a PR for this?
Also, unit tests 😉 Would it be hard to make one for this particular case?
I cannot think of a very good case right off the bat. Maybe a Python test which creates a mock spine and tries to attach to it through SpineInterface
could work, what do you think?
An integration test sounds good too 👍 We could add an option to
try_pid_balancer.sh
(forwarded to the Bullet spine) to stop after a fixed number of iterations, so that the test can run in an asynchronous and deterministic way. Can you propose a PR for this?
What you propose is an option and it should be easy to implement. I think a KISS alternative would be to just add a step like timeout N ./try_pid_balancer.sh
(with a build step before, obviously). If the process quits for anything other than a timeout, we'll know that way. Does that sound good?
Hmm, I would be wary of using timeout
in a CI workflow. Things timing-related tend to break in CI runners, where time can be quite elastic. For instance here, the script may time out out before doing anything useful, with the test passing as a false positive.
Maybe a Python test which creates a mock spine and tries to attach to it through SpineInterface could work, what do you think?
That's what the test in spine_interface_test.py
does :see_no_evil: The reason it passed is that it already had the correct syntax for macOS:
Bug description
SpineInterface
cannot access the shared memory file, and my agent fails:Expected behavior
When launching try_pid_balancer.sh, the agent should be able to access the shared-memory file and run normally.
Reproduction steps
Steps to reproduce the behavior:
Additional context
The problem disappears once I checkout an earlier version of the
spine_interface.py
file, all other things remaining the same (e.g.git checkout b179d224da0468a93b6a9a051f6137b70f86498c -- vulp/spine/spine_interface.py
).I am creating this issue here as it seems to be caused by vulp.
I suspect the issue is related to the standard library (or the way we use it), as
posix_ipc
works just fine.System