bl33pbl0p · closed 5 years ago
This is a related and similar problem:
https://lists.freedesktop.org/archives/systemd-devel/2019-January/041942.html
Looking at the code, I can only think of three solutions so far, further discussion would be appreciated:
This is only for services, though; I am not sure how this mixes with other unit types...
Wait for any conflicting jobs of dependencies to be processed before scheduling the timer: an easy fix, but racy (a conflicting job could still come in right before we schedule the timer and conflict again).
Introduce a per-unit job queue: probably the most involved, but also the correct, solution. Allow, through a job mode, queuing jobs on a unit. One way is to make it possible to install multiple jobs (maybe with a limited number of slots), and then, when processing jobs for a unit, run them in the order they were installed. It is unclear, however, whether all jobs installed on a unit should be executed atomically, and how ordering comes into play. I guess we would check whether u->job_after is non-NULL after processing u->job, and then not return JOB_DONE but instead go dispatch it iff the previous job had the JOB_DONE result (assuming we allow two slots). This needs more rework, though, as we also need to respect the ordering of any other queued jobs that might just be waiting on the stop (and need to happen before the start).
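To make the two-slot idea above concrete, here is a standalone sketch (in Python, purely for illustration; Unit, Job, job_after, JOB_DONE etc. are hypothetical stand-ins, not the real systemd types): the dispatcher runs the installed job, and if it ends with JOB_DONE and a second slot is queued, it promotes and dispatches that job too instead of returning early.

```python
# Hypothetical model of "two job slots per unit"; none of these names
# correspond to real systemd internals.
from dataclasses import dataclass
from typing import Optional

JOB_DONE, JOB_FAILED = "done", "failed"

@dataclass
class Job:
    type: str  # "stop" or "start"

@dataclass
class Unit:
    id: str
    job: Optional[Job] = None        # currently installed job (slot 1)
    job_after: Optional[Job] = None  # queued job (slot 2)

def job_run(unit: Unit, job: Job) -> str:
    # Pretend to run the job; here every job succeeds.
    print(f"{unit.id}: {job.type}")
    return JOB_DONE

def unit_dispatch(unit: Unit) -> str:
    """Run unit.job; if it succeeds and a second slot is queued,
    promote it and dispatch it too, instead of returning early."""
    result = job_run(unit, unit.job)
    unit.job = None
    if result == JOB_DONE and unit.job_after is not None:
        unit.job, unit.job_after = unit.job_after, None
        return unit_dispatch(unit)
    return result

# The scenario from this issue: a stop job is already installed when
# the restart's start job arrives; it lands in slot 2 and runs after.
u = Unit("dep.service", job=Job("stop"), job_after=Job("start"))
unit_dispatch(u)  # prints the stop first, then the start
```

With this model, the restart transaction's start job would no longer be unmergeable against the pending stop; it simply queues behind it.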
Now, two slots have an interesting effect in my testing: any further request with, say, the 'append' job mode may replace what was previously (and wrongly) installed after the unit's running job.
Another way to fake this queuing effect is to mark the job as merged but re-run it, like JOB_RELOAD, but this probably involves exploring the combinatorial matrix of all possible conflicting combinations and how each should be handled, and again interacts in interesting ways with the ordering of other units' jobs.
The udev problem is not solved (https://lists.freedesktop.org/archives/systemd-devel/2019-January/041942.html); I will create another ticket.
@Shuangistan Yes, that problem is related to the device state machine using JOB_FAIL; it is similar in nature to this one, so feel free to open a new ticket to track it.
systemd version the issue has been seen with
Used distribution
Problem
Three units have requirement dependencies as follows.
When any of these are running and the main PID dies due to a failure (say, I kill -9 it for third_to_start), its state changes to failed, at which point the BindsTo= dependency from the other two causes a stop job to be enqueued for each of them. Before those stop jobs complete, the 100ms RestartSec= for third_to_start.service elapses and a restart job with the 'fail' job mode is triggered. This transaction tries to start the unit again, pulling in start jobs for the dependents that already have stop jobs queued (one waiting and one running, as they happen in order). The whole transaction is then deemed destructive and fails entirely (the jobs are unmergeable due to the job mode), and all three units end up stopped instead of possibly waking up again.
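The setup described above can be sketched roughly as follows (unit names other than third_to_start.service, the exec paths, and the exact dependency wiring are assumptions for illustration; only BindsTo=, Restart=, and RestartSec=100ms are taken from the report):

```ini
# third_to_start.service — the unit whose main PID gets killed
[Service]
ExecStart=/usr/bin/some-daemon
Restart=on-failure
RestartSec=100ms

# dependent-a.service (and similarly dependent-b.service) — hypothetical names
[Unit]
BindsTo=third_to_start.service
After=third_to_start.service

[Service]
ExecStart=/usr/bin/other-daemon
ExecStop=/usr/bin/slow-stop    ; a slow stop widens the race window
```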
This is racy; in fact, if RestartSec= is bumped to some higher value, the restart job completes successfully (as there are no stop jobs left at that point in time).
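For reference, the kind of RestartSec= bump that masks the race in testing can be applied with a drop-in (the path and the chosen value are illustrative, e.g. via systemctl edit third_to_start.service):

```ini
# /etc/systemd/system/third_to_start.service.d/override.conf
[Service]
RestartSec=2
```

As argued below, this only hides the problem rather than fixing it, since it depends on the stop jobs finishing within the chosen delay.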
I think it would be appropriate to schedule the timer only after the stop jobs for its dependents are done, because bumping RestartSec= is not practical when ExecStop= in either of the units takes a nondeterministic amount of time, and neither is bailing out on the restart entirely. Another option would be allowing the job mode for this transaction to be set, so that something like 'replace' would make it work again (at the cost of abruptly cancelling the stop jobs).
Logs of what happens: