Closed RAttab closed 8 months ago
@RAttab could you, please, specify what was the HEAD hash with which you hit the issue above?
The PR makes little sense to TBH because the error you reported above was not supposed to happen if you are using the unmodified code.
Look at the when
clause of task which you report as failing: https://github.com/scylladb/scylla-ansible-roles/blob/master/ansible-scylla-node/tasks/common.yml#L136:
- name: Set io.conf
lineinfile:
path: /etc/scylla.d/io.conf
regexp: '^SEASTAR_IO='
line: "{% if scylla_io_probe|bool and scylla_io_probe_dc_aware|bool %}{{ hostvars[dc_to_node_list[dc][0]]['io_conf'] }}{% else %}{{ io_conf }}{% endif %}"
create: yes
become: true
when: io_conf is defined and (_io_conf_file.stat.exists|bool == false or _io_conf_out.stdout|length == 0 or always_replace_io_conf|bool)
Note the first condition: io_conf is defined
so in case io_conf
is undefined this whole task was supposed to be skipped.
However in your report it is executed (and fails).
Could it be that you changed the code before executing?
Or maybe there is something else that makes it to fail - but not the fact that io_conf
is undefined.
I ran into the problem using the af92c08
HEAD with no additional modification.
Note that the error is not raised for the io_conf
variable declared for the current node. The error is raised when accessing the io_conf
variable of another node through hostvars[dc_to_node_list[dc][0]]['io_conf']
which is not checked by the when
clause of that task.
Memory is a bit fuzzy but if I recall correctly the error didn't happen when I first executed the block. It started to happen on re-running the block after failures happened later in the role due to configuration mistakes. I'll see if I can reproduce the issue to give you a full step by step break down but that might take me a few days as I'm a bit time constraint at the moment.
I do know that my change fixes the issue and aligns a bit better with the when
statement of the block. That being said, I also think the proper fix may be to adjust the when
statement to include a check for the io_conf
variable. I only had so much time to dedicate to this particular problem so I did the simplest and most obvious fix that would allow me to keep going.
I ran into the problem using the
af92c08
HEAD with no additional modification.Note that the error is not raised for the
io_conf
variable declared for the current node. The error is raised when accessing theio_conf
variable of another node throughhostvars[dc_to_node_list[dc][0]]['io_conf']
which is not checked by thewhen
clause of that task.Memory is a bit fuzzy but if I recall correctly the error didn't happen when I first executed the block. It started to happen on re-running the block after failures happened later in the role due to configuration mistakes. I'll see if I can reproduce the issue to give you a full step by step break down but that might take me a few days as I'm a bit time constraint at the moment.
I do know that my change fixes the issue and aligns a bit better with the
when
statement of the block. That being said, I also think the proper fix may be to adjust thewhen
statement to include a check for theio_conf
variable. I only had so much time to dedicate to this particular problem so I did the simplest and most obvious fix that would allow me to keep going.
This still doesn't make sense to me unless you used some rather unorthodox parametrization, e.g. defining per-host io_properties
dictionary while omitting entries for some hosts.
The intention of the code piece in question is that values io_properties
, io_conf
, scylla_io_probe
and scylla_io_probe_dc_aware
are either not set to all nodes or set to all nodes to the same value.
If the above holds I don't understand how you can have a defined io_conf
for some host and still have it undefined in the first host in the inventory from the same DC: hostvars[dc_to_node_list[dc][0]]
.
I would need more information about the exact Role parameters that were used when you saw the error above.
And I still don't understand what problem your patch actually resolves: the block in question simply reads the content of corresponding files.
As per the Ansible documentation for blocks:
The directive does not affect the block itself, it is only inherited by the tasks enclosed by a block. For example, a when statement is applied to the tasks within a block, not to the block itself.
For that reason exactly changing the order of execution of independent tasks in the same block is not going to change the final outcome: if the block's condition holds all tasks are going to be executed, if it doesn't - none will be executed.
A reproducer would be nice indeed.
At the moment I don't have the time to spend on digging further into this so if you do not believe this is a problem I will simply close the PR and move on.
Problem
Ran into this problem while trying to deploy a new cluster.
As per the Ansible documentation for blocks:
Since the
io_conf
fact is set after the step to set theio_properties
fact, thewhen
clause for the block that checks whetherio_properties
is set will prevent theio_conf
fact from ever being set. This in turns causes theSet io.conf
task to fail asio_conf
was never set.Solution
Doing the
io.conf
steps before theio.properties
steps in the block solves the issue.