Closed: tunahanertekin closed this issue 1 year ago.
Apparently, this issue has not been resolved properly.
When deploying a BuildManager with its manifest:
```yaml
apiVersion: robot.roboscale.io/v1alpha1
kind: BuildManager
metadata:
  name: build-cloudy
  namespace: my-fleet
spec:
  steps:
  - command: |
      cd $WORKSPACES_PATH/cloudy-ws && \
      source /opt/ros/humble/setup.bash && \
      apt-get update && \
      rosdep update && \
      rosdep install --from-path src --ignore-src -y -r
    instances:
    - robot-cloud-02
    name: rosdep-cloudy
    workspace: cloudy-ws
  - command: |
      apt-get update && \
      apt-get install -y ros-humble-image-transport-plugins ros-humble-rqt-image-view
    instances:
    - robot-cloud-02
    name: compress-pkgs
    workspace: cloudy-ws
  - command: |
      cd $WORKSPACES_PATH/cloudy-ws && \
      source /opt/ros/humble/setup.bash && \
      colcon build
    instances:
    - robot-cloud-02
    name: build-cloudy
    workspace: cloudy-ws
  - command: |
      cd $WORKSPACES_PATH/cloudy-ws && \
      source /opt/ros/humble/setup.bash && \
      colcon build
    instances:
    - robot-cloud-02
    name: build-2-cloudy
    workspace: cloudy-ws
  - command: |
      cd $WORKSPACES_PATH/physical-ws && \
      source /opt/ros/humble/setup.bash && \
      apt-get update && rosdep update && \
      rosdep install --from-path src --ignore-src -y -r
    instances:
    - cloudy-mini-agv
    name: rosdep-physical
    workspace: physical-ws
  - command: |
      apt-get update && \
      apt-get install -y ros-humble-image-transport-plugins ros-humble-realsense2-camera
    instances:
    - cloudy-mini-agv
    name: camera-pkgs
    workspace: physical-ws
  - command: |
      cd $WORKSPACES_PATH/physical-ws && \
      source /opt/ros/humble/setup.bash && \
      colcon build
    instances:
    - cloudy-mini-agv
    name: build-physical
    workspace: physical-ws
  - command: |
      cd $WORKSPACES_PATH/physical-ws && \
      rosdep update && \
      source install/setup.bash && \
      source install/local_setup.bash && \
      ros2 run micro_ros_setup create_agent_ws.sh && \
      ros2 run micro_ros_setup build_agent.sh
    instances:
    - cloudy-mini-agv
    name: micro-ros-physical
    workspace: physical-ws
status:
  active: true
  phase: BuildingRobot
  scriptConfigMapStatus:
    created: true
    reference:
      apiVersion: v1
      kind: ConfigMap
      name: build-cloudy-scripts
      namespace: my-fleet
      resourceVersion: "6359"
      uid: 748cca81-cc36-4c8d-9d81-94aeaf28316a
  steps:
  - resource:
      created: true
      phase: Active
      reference:
        apiVersion: batch/v1
        kind: Job
        name: build-cloudy-rosdep-physical
        namespace: my-fleet
        resourceVersion: "6455"
        uid: 07a478e7-45bd-4d54-98f9-a11183c5878e
    step:
      command: |
        cd $WORKSPACES_PATH/physical-ws && \
        source /opt/ros/humble/setup.bash && \
        apt-get update && rosdep update && \
        rosdep install --from-path src --ignore-src -y -r
      instances:
      - cloudy-mini-agv
      name: rosdep-physical
      workspace: physical-ws
  - resource:
      created: false
      reference: {}
    step:
      command: |
        apt-get update && \
        apt-get install -y ros-humble-image-transport-plugins ros-humble-realsense2-camera
      instances:
      - cloudy-mini-agv
      name: camera-pkgs
      workspace: physical-ws
  - resource:
      created: false
      reference: {}
    step:
      command: |
        cd $WORKSPACES_PATH/physical-ws && \
        source /opt/ros/humble/setup.bash && \
        colcon build
      instances:
      - cloudy-mini-agv
      name: build-physical
      workspace: physical-ws
  - resource:
      created: false
      reference: {}
    step:
      command: |
        cd $WORKSPACES_PATH/physical-ws && \
        rosdep update && \
        source install/setup.bash && \
        source install/local_setup.bash && \
        ros2 run micro_ros_setup create_agent_ws.sh && \
        ros2 run micro_ros_setup build_agent.sh
      instances:
      - cloudy-mini-agv
      name: micro-ros-physical
      workspace: physical-ws
```
BuildManager creates these jobs and pods:
```
$ kubectl get jobs -n my-fleet
NAME                           COMPLETIONS   DURATION   AGE
build-cloudy-rosdep-cloudy     0/1           4m37s      4m37s
build-cloudy-rosdep-physical   0/1           4m37s      4m37s

$ kubectl get pods -n my-fleet
NAME                                 READY   STATUS    RESTARTS   AGE
build-cloudy-rosdep-physical-pgqpr   0/1     Error     0          5m30s
build-cloudy-rosdep-physical-vlp89   0/1     Error     0          4m32s
build-cloudy-rosdep-cloudy-9htpl     0/1     Error     0          5m30s
build-cloudy-rosdep-cloudy-75khk     1/1     Running   0          25s
```
The problem is that the BuildManager from the manifest above should not create a job named `build-cloudy-rosdep-cloudy` on this cluster, since the `rosdep-cloudy` step targets the instance `robot-cloud-02`, while the instance here is `cloudy-mini-agv`. Note that the BuildManager's status is generated correctly and does not contain the wrong step. The root cause of this error is being investigated.
~Root cause can be found in this function:~

```go
func ContainsInstance(instances []string, instance string) bool {
	if len(instances) == 0 {
		return true
	}
	for _, v := range instances {
		if v == instance {
			return true
		}
	}
	return false
}
```
~By default, a newly initialized `Step` object has its `instances` empty, so `ContainsInstance` always returns `true`. Irrelevant steps can therefore be executed if they are processed before their instances are set.~
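The struck-through behavior can be reproduced in isolation. In this sketch, only `ContainsInstance` comes from the operator; the `main` harness and the instance names are illustrative:

```go
package main

import "fmt"

// ContainsInstance mirrors the operator helper quoted above:
// an empty instance list matches every instance.
func ContainsInstance(instances []string, instance string) bool {
	if len(instances) == 0 {
		return true
	}
	for _, v := range instances {
		if v == instance {
			return true
		}
	}
	return false
}

func main() {
	// A step scoped to the cloud instance does not match the physical one...
	fmt.Println(ContainsInstance([]string{"robot-cloud-02"}, "cloudy-mini-agv")) // false
	// ...but a step whose instances are not yet populated matches everything,
	// which is why an unset Step could leak onto the wrong instance.
	fmt.Println(ContainsInstance(nil, "cloudy-mini-agv")) // true
}
```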
During the deletion attempt of builder jobs, the operator updates the instance status and puts the first step into the status even if it is not configured to run on the current instance. Here is the function that carries the root cause and the solution (solved in f136f9ab0de5afe4453e89b639cb7314dca836de):

`func reconcileDeleteBuilderJobs`
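A plausible shape of the fix is to filter steps by instance before they are written back to the status, so the deletion/requeue path never re-adds a foreign step. This is a hypothetical sketch, not the actual change from the commit above; the `Step` type and `filterStepsForInstance` helper are illustrative:

```go
package main

import "fmt"

// Step is a simplified stand-in for the operator's build step type.
type Step struct {
	Name      string
	Instances []string
}

// containsInstance follows the operator's semantics: an empty
// instance list means the step runs everywhere.
func containsInstance(instances []string, instance string) bool {
	if len(instances) == 0 {
		return true
	}
	for _, v := range instances {
		if v == instance {
			return true
		}
	}
	return false
}

// filterStepsForInstance keeps only the steps that should run on the
// current instance, so status updates cannot pick up a step that
// targets a different instance.
func filterStepsForInstance(steps []Step, instance string) []Step {
	var out []Step
	for _, s := range steps {
		if containsInstance(s.Instances, instance) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	steps := []Step{
		{Name: "rosdep-cloudy", Instances: []string{"robot-cloud-02"}},
		{Name: "rosdep-physical", Instances: []string{"cloudy-mini-agv"}},
	}
	// On the physical instance, only rosdep-physical survives the filter.
	for _, s := range filterStepsForInstance(steps, "cloudy-mini-agv") {
		fmt.Println(s.Name)
	}
}
```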
What happened?
Every build step has a `.selector` field that determines whether the step will be executed on the current cluster. BuildManager does not satisfy this functionality: for example, it sometimes creates a step's job on a physical instance even though the step's selector targets only the cloud instance.

What did you expect to happen?
I expect steps to be created only on the desired instances.
How can we reproduce it (as minimally and precisely as possible)?
It can be reproduced by applying a BuildManager like the one shown above.
Kubernetes version
All Kubernetes versions
Container network interface (CNI) and version
No response