ros-controls / ros2_control_ci

This repository holds reusable workflows for CI of the ros2_control framework.
https://control.ros.org
Apache License 2.0

Add CI for rolling-version on humble/iron #30

Closed christophfroehlich closed 5 months ago

christophfroehlich commented 5 months ago

Accompanying

Any better ideas for the names of the workflows?

I don't think we want to fight the releases of ignition/gz on humble. Therefore, I excluded ign/gz_ros2_control.

saikishor commented 5 months ago

I'm not sure calling them unstable is the right word. How about rolling compatibility on iron/humble? If needed, we can also add the unsupported keyword to the naming.

What do you think?

christophfroehlich commented 5 months ago

I was looking for a term not already used in the ROS ecosystem (like testing), and unstable is something typical in the Debian world. But I'm open to other namings. But for sure it will be unstable :D

christophfroehlich commented 5 months ago

@saikishor any idea why building controller_manager fails here? https://github.com/ros-controls/ros2_control_ci/actions/runs/8133285062/job/22224687946?pr=30

saikishor commented 5 months ago

This happened to me earlier on my machine; I believe the reason is an installed version of ros2_control_test_assets that is causing the issue. I solved it on my local machine by uninstalling the installed version.

christophfroehlich commented 5 months ago

Ah, you are right. It is installed with gazebo_ros2_control (see question in initial post)

  executing command [apt-get install -y -qq ros-humble-gazebo-ros2-control]
  Setting up ros-humble-ros2-control-test-assets (2.39.1-1jammy.20240217.030243) ...
  Setting up ros-humble-control-msgs (4.4.0-1jammy.20240217.061208) ...
  Setting up ros-humble-realtime-tools (2.5.0-1jammy.20240217.075900) ...
  Setting up ros-humble-controller-manager-msgs (2.39.1-1jammy.20240217.051450) ...
  Setting up ros-humble-hardware-interface (2.39.1-1jammy.20240217.081024) ...
  Setting up ros-humble-controller-interface (2.39.1-1jammy.20240217.081539) ...
  Setting up ros-humble-controller-manager (2.39.1-1jammy.20240217.082030) ...
  Setting up ros-humble-gazebo-ros2-control (0.4.6-1jammy.20240217.083418) ...

But do we have an issue with overriding the binary version?

christophfroehlich commented 5 months ago

And at least one test fails on iron :(

   [ RUN      ] TestLoadController.spawner_test_type_in_param
   ...
     terminate called after throwing an instance of 'std::runtime_error'
      what():  Can not get command interface configuration until the controller is configured.
saikishor commented 5 months ago

Ah, you are right. It is installed with gazebo_ros2_control (see question in initial post)

Ideally, if we use the upstream ros2_control, it should work directly. We need to test it. In the worst case, we can skip this as you suggested, since we don't have to add compatibility with it.

But do we have an issue with overriding the binary version?

I believe yes, we might have an issue with overriding the binary versions. I think it is better to do it without any installed versions.

saikishor commented 5 months ago

And at least one test fails on iron :(

   [ RUN      ] TestLoadController.spawner_test_type_in_param
   ...
     terminate called after throwing an instance of 'std::runtime_error'
      what():  Can not get command interface configuration until the controller is configured.

That's weird; it should work. I can take a look later.

christophfroehlich commented 5 months ago

@christophfroehlich Can't we use reusable builds in this case for Iron and Humble?

Sorry, what do you mean? Creating workflows in all our repos to build it there? Sure, but we didn't want it to run on PRs or pushes, so why not make it centralized here?

saikishor commented 5 months ago

Sorry, what do you mean? Creating workflows in all our repos to build it there? Sure, but we didn't want it to run on PRs or pushes, so why not make it centralized here?

Sure!

BTW, I've tested building the upstream gazebo_ros2_control version in the workspace; it seems to work directly without any modifications.

christophfroehlich commented 5 months ago

OK, but now the humble workflow also fails because of the binary installation of the test assets. I'll have a look in the coming days at how I could fix this.

saikishor commented 5 months ago

Sure. Thank you

christophfroehlich commented 5 months ago

@saikishor humble tests are failing because of this incompatibility: https://github.com/ros-controls/ros2_control/pull/913#issuecomment-1404210606. Can one define conditionals for Python imports as well?

saikishor commented 5 months ago

@saikishor humble tests are failing because of this incompatibility: https://github.com/ros-controls/ros2_control/pull/913#issuecomment-1404210606. Can one define conditionals for Python imports as well?

I'm sorry about that. I didn't run the tests locally and didn't think about the Python part. You are right. If I find some time, I'll get onto it.

christophfroehlich commented 5 months ago

The humble jobs fail because of gz_ros2_control, but this is not included anywhere. I think this is imported from the cache, because the normal stack build and this job share the cache? Not sure about the github.run_id, but the rest is the same: target_ws-humble--testing-reusable_industrial_ci_with_cache--8256893443

https://github.com/ros-controls/ros2_control_ci/blob/7c655929b80445962013564f954b20b2bebdfc3c/.github/workflows/reusable-industrial-ci-with-cache.yml#L67

https://github.com/ros-controls/ros2_control_ci/blob/7c655929b80445962013564f954b20b2bebdfc3c/.github/workflows/reusable-industrial-ci-with-cache.yml#L82

I don't want to test gz/ign_ros2_control on humble here (see the errors we have with the normal stack build). @fmauch, any suggestion how to fix this?

saikishor commented 5 months ago

It is weird that it is caching the gz_ros2_control package and running its tests; strangely, it is hard to reproduce locally in a clean humble docker. It would be nice to figure it out and fix it before merging this.

fmauch commented 5 months ago

The humble jobs fail because of gz_ros2_control, but this is not included anywhere. I think this is imported from the cache because the normal stack build and this job share the cache? not sure about the github.run_id, but the rest is the same: target_ws-humble--testing-reusable_industrial_ci_with_cache--8256893443

https://github.com/ros-controls/ros2_control_ci/blob/7c655929b80445962013564f954b20b2bebdfc3c/.github/workflows/reusable-industrial-ci-with-cache.yml#L67

https://github.com/ros-controls/ros2_control_ci/blob/7c655929b80445962013564f954b20b2bebdfc3c/.github/workflows/reusable-industrial-ci-with-cache.yml#L82

I don't want to test gz/ign_ros2_control on humble here. (see the errors we have with the normal stack build). @fmauch any suggestion how to fix this?

Yes, that's definitely a problem. The target_ws is restored by the key

`target_ws-${{ env.CACHE_PREFIX }}-${{ hashFiles('**/CMakeLists.txt', '**/package.xml') }}`

with `${{ env.CACHE_PREFIX }}` being

`${{ inputs.ros_distro }}-${{ inputs.upstream_workspace }}-${{ inputs.ros_repo }}-${{ github.job }}`

resulting in

`target_ws-humble--main-reusable_industrial_ci_with_cache--8261042993`

Given the `--`, the github.job seems to be empty, so as soon as another cache was built without touching a CMakeLists.txt or package.xml, that one will be reused.

I don't quite understand why it is empty, though; I would expect it to be stack-build-on-humble, since that's also the name shown in the GitHub UI.


Edit: Re-reading the documentation makes that clear:

The job_id of the current job. Note: This context property is set by the Actions runner, and is only available within the execution steps of a job. Otherwise, the value of this property will be null.

Since we're not inside a step when defining the prefix, it is empty.


Edit 2: Thinking about this, that might be a larger problem. That means that an action run from a PR might re-use the cache from the master branch (as long as neither a CMakeLists.txt nor a package.xml is touched). I think github.ref should get in there, too.

christophfroehlich commented 5 months ago

No, I think the github.job is reusable_industrial_ci_with_cache: it takes the job from the reusable workflow, not the one from the calling workflow.
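
For illustration, a minimal sketch of the caller/reusable-workflow relationship (the caller file and job name are hypothetical, and both workflows are heavily simplified): the job id inside the reusable workflow is the same for every caller, so it adds no uniqueness to the cache prefix.

```yaml
# Sketch only: a hypothetical caller workflow, e.g. rolling-on-humble.yml
name: Rolling on Humble
on: [push]
jobs:
  stack-build-on-humble:   # the job name shown in the GitHub UI
    uses: ros-controls/ros2_control_ci/.github/workflows/reusable-industrial-ci-with-cache.yml@master
    with:
      ros_distro: humble
---
# Sketch only: the called reusable workflow, heavily simplified
on:
  workflow_call:
    inputs:
      ros_distro:
        type: string
        required: true
jobs:
  reusable_industrial_ci_with_cache:   # github.job resolves to this id for
    runs-on: ubuntu-latest             # every caller, not to the caller's job id
    steps:
      - run: echo "github.job = ${{ github.job }}"
```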

christophfroehlich commented 5 months ago

Edit 2: Thinking about this, that might be a larger problem. That means that an action run from a PR might re-use the cache from the master branch (as long as neither a CMakeLists.txt nor a package.xml is touched). I think github.ref should get in there, too.

but using github.ref would mean that PRs don't use the cache from other branches. And if we have a restore-key without github.ref, then we would have the same issue as in this PR.

fmauch commented 5 months ago

No, I think the github.job is reusable_industrial_ci_with_cache: it takes the job from the reusable workflow, not the one from the calling workflow.

Not sure, but either way if the job id isn't available... github.workflow could be an alternative.

christophfroehlich commented 5 months ago

Maybe we could use a composable action instead of a reusable workflow? Then the job-id would be unique as we would like it.

fmauch commented 5 months ago

but using github.ref would mean that PRs don't use the cache from other branches.

True, but do we want that? I guess, we want that for PR runs, right? We could add github.base_ref for those.

And if we have a restore-key without github.ref, then we would have the same issue as in this PR.

The restore key obviously has to go hand in hand with that.

fmauch commented 5 months ago

Maybe we could use a composable action instead of a reusable workflow? Then the job-id would be unique as we would like it.

That does indeed not sound too bad...

christophfroehlich commented 5 months ago

but using github.ref would mean that PRs don't use the cache from other branches.

True, but do we want that? I guess, we want that for PR runs, right? We could add github.base_ref for those.

PRs targeting master should use the cache from the master branch, but not the other way round. GitLab does not use caches from PRs; how does that work with GitHub?

base_ref is what we want for PRs, but how would that work with scheduled builds or push events?

The base_ref or target branch of the pull request in a workflow run. This property is only available when the event that triggers a workflow run is either pull_request or pull_request_target.
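
If needed, one way to cover both cases could be a fallback expression (just a sketch, not something in the current workflows; CACHE_BRANCH is a made-up name):

```yaml
env:
  # pull_request events: base_ref is the target branch (e.g. master).
  # push/schedule events: base_ref is empty, so fall back to the branch
  # that triggered the run.
  CACHE_BRANCH: ${{ github.base_ref || github.ref_name }}
```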

Maybe we could use a composable action instead of a reusable workflow? Then the job-id would be unique as we would like it.

That does indeed not sound too bad...

if the jobs are unique for different branches, then this would solve the case from above?

fmauch commented 5 months ago

if the jobs are unique for different branches, then this would solve the case from above?

Not necessarily. Caches from PRs and scheduled builds would still get mixed up. That might be fine in most cases, but might also lead to strange errors like the one mentioned above.

I guess, the cleanest would be:

* Use a proper job identifier (e.g. `github.workflow`)

* Use three possible restore keys:

  * `...${{ github.base_ref }}-${{ github.ref }}`
  * `...${{ github.base_ref }}`
  * `...${{ github.ref }}`

This way, on a PR it would search for a cache, e.g. ...master-my_feature, and if it doesn't find that during the first build, use ...master, while for a scheduled build it would use ...master, since ...master- and ... don't exist.

That could work, right?
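
As a rough sketch of what that could look like with actions/cache (path and prefix are placeholders, not the actual workflow content):

```yaml
- name: Cache target workspace
  uses: actions/cache@v4
  with:
    path: ${{ env.BASEDIR }}/target_ws   # placeholder path
    # exact key: branch-specific plus a hash of the package manifests
    key: target_ws-${{ env.CACHE_PREFIX }}-${{ github.base_ref }}-${{ github.ref }}-${{ hashFiles('**/CMakeLists.txt', '**/package.xml') }}
    # fall back from most to least specific if the exact key misses
    restore-keys: |
      target_ws-${{ env.CACHE_PREFIX }}-${{ github.base_ref }}-${{ github.ref }}
      target_ws-${{ env.CACHE_PREFIX }}-${{ github.base_ref }}
      target_ws-${{ env.CACHE_PREFIX }}-${{ github.ref }}
```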

Changing things to actual actions rather than reusable workflow could be a separate discussion, I think.

christophfroehlich commented 5 months ago

if the jobs are unique for different branches, then this would solve the case from above?

Not necessarily. Caches from PRs and scheduled builds would still get mixed up. That might be fine in most cases, but might also lead to strange errors like the one mentioned above.

I guess, the cleanest would be:

* Use a proper job identifier (e.g. `github.workflow`)

* Use three possible restore keys:

  * `...${{ github.base_ref }}-${{ github.ref }}`
  * `...${{ github.base_ref }}`
  * `...${{ github.ref }}`

This way on a PR it would search for a cache e.g. ...master-my_feature and if it doesn't find that during the first build use ...master, while for a scheduled build it would use ...master, since ...master- and ... don't exist.

That could work, right?

Not sure; what will github.workflow be from within the reusable workflow?

The name of the workflow. If the workflow file doesn't specify a name, the value of this property is the full path of the workflow file in the repository. (from https://docs.github.com/en/actions/learn-github-actions/contexts)

Changing things to actual actions rather than reusable workflow could be a separate discussion, I think.

Maybe it just solves the problem above, because the github.job then names the calling job.

fmauch commented 5 months ago

not sure, what will be github.workflow from within the reusable workflow?

Indeed, that's an interesting question, but it should at least be available, in contrast to the job's name.

Maybe it just solves the problem above, because the github.job then names the calling job.

It will still not be available outside of the steps, right? Or will it, because it is part of the steps of the calling job?

I guess it is time to prototype something to verify this; I probably won't find time for that before the weekend.

christophfroehlich commented 5 months ago

I added gz_ros2_control to the humble build now, and it succeeds as well. I'd merge this PR if that's OK for you, @fmauch @saikishor, or do you have other naming suggestions? I'll open an issue for the cache topic.

saikishor commented 5 months ago

Nice, great job guys. CI is passing now!