roboticslab-uc3m / questions-and-answers

A place for general debate and question&answer
https://robots.uc3m.es/developer-manual/appendix/repository-index.html

Migrate to GitHub Actions #91

Closed PeterBowman closed 3 years ago

PeterBowman commented 3 years ago

Travis CI is now making $$$ on "free" builds according to consumed job minutes (ref). Active users/orgs will hit the free plan limit sooner or later. It happened to the YARP folks, who decided to switch to GH Actions at https://github.com/robotology/yarp/issues/2410. We reached this limit five days ago, too, so no more Travis builds can start at the rl-uc3m org.

I'm afraid it's time to say goodbye to Travis CI.

PeterBowman commented 3 years ago

@jgvictores has found https://github.com/cyberbotics/webots-animation-action. GH Actions seems such a powerful tool!

PeterBowman commented 3 years ago

Useful links:

PeterBowman commented 3 years ago

Enabled in color-debug: https://github.com/roboticslab-uc3m/color-debug/commit/76a82da9b5f900b1cd9e146fc863e96d5ce4230d.

For further reference:

PeterBowman commented 3 years ago

Default CMake version in Ubuntu bionic and focal images is 3.19. YARP/YCM require at least 3.12. To stay on the safe side, I'm going to force builds against CMake 3.12 with https://github.com/jwlawson/actions-setup-cmake (we usually work with newer versions nowadays, which makes it easy to introduce behavior that is not backwards-compatible). Note that conventions have changed: modern CMake projects can be configured, built and installed (docs) with:

cmake -S path_to_source -B path_to_build # creates directory structure if necessary
cmake --build path_to_build
cmake --install path_to_build

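Projects that must support CMake older than 3.13 cannot use -S/-B (added in 3.13) or cmake --install (added in 3.15). They need to change the working directory and have the build dir created before the first configuration (docs):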

cmake -E make_directory path_to_build
# you can use `cd` here or prepend `cmake -E chdir` to each command as shown below:
cmake -E chdir path_to_build cmake ..
cmake -E chdir path_to_build cmake --build .
cmake -E chdir path_to_build cmake --build . --target install

See https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/780d25a92db5a1dc58b541bc3d470142f7874a8b.
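A minimal sketch of how the pinned CMake version might look in a workflow, assuming the cmake-version input of jwlawson/actions-setup-cmake. The action tags and the build directory name are illustrative assumptions, not taken from the linked commit:

```yaml
steps:
  - uses: actions/checkout@v4

  # Pin CMake to the minimum version required by YARP/YCM.
  - name: Set up CMake 3.12
    uses: jwlawson/actions-setup-cmake@v2
    with:
      cmake-version: '3.12.x'

  # CMake 3.12 predates -S/-B, so create the build dir and chdir into it.
  - name: Configure
    run: |
      cmake -E make_directory build
      cmake -E chdir build cmake ..

  - name: Build
    run: cmake -E chdir build cmake --build .
```

Building against the oldest supported release this way should surface any accidental use of newer CMake features directly in CI.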

PeterBowman commented 3 years ago

Step commands (jobs.<job_id>.steps[*].run) are executed in the default OS-shell (Windows: PowerShell, non-Windows: Bash) unless specified otherwise via jobs.<job_id>.steps[*].shell. The bash option is also available on Windows systems using the bash shell included with Git for Windows. If all steps in one or all jobs are meant to be executed on both Windows and non-Windows systems, adding these lines instructs the runner to always prefer bash so that there is no need to add a shell property per step:

defaults:
  run:
    shell: bash

See example at color-debug. This default value may target all steps in one job (docs) or all steps in all jobs (docs).
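To illustrate the two scopes, here is a sketch with the default declared both at workflow level and at job level (in practice one of them is enough). The job name and the matrix are hypothetical:

```yaml
# Workflow-level default: applies to every `run` step in every job.
defaults:
  run:
    shell: bash

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
    # Job-level default: applies to the `run` steps of this job only.
    defaults:
      run:
        shell: bash
    steps:
      - run: echo "Running under Bash $BASH_VERSION"
      # A per-step `shell` still takes precedence over both defaults.
      - run: Write-Host "PowerShell step"
        shell: pwsh
```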

I'm not sure how portable Bash really is, but it seems to work for our simple use-case in color-debug (Linux+Windows). An alternative, multi-platform strategy involves using CMake as a shell command launcher via cmake -P {0} (ref). This solution has been adopted in YARP (ref).
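For reference, a sketch of the CMake-as-shell approach. The step below is a made-up example, not copied from YARP; the runner writes the run body to a temporary file and substitutes its path for {0}:

```yaml
steps:
  - name: Report CMake version with the same script on every OS
    shell: cmake -P {0}
    run: |
      # This body is a CMake script, not Bash or PowerShell.
      execute_process(
        COMMAND ${CMAKE_COMMAND} --version
        RESULT_VARIABLE result
      )
      if(NOT result EQUAL 0)
        message(FATAL_ERROR "cmake --version failed with code ${result}")
      endif()
```

The trade-off is that every command has to go through execute_process, which is more verbose than plain shell lines.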

PeterBowman commented 3 years ago

Commit https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/4cf5926b4a5172946cf9ff945779f63efba93cf5 introduces caching of main dependencies (YARP and KDL) with ccache (check also this article). Comparing build times:

I am using https://github.com/hendrikmuhs/ccache-action, which internally implements this methodology using the official cache action (guide).

The following CMake options enable ccache for selected builds:

cmake .. -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache

Note: this is a CMake variable, CMAKE_<LANG>_COMPILER_LAUNCHER. In CMake 3.17, an environment variable of the same name was introduced (ref).

Speaking of kinematics-dynamics, I am enabling ccache only for YARP and KDL. Also, these builds are benefiting from parallelization via cmake --build <dir> --parallel (uses default parallel level if no value is provided). Intentionally avoiding parallel builds of YCM due to numerous failures experienced in past Travis jobs. I am not boosting the main build (i.e. kinematics-dynamics) on purpose, might change my mind later on.

PS: added -DYARP_DISABLE_VERSION_SOURCE=ON to YARP build config in order to "avoid rebuilding everything every commit" (ref).
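Put together, the ccache setup for a dependency build could look roughly like this. The matrix keys os and compiler, the yarp source path and the action tag are assumptions for the sake of the example:

```yaml
steps:
  # Restores the ccache directory before the build and saves it afterwards.
  - name: Set up ccache
    uses: hendrikmuhs/ccache-action@v1.2
    with:
      key: ${{ matrix.os }}-${{ matrix.compiler }}

  # Pre-3.13 style configuration, with ccache as compiler launcher.
  - name: Configure YARP with ccache
    run: |
      cmake -E make_directory yarp/build
      cmake -E chdir yarp/build cmake .. \
        -DCMAKE_C_COMPILER_LAUNCHER=ccache \
        -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
        -DYARP_DISABLE_VERSION_SOURCE=ON

  # No value after --parallel: the native build tool picks its default level.
  - name: Build YARP in parallel
    run: cmake --build yarp/build --parallel
```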

PeterBowman commented 3 years ago

It turns out scheduling jobs the way we did with Travis is a bit clumsier on GHA. It is not possible to evaluate conditions when defining an inclusion in the job matrix (if event_type is schedule then add job). All combinations for scheduled and non-scheduled event types must be defined at once, then we'd just exclude YARP master branch combinations for non-scheduled jobs. However, jobs.<job_id>.if cannot be used here since the matrix is defined after evaluating the condition (ref, https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/ebddfcc131056568a7ae7b8656f92564b2c85ee7).

A dummy matrix key that depends on the event type can be used to determine exclusions, though, as explained here: https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/038a706bd4a03895da68cd9691850dca8b88afed. Note it is not possible to apply a similar trick for conditional job inclusions: https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/fa3d1f579824472f328e3f6bf62746f4bc0f6143.
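The dummy-key trick can be sketched as follows. This is a simplified reconstruction under my own assumptions (key names, branch list, and the exclusion matching a boolean expression result), so check the linked commit for the real thing:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        yarp: [yarp-3.4, master]
        # Dummy key with a single value: true only on scheduled runs.
        scheduled:
          - ${{ github.event_name == 'schedule' }}
        exclude:
          # On non-scheduled events, drop the YARP master combination.
          - { yarp: master, scheduled: false }
    steps:
      - run: echo "YARP ${{ matrix.yarp }}, scheduled=${{ matrix.scheduled }}"
```

Because the dummy key has exactly one value, it does not multiply the number of jobs; it only gives the exclude list something to match on.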

Alternatives involve using a dynamic matrix which is conditionally defined in one job and actually consumed in another, dependent job: ref. IMO even more clumsy. Of course I could just avoid all this mess and always run jobs on YARP's master branch, i.e. on every push, and possibly set jobs.<job_id>.continue-on-error to true so that the workflow run passes regardless of errors (which I hope would be notified somehow).

PS: according to docs, "scheduled workflows run on the latest commit on the default or base branch", so I can't fully test this unless my feature branch is merged.

PeterBowman commented 3 years ago

The amor-api private repo now participates in the CI workflow at https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/201fd98fd4a1132633aecbea93596cd504e73f38 (WIP). In order to grant access, I am providing my personal access token (PAT) via the org-wide secret AMOR_API, currently enabled only on repos that need to fetch amor-api sources (kin-dyn, yarp-devices and amor-main). More info:
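As an illustration, fetching the private sources with the official checkout action might look like this. The repository owner, target path and action tag are assumptions; only the secret name comes from the setup described above:

```yaml
steps:
  - name: Check out amor-api (private)
    uses: actions/checkout@v4
    with:
      repository: roboticslab-uc3m/amor-api
      # Org-wide secret holding a personal access token with read access.
      token: ${{ secrets.AMOR_API }}
      # Hypothetical location, relative to the workspace.
      path: .deps/amor-api
```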

PeterBowman commented 3 years ago

Similarly to Travis CI, GitHub Actions workflow runs can be skipped if the commit message includes a [skip ci] string.

Quoting https://github.blog/changelog/2021-02-08-github-actions-skip-pull-request-and-push-workflows-with-skip-ci/:

GitHub Actions now supports skipping push and pull_request workflows by looking for some common keywords in your commit message.

If any commit message in your push or the HEAD commit of your PR contains the strings [skip ci], [ci skip], [no ci], [skip actions], or [actions skip] workflows triggered on the push or pull_request events will be skipped.

PeterBowman commented 3 years ago

Speaking of kinematics-dynamics, I am enabling ccache only for YARP and KDL. Also, these builds are benefiting from parallelization via cmake --build <dir> --parallel (uses default parallel level if no value is provided). (...) I am not boosting the main build (i.e. kinematics-dynamics) on purpose, might change my mind later on.

Since ccache seems reliable enough, I have enabled caching for the main build, too. Not using --parallel for better browsing of the compile log.

PeterBowman commented 3 years ago

Alternatives involve using a dynamic matrix which is conditionally defined in one job and actually consumed in another, dependent job: ref. IMO even more clumsy. Of course I could just avoid all this mess and always run jobs on YARP's master branch, i.e. on every push, and possibly set jobs.<job_id>.continue-on-error to true so that the workflow run passes regardless of errors (which I hope would be notified somehow).

I am not the only one who seeks this feature: community post, https://github.com/actions/runner/issues/2347. The continue-on-error key seems to prevent a specific job from canceling other jobs, but still marks the whole run as failed. I was expecting some sort of notification about allowed failures, as other users report in the linked issue (this was a Travis feature). In contrast, fail-fast defines the default behavior for all jobs in the matrix, which is basically the same thing as continue-on-error with a different scope.
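For context, this sketch shows where each of the two keys lives. The matrix values are made up, and the per-job expression follows the pattern shown in the GitHub docs for experimental matrix entries:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      # Matrix-wide: do not cancel the remaining jobs when one of them fails.
      fail-fast: false
      matrix:
        yarp: [yarp-3.4, master]
    # Per job: only the build against YARP master is allowed to fail.
    continue-on-error: ${{ matrix.yarp == 'master' }}
    steps:
      - run: echo "Building against YARP ${{ matrix.yarp }}"
```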

Why would we want this? For early testing of experimental upstream dependencies (i.e. YARP/YCM). Options:

A dummy matrix key that depends on the event type can be used to determine exclusions, though, as explained here: roboticslab-uc3m/kinematics-dynamics@038a706.

This workaround doesn't cope well with JSON-based matrix entries: https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/5b2e3d8c9daa3b436aebf9c41de90a9f27b33ffe. The run got stuck and I had to cancel it. This is a rather lame reason in favor of switching to the second option (always 12 jobs), but still...

As discussed with @jgvictores, we'll stick to the old behavior, i.e. option 1: only build against unstable branches on schedule. The issue was fixed at https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/98179d6a7ca03c242e4f95b02368d5e88fdba7e5; the culprit turned out to be valid YAML syntax that did not express the actual intent.

PeterBowman commented 3 years ago

Commit https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/59c6b9a5c2d9e5cbfdb9d34c147f98de8fe0cd5c enables path exclusion on push events in such a way that a new run is not triggered whenever the algorithm detects that no file outside the doc/ directory has been changed, also ignoring all .md files.
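A filter along these lines produces that behavior (a sketch; the exact patterns in the linked commit may differ):

```yaml
on:
  push:
    paths-ignore:
      - 'doc/**'   # anything under doc/
      - '**.md'    # Markdown files anywhere in the tree
```

A run is skipped only if every changed file matches one of the patterns; a single file outside them is enough to trigger the workflow.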

Docs:

PeterBowman commented 3 years ago

Mostly ready at kinematics-dynamics: ci.yml. Cron jobs have been configured in a weekly fashion, on Sunday, 02:00 UTC. Also, I have prepared a template for quick reuse in other repos: roboticslab-uc3m/.github, docs.
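The weekly schedule translates to the following trigger (sketch; the push and pull_request triggers are listed only to show that they can coexist with the cron):

```yaml
on:
  push:
  pull_request:
  schedule:
    # minute hour day-of-month month day-of-week (0 = Sunday), always in UTC
    - cron: '0 2 * * 0'
```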

PeterBowman commented 3 years ago

TO-DO list (from https://travis-ci.com/github/roboticslab-uc3m):

Also:

PeterBowman commented 3 years ago

We could also migrate the generation of Doxygen documentation and GitBook manuals (currently at robots.uc3m.es) to GH Actions. For instance, this script was used in robotology/gazebo-yarp-plugins to generate http://robotology.github.io/gazebo-yarp-plugins/master/.

Links:

Additional tools that might come in handy:

PeterBowman commented 3 years ago

Three repositories are currently configured to run weekly CI builds: tools, speech, kin-dyn.

Remarks:

PeterBowman commented 3 years ago

This is nice: the .md filter prevented https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/a9acac8b8246c77be885f83648b12eddcac43d3c from triggering a CI run, but the weekly cron workflow started nonetheless (run).

I see that cache eviction is not happening on weekly crons, so I think it's safe to keep migrating the remaining repos to GHA. However, I will probably lengthen the interval between cron builds for selected, low-traffic repositories listed at https://github.com/roboticslab-uc3m/questions-and-answers/issues/91#issuecomment-784199905.

PeterBowman commented 3 years ago

I have integrated https://github.com/fkirc/skip-duplicate-actions in our CI workflow: https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/55cfdcc78f53b71da69f746cfa84b15e6483a2ea. It covers the following use cases:

I believe these features will eventually be integrated into vanilla GHA. On the other hand, I don't quite understand "Skip ignored paths". I think GHA already provides backtracking, or at least the following scenario has been successfully tested:

I thought GHA would skip the workflow run for the last action because of "stupidly looking at the current commit", as stated in the README for skip-duplicate-actions. It didn't do so, though. I think it's performing a two-way comparison on each push: ref.
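For readers unfamiliar with skip-duplicate-actions, its usual wiring is a small pre-job whose output gates the real one. The sketch below follows the pattern from the action's README; the job names, the chosen inputs and the action tag are illustrative and not necessarily what our ci.yml uses:

```yaml
jobs:
  pre_job:
    runs-on: ubuntu-latest
    outputs:
      should_skip: ${{ steps.skip_check.outputs.should_skip }}
    steps:
      - id: skip_check
        uses: fkirc/skip-duplicate-actions@v5
        with:
          # Skip if an identical tree has already been built successfully.
          skip_after_successful_duplicate: 'true'
          # Never skip manual or scheduled runs.
          do_not_skip: '["workflow_dispatch", "schedule"]'

  build:
    needs: pre_job
    # Outputs are strings, hence the comparison against 'true'.
    if: needs.pre_job.outputs.should_skip != 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "Real build steps go here"
```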

PeterBowman commented 3 years ago

We could also migrate the generation of Doxygen documentation (...)

Done in kin-dyn at https://github.com/roboticslab-uc3m/kinematics-dynamics/commit/aed50de49991aceae784fb09f9ef3bd043623742 using https://github.com/crazy-max/ghaction-github-pages (doxygen.yml, https://roboticslab-uc3m.github.io/kinematics-dynamics). I was considering two ways to do this:

I've chosen the latter and triggering on every push as long as GH detects changes on selected directories. I am also enabling manual dispatch, i.e. new runs can be manually triggered via GUI (see "Run workflow" button here).
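A stripped-down version of such a workflow could look like this. The watched directories, the Doxyfile location and the output directory are hypothetical placeholders; see doxygen.yml for the actual configuration:

```yaml
name: Doxygen

on:
  push:
    paths:
      - 'doc/**'
      - 'libraries/**'
  # Adds the "Run workflow" button to the Actions GUI.
  workflow_dispatch:

jobs:
  doxygen:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # needed to push to the gh-pages branch
    steps:
      - uses: actions/checkout@v4
      - name: Install Doxygen
        run: sudo apt-get update && sudo apt-get install -y doxygen graphviz
      - name: Generate documentation
        # Hypothetical Doxyfile, assumed to write HTML to doc/html.
        run: doxygen doc/Doxyfile
      - name: Deploy to GitHub Pages
        uses: crazy-max/ghaction-github-pages@v4
        with:
          target_branch: gh-pages
          build_dir: doc/html
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```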

Notes:

Occurrences of update-dox.sh across our org, currently active per update-daily.sh:

Ignoring these per lack of "doxygen" link in repository-index.md: follow-me, gaitcontrol (update script outside project), force-torque-balance, xgnitive.

See search results for remaining occurrences.

PeterBowman commented 3 years ago

We could also migrate the generation of (...) GitBook manuals (...)

Done at https://github.com/roboticslab-uc3m/teo-developer-manual/commit/c9a99f03dcace6681d8c0d17fff6b33ab54b2dfc (gitbook.yml). Initially, I resorted to installing Calibre and GitBook from scratch (node modules + plugins), which took over 3 minutes to complete: gitbook.yml. To avoid this, I tried to take advantage of custom Docker containers, thus cutting run times down to one minute (but no PDF). Links:

Sadly, PDF generation is broken, so we've decided to take it out (along with MOBI and EPUB files). After installing hundreds of MBs of dependencies (Calibre/ebook-convert + svgexport/puppeteer/Chromium) and solving a known --no-sandbox issue (see instructions, permalink), it gets stuck in the gitbook pdf step (but no errors are thrown). This is the closest I have gotten; additionally, it is necessary to create a new user and assign permissions (see linked puppeteer's troubleshooting guide):

FROM alpine:latest AS download_calibre
ENV CALIBRE_DOWNLOAD_URL https://download.calibre-ebook.com/5.13.0/calibre-5.13.0-x86_64.txz
RUN mkdir /tmp/calibre && \
    wget --no-check-certificate -qO- $CALIBRE_DOWNLOAD_URL | tar xvJ -C /tmp/calibre

FROM node:10.24-buster-slim
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/opt/calibre/lib
ENV PATH $PATH:/opt/calibre/bin
COPY --from=download_calibre /tmp/calibre /opt/calibre
RUN npm install -g --unsafe-perm \
        gitbook-cli \
        gitbook-summary \
        svgexport && \
    gitbook install && \
    rm -rf /tmp/* && \
    npm cache clean --force
EXPOSE 4000
VOLUME /gitbook
WORKDIR /gitbook
CMD ["gitbook", "--help"]

TODO per update-daily.sh:

PS the old robots.uc3m.es/gitbook-xxx/ addresses can be easily redirected (HTTP 301) with something like this in /var/robots/htdocs/.htaccess (tutorial):

RedirectMatch 301 "^/gitbook-teo-developer-manual/(.*)" "https://roboticslab-uc3m.github.io/teo-developer-manual/$1"

jgvictores commented 3 years ago

Note: https://github.com/asrob-uc3m/repostatistics

jgvictores commented 3 years ago

@PeterBowman THANK YOU SO MUCH!!!! 🎉🎉🎉🎉🎉

PeterBowman commented 3 years ago

I have just learned that "scheduled workflows are automatically disabled when no repository activity has occurred in 60 days" (ref). The following notice is displayed right now in the Actions tab of the speech repo:

[Screenshot: roboticslab-uc3m speech, 2021-06-01]

If we didn't care about the future of the planet, then we could schedule a cron on our robots.uc3m.es server to run gh workflow enable ci every now and then, see docs.

PeterBowman commented 3 years ago

Note that builds may sometimes fail for no apparent reason. We have recently seen pretty weird segfaults on unit tests affecting just Clang builds on YARP master and Ubuntu bionic, among other issues. That is fine; probably ccache artifacts are conflicting at link time or something like that. Don't panic and invalidate the cache like this: https://github.com/roboticslab-uc3m/tools/commit/38b02d2b807e91a4009b460ac6973289da6cc004. Make sure to also target master builds, then manually trigger the workflow again to check that it's fixed and to rebuild the cache. Currently there is no other way to clear the cache (GH Support).
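One generic way to force a fresh cache is to change the cache key, since a new key can no longer hit the old entries. The sketch below assumes the key input of hendrikmuhs/ccache-action and a hypothetical version suffix; it is not necessarily what the linked commit does:

```yaml
steps:
  - name: Set up ccache with a bumped key
    uses: hendrikmuhs/ccache-action@v1.2
    with:
      # Bumping the hypothetical `-v2` suffix makes the next run miss the
      # previous cache and rebuild it from scratch.
      key: ${{ matrix.os }}-${{ matrix.compiler }}-v2
```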