lcolladotor / biocthis

Automate package and project setup for Bioconductor packages
https://lcolladotor.github.io/biocthis/
49 stars 16 forks source link

[FR] Improve `check-bioc.yml` action #8

Closed nuno-agostinho closed 3 years ago

nuno-agostinho commented 4 years ago

Hello!

I really like the work you put into creating check-bioc.yml and I used it as the basis to create my own GitHub Actions. Thank you for this amazing resource! However, I think the file is a bit complex to be understood and modified by novice users.

Here I present you some suggestions to improve on it. You can see an example of a GitHub Action of mine in production with the suggestions that follow.

Please let me know what do you think! Again, thank you for your amazing work :)

Branches to run on

First of all, I think all branches should be run on GitHubActions (with possible exceptions like github-pages). As an R/Bioconductor package developer, I have development branches to which I always want to run in GitHub Actions. So, instead of:

https://github.com/lcolladotor/biocthis/blob/3b96da142aef74df69f35664a13d1283359d5db5/actions/check-bioc.yml#L32-L40

We could have:

on:
  push:
    branches-ignore:
      - gh-pages
  pull_request:
    branches-ignore:
      - gh-pages

Windows, Mac and Linux builds

I would argue that check-bioc.yml's complexity narrows down to code redundancy largely due to testing the Linux build in a Docker image separately from Windows and Mac. Also, this also means that we have to wait for the Linux version to finish testing before running the build in Windows and Mac.

As such, maybe it would be best to remove the Docker part (an alternative would be to test Windows/Mac in Docker images, but GitHub Actions does not support this as far as I know).

Build in both bioc-release and bioc-devel

I always like to know if there are issues with the code I am writing in both the release and development version of Bioconductor. This way, we know if something is going to break in a specific version of Bioconductor beforehand. My matrix configuration is like so:

matrix:
        config:
          - {os: windows-latest, biocversion: "release"}
          - {os: windows-latest, biocversion: "devel"}
          - {os: macOS-latest,   biocversion: "release"}
          - {os: macOS-latest,   biocversion: "devel"}
          - {os: ubuntu-latest,  biocversion: "release"}
          - {os: ubuntu-latest,  biocversion: "devel"}

To find the R version related with each version of Bioconductor, we can use bash:

      - name: Find R version to run
        run: |
          config="https://bioconductor.org/config.yaml"
          rversion=$(curl ${config} | \
            grep r_version_associated_with_${BIOCVERSION} | \
            grep -o "[0-9]*\.[0-9]*\.[0-9]*")
          echo "Using R ${rversion}..."
          echo "::set-output name=rversion::${rversion}"
        shell:
          bash {0}

Ubuntu versions

When defining the Ubuntu version to test on, why is RSPM also defined?

https://github.com/lcolladotor/biocthis/blob/3b96da142aef74df69f35664a13d1283359d5db5/actions/check-bioc.yml#L314

Although this is fine for specific versions of Ubuntu, a novice may simply update the version without understanding what the RSPM is and get confused with potential failures afterwards. I would suggest to either remove the RSPM variable or to add other Ubuntu versions with their respective RSPM values.

Also, I would suggest to put here - {os: ubuntu-latest}.

nuno-agostinho commented 4 years ago

It may also be relevant to regularly run these checks (daily? every three days? weekly?)

To do so, we can add a cron scheduler:

on: 
  push: 
    branches: 
      - master 
      - 'RELEASE_*' 
  pull_request: 
    branches: 
      - master 
      - 'RELEASE_*' 
  schedule: 
    - cron: "0 7 * * *" # Run every day at 07:00 UTC

Some other alternatives (using crontab.guru makes it easier to create these):

# Every Monday at 07:00 UTC
cron: "0 7 * * 1"

# Every Monday, Wednesday and Friday at 07:00 UTC
cron: "0 7 * * 1,3,5"

# Every weekday at 07:00 UTC
cron: "0 7 * * 1-5"

# Every two days at 07:00 UTC
cron: "0 7 * * */2"

# Every two days from Monday to Friday at 07:00 UTC
cron: "0 0 * * 1-5/2"

# Every month on the first day of the month at 07:00 UTC
cron: "0 0 1 * *"
LiNk-NY commented 4 years ago

Hi Nuno, @nuno-agostinho

These are helpful suggestions.

I think that developers should have one bioc_check.yml file per branch to keep it as simple as possible. That means each branch (master and RELEASE_3_11) would have one bioc_check.yml file with on: push and a descriptive name: field to differentiate the GHA workflows.

My opinion is that the purpose of the GH Actions functionality is to test the package development before it gets pushed up to Bioconductor. Once it is in Bioconductor, the build system takes up the "role" of the cron jobs proposed here.

Thanks! -Marcel

lcolladotor commented 3 years ago

Hi Nuno,

I agree with Marcel that once the package is in Bioconductor, the Bioconductor daily builds are the best for regular reports on all operating systems. That way, if you are running the GHA workflow on a private repo, you can have more control on your GHA compute hours.

I like the idea of the GHA workflow working in mostly all branches, but that's something that can be edited by the user after using the template from biocthis.

On a similar line of thought, I would not drop the bioconductor-docker code because that's exactly the system that is most useful for reproducing bugs seen on the Bioconductor build machines. If that fails, there's no point really in running tests on Windows/Mac.

I agree that the code complexity is introduced by the redundancy, but at least when I wrote this GHA workflow, there was no way to get around it.

Best, Leo

lcolladotor commented 3 years ago

And yes, it's a complex GHA workflow. But it's what's really made it much easier for me to work and collaborate on new R packages that we are about to submit to Bioconductor and it's helped me a lot with reducing the time spent trying to reproduce errors I might not have on my system.

Instead of simplifying the GHA workflow, I need to expand the documentation.

nuno-agostinho commented 3 years ago

Hey @lcolladotor !

Thanks for the reply!

On a similar line of thought, I would not drop the bioconductor-docker code because that's exactly the system that is most useful for reproducing bugs seen on the Bioconductor build machines. If that fails, there's no point really in running tests on Windows/Mac.

Okay, but imagine that you are trying to fix a Mac and/or Windows-exclusive issue: you have to wait for the Linux one to finish first (even if you know it’s not problematic) before other builds start. Why not get the R and Bioconductor versions in the Mac/Windows builds from somewhere else so all builds are run in parallel?

Instead of simplifying the GHA workflow, I need to expand the documentation.

Always in favour of improving documentation :)

Cheers!

nuno-agostinho commented 3 years ago

Also,

the Bioconductor daily builds are the best for regular reports on all operating systems.

I would say GHA has its advantages, namely that you can restart a failed build immediately (instead of waiting a day for the next build) and check if something just crashed momentarily (e.g. a timeout of an external API). To test if it’s just a temporary error, if using Bioconductor system:

whereas using GHA:

I don’t see the problem of testing in redundant systems and I find GHA more flexible.

lcolladotor commented 3 years ago

Hi,

I just finished merging https://github.com/lcolladotor/biocthis/pull/11 and adding several features. The stuff about docker first then mac & Windows is no longer true, since now all 3 of them run in parallel, thanks to Marcel ^^.

Regarding cron, we can mention it somewhere on the docs, but I wouldn't make it the default. Just as a note, you might want to avoid Fridays since that's the day that the bioconductor docker devel image gets updated.

I have a few meetings and have to polish another package later today. But I'll come back to this issue. If you have time, I'd appreciate a PR for documenting the cron code and/or ubuntu names & RSPM.

Best, Leo

LiNk-NY commented 3 years ago

Hi all, @lcolladotor

I agree with @nuno-agostinho about having parallel checks on the package running. If we were to consider the "pure" Bioconductor approach, we would only use Biconductor "ordained" Docker images which would point to the Linux builds. As mentioned in other conversations, the Bioconductor Build System (BBS) is quite hard to replicate due to its complexity. Currently, the Bioc Docker images are the ones closest to matching the BBS setup. Any failures specific to Windows / Mac will not necessarily be the case on the BBS which sort of detracts from having the Windows / Mac versions available.

Cron jobs are icing on the cake and should be optional inputs to the use_bioc_github_action function for maintainers that want continual checks.

RSPM: I agree that the RSPM link will change when the Ubuntu version changes. It is a matter for maintainers to pick this up when it does. The next LTS release version is quite a ways ahead of us (it gets released every 2 years AFAIK). We could even use this map to automatically change that link: https://wiki.ubuntu.com/DevelopmentCodeNames

Best, Marcel

lcolladotor commented 3 years ago

Those cron jobs could be expensive icing on the cake hehe ;) Specially if you are testing a package on a private GitHub repo.

And thanks for the link on RSPM. I don't think that we need to make that code dynamic really, given the 2 year time window you outlined.