spine-generic / data-multi-subject

Multi-subject data for the Spine Generic project
Creative Commons Attribution 4.0 International

CI workflow file (`validator.yml`) is in need of maintenance (dependency versions, disk space, speed) #167

joshuacwnewton opened 3 weeks ago

joshuacwnewton commented 3 weeks ago

The `validator.yml` GitHub Actions CI workflow is currently failing on all PRs, which blocks merges even for approved PRs.

@mguaypaq helped summarize the various issues:

It's been a while, but from what I remember:

  • [x] The code is unnecessarily slow to run (in particular, because it repeats many calls to pybids)
  • [x] It's still pinned to Python 3.7, which is long past end-of-life
  • [ ] Some of its dependencies are not backwards compatible
  • [ ] It relies on an old file naming scheme that we've since changed
  • [x] It may be running out of disk space on the GitHub Actions runner? But there's a code comment somewhere in the workflow files about how to get more space?
  • [ ] Some of its checks seem... wrong? For example, a missing abs() around a comparison of two floating-point numbers for approximate equality
  • [x] It also looks like the workflow file just ignores the checker's return code? But the checker doesn't even install anymore, and that results in a workflow failure
  • [x] No automatic retries on failed `git annex get` downloads (with -J8)
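The missing abs() is easy to overlook, so here is a minimal illustration. The validator's actual code isn't shown in this issue, so this is a language-agnostic sketch written in shell (using awk for the floating-point arithmetic); the function names `approx_equal` and `approx_equal_buggy` are hypothetical. Without abs(), `a - b <= tol` is true whenever a < b, no matter how far apart the values are. (In Python, `abs(a - b) <= tol` or `math.isclose(a, b, abs_tol=tol)` avoids this.)

```shell
#!/bin/sh
# Hypothetical helpers illustrating the missing-abs() bug.
# Each returns exit status 0 ("approximately equal") or 1 ("not equal").

# Correct: compares the absolute difference |a - b| against the tolerance.
approx_equal() {
  awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
    d = a - b
    if (d < 0) d = -d   # this is the abs() step
    exit !(d <= tol)
  }'
}

# Buggy: without abs(), (a - b) is negative whenever a < b,
# so the comparison passes no matter how far apart a and b are.
approx_equal_buggy() {
  awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN { exit !((a - b) <= tol) }'
}

approx_equal 1.0 1.005 0.01 && echo "1.0 vs 1.005: equal"
approx_equal 1.0 5.0 0.01 || echo "1.0 vs 5.0: not equal"
approx_equal_buggy 1.0 5.0 0.01 && echo "buggy check: 1.0 vs 5.0 reported as equal"
```

Note that the buggy version still rejects 5.0 vs 1.0 (a > b), which is why this kind of bug can survive casual testing.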

I'm very interested in fixing the workflow to help unblock current and future PRs! (I plan to start with the more "maintenance"-y tasks, then move on to the "correctness" tasks related to the validation itself.)

mguaypaq commented 3 weeks ago

Another small point: this line is very flaky, and tends to make the workflow fail: https://github.com/spine-generic/data-multi-subject/blob/5398c0faf62b02e0475f76498a8af24c0c3722c2/.github/workflows/validator.yml#L98

Usually it's just a transient network error while downloading a few of the files, so we could make it more robust with a simple retry:

```shell
# try a second time if a few downloads failed the first time
git annex get -J8 || git annex get
```

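
The one-line fallback above could be generalized to a few attempts with a pause in between. This is only a sketch: the `retry` function and the attempt/delay values are hypothetical, not part of the current workflow. It relies on `git annex get` skipping files whose content is already present, so each retry only fetches what is still missing.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX_ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
# Usage: retry MAX_ATTEMPTS DELAY command [args...]
retry() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: '$*' failed after $attempt attempt(s)" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example workflow usage (values are placeholders):
# retry 3 30 git annex get -J8
```

Unlike `git annex get -J8 || git annex get`, this keeps `-J8` on every attempt; the serial fallback in the one-liner may be gentler on a struggling server, so which one fits better is a judgment call.
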
joshuacwnewton commented 3 weeks ago

A hopeful start: The disk space issues have a very quick, very neat solution. :tada:

At the start of the workflow, the df output looks like this (columns: available space, total size, mount point, device):

```
  66G   74G /mnt           /dev/sdb1
  21G   73G /              /dev/root
  99M  105M /boot/efi      /dev/sda15
```

Our working directory ($PWD) is on the root filesystem (/), which has only 21G free. But note /mnt, which has 66G free (!!!). (Notably, it looks like this temp disk used to be only 14GB.)
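For reference, a listing with the same columns can be produced with GNU coreutils `df`, which the Ubuntu runners use. This is a sketch; the exact command used to produce the listing above isn't stated in this issue.

```shell
#!/bin/sh
# Show available space, total size, mount point, and backing device
# for every mounted filesystem (GNU coreutils df only; BSD/macOS df
# does not support --output).
df -h --output=avail,size,target,source

# Show only the filesystem that holds the current working directory,
# i.e. the one the workflow's downloads will actually land on.
df -h --output=avail,size,target,source "$PWD"
```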

We can take advantage of all this extra space using the "Maximize build disk space" action. After running this action, df looks like this:

```
  87G   87G /home/runner/work/data-multi-subject/data-multi-subject /dev/mapper/buildvg-buildlv
 512M   73G /                                                       /dev/root
 100M   74G /mnt                                                    /dev/sdb1
  99M  105M /boot/efi                                               /dev/sda15
```

Thanks to the Logical Volume Manager (LVM), we now have access to the entire 87G (roughly the 21G from / plus the 66G from /mnt), with a step that takes 2 seconds, as opposed to the 3+ minutes it takes to remove unwanted software.
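Since the workflow may have been running out of space partway through, a cheap pre-flight check after the disk-space step could make any future regression obvious instead of surfacing as a confusing download failure. A minimal sketch, assuming a POSIX `df`; the `require_free_gb` name and any threshold are hypothetical, while `GITHUB_WORKSPACE` is the standard GitHub Actions variable pointing at the checkout.

```shell
#!/bin/sh
# Hypothetical pre-flight guard: fail fast if the filesystem holding DIR
# has less than MIN_GB gigabytes available.
# Usage: require_free_gb DIR MIN_GB
require_free_gb() {
  local dir="$1"
  local min_gb="$2"
  # POSIX `df -P -k` prints a header line, then one line per filesystem;
  # column 4 is the available space in 1024-byte blocks.
  local avail_kb
  avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  # Integer division rounds down, so the reported figure is conservative.
  local avail_gb=$((avail_kb / 1024 / 1024))
  if [ "$avail_gb" -lt "$min_gb" ]; then
    echo "only ${avail_gb}G free under $dir (need ${min_gb}G)" >&2
    return 1
  fi
  echo "${avail_gb}G free under $dir"
}

# Example workflow usage (threshold is a placeholder, not a measured requirement):
# require_free_gb "$GITHUB_WORKSPACE" 40
```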

I was a little concerned that LVM might slow down read/write times, but the git annex step seems to take ~10 minutes either way. :)