CI / GitHub's "Actions"

t-w commented 1 year ago

There is some initial version for "continuous" integration (or, at least, builds and some tests) that was added with #34. For now, it is set up only to start builds when there is an action (merge or push) on the citest branch. It is done so to prevent unnecessary overuse (with each small commit), as some of the tests might be a bit I/O expensive (maybe they can be a bit optimized later, eg. to check less cases, but for now I would leave it that way).

I have created citest branch in the lclevy/ADFlib repo - but it seems, the repo does not have the "actions" enabled (I was useing this with my fork so far). So, @lclevy, you may want to eventually enable it, if you want the builds executed in this repo.

For the configuration itself, there can be more things/details to discuss about it - so an issue for this might be useful.

t-w commented 1 year ago

OK. I see you enabled it, the builds are running, now.

t-w commented 1 year ago

Already a second thought... the idea with one branch for building won't work if development will go on with multiple branched in this repo (and not in people's forks) - each branch of the dev. should have possibility to test independently...

So either:

development will be done in forks, tested there, than merged here (my way so far), or
the "action's" config. will have to be changed

lclevy commented 1 year ago

Hum, I did nothing AFAIR

t-w commented 1 year ago

It did not start initially, maybe it was just delayed somehow... Anyway - it is working, the first build is finished.

Btw - the version there is incorrect (still the planned 0.7.13). I think after fixes applied by @kyz and (eventually) implementing truncate, we can make a new version - so that it is more-less complete, to some level.

lclevy commented 1 year ago

Truncate is not a priority before a release IMHO

t-w commented 1 year ago

OK. As you prefer. It will leave the "file write support" still partial. But, yes, adding it would require more delay...

lclevy commented 1 year ago

It is just my opinion. As you want

kyz commented 1 year ago

It is done so to prevent unnecessary overuse (with each small commit),

I think it would be fine to have this, which I believe will also run when other people want to merge to master here from their own repositories.

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

This would allow doing tests on most submitted PRs, but would not run just for your own branches. As far as I know, most people would bundle several commits in any given push, and the actions only run on each push, not once for every commit, so most people's use of this service would be quite minimal.

You are right to think about the work done by the tests, though. I can see that most tests run in milliseconds, except for test_file_truncate2 (11s), test_file_write_chunks (19s), test_file_seek_after_write (25s), test_file_truncate (8m30 ?!).

This test is doing thousands of file open/close tests. This test specifically would greatly benefit from a purely in-memory virtual device. In my own project, the library API lets you supply a structure with I/O functions, so you can get a huge speed boost for testing by replacing the normal disk I/O (fopen, etc.) with memory access, no files needed. In ADFLib that ability seems locked away, you can't get to replace adfEnv.nativeFct. Even though it's a pointer to an abstract implementation, the initialisation of that pointer is fixed at compile time. A good improvement would be to allow setting your own native device, or even registering one on a chain and having adfOpenDev ask each one if they handle a given path.

kyz commented 1 year ago

I created the ramdisk branch to show you what I had in mind. The tests do run slightly faster when you cut out all external I/O. I'm not proposing this for a PR just now, because I think it could be done even better (by making the dump I/O functions a "device", and then you wouldn't need the generic device at all, and then the linux and windows native devices can just be conditional build objects...)

Anyway, what I found is that test_file_truncate takes so long to run because the code that checks the zeroed part of an enlarged file reads 1 byte at a time with adfFileRead() and ck_assert_msg() in a loop. That leads to billions of unnecessary calls. Replacing this with a single adfFileRead() into a suitable buffer, and checking each byte with if first (only calling ck_assert_msg if it would assert), reduces the runtime from 484s to 16s. I committed 3f1ccc5ef4ef60a5dc3a2c85535c43acf3a75fe0 to speed up the test. Now the tests complete in 1m40s.

Do you think that's fast enough to enable the github actions for every master commit / PR ?

kyz commented 1 year ago

Also, when I was looking at the native device code...

https://github.com/lclevy/ADFlib/blob/master/src/linux/adf_nativ.c#L129-L167

Is it correct that adfLinuxReadSector() calls write() and adfLinuxWriteSector() calls read() ?

t-w commented 1 year ago

Also, when I was looking at the native device code...

https://github.com/lclevy/ADFlib/blob/master/src/linux/adf_nativ.c#L129-L167

Is it correct that adfLinuxReadSector() calls write() and adfLinuxWriteSector() calls read() ?

A copy-paste+reverse bug.... I was doing linux device pretty quickly (probably, as we can see, too quickly) as a skeleton/prototype - and here we are... Thanks for checking in this. Just corrected directly in master (d91b0b3).

lclevy commented 1 year ago

Nice to see you're collaborating on this project guys!

t-w commented 1 year ago

Concerning native device interface (API) and a virtual device - no second thoughts for now. Linux implementation was just proposed for "completeness", it seemed weird that such an obvious case was not there. Still, myself I do not have a use case for any native device at all, so I can't really test it...

For the performance of the truncate test - indeed, it was written quickly and as simple as possible and I overlooked an obvious way of optimizing it. So thanks for checking on this and correcting.

Note however, that in case of some tests the whole point is to read/write in small/predefined chunks, so those cannot be optimized this way. I tried to select some values that are test covering as much as possible (esp. edge cases, like block ends/begins, ext. blocks switching/utilization etc.). Maybe some of the cases could be also dropped, but I'd be careful with that. For a relatively complex thing like a filesystem, better test too much than not enough.

Performance concerns are for me also in other places - current implementation uses a single data block and a single ext. block and it is intermingled with file operations. This could be done in a bit different way. However, for now I chose the way of filling up the gaps in functionality, not rewriting everything in that part. If we try to change everything, esp. not having some relevant set of tests, it is easy to get lost in this. Some places are really tricky to modify.

For the main topic - limiting builds/tests to master branch seems to have little sense. I think the best way would be to just enable tests for all branches (not only master). Changes should be tested before merging them to master(!). It should be even required that a branch passes the tests before merging.

But this may also depend how we look at branches... For me, master is place where tested/confirmed things should go. All development should be elsewhere, tested there and (eventually, if confirmed) merged to master. So the easiest way is to enable builds for all...

@lclevy, this impacts activity/resource usage on your repository/account, so I guess it would be the best if you decide this.

lclevy commented 1 year ago

Yes, for me first step is to hunt functional bugs. Then address performance as a bonus.

kyz commented 1 year ago

For the main topic - limiting builds/tests to master branch seems to have little sense. I think the best way would be to just enable tests for all branches (not only master). Changes should be tested before merging them to master(!). It should be even required that a branch passes the tests before merging.

This is what on: pull_requests: branches: [ master ] does. Every push made to a branch that is part of an open merge request proposing to merge to master, is tested. I believe the GitHub UI even warns against merging if the tests haven't run yet or it has failed the tests.

That's why I proposed what I wrote above: run tests on pushes to master, and on pull requests for master. It runs in any case where someone wishes to change master... and to save resources, doesn't run on other branches, unless they are part of a pull request for master. I think that's a good balance.

t-w commented 1 year ago

For the main topic - limiting builds/tests to master branch seems to have little sense. I think the best way would be to just enable tests for all branches (not only master). Changes should be tested before merging them to master(!). It should be even required that a branch passes the tests before merging.

This is what on: pull_requests: branches: [ master ] does. Every push made to a branch that is part of an open merge request proposing to merge to master, is tested. I believe the GitHub UI even warns against merging if the tests haven't run yet or it has failed the tests.

That's why I proposed what I wrote above: run tests on pushes to master, and on pull requests for master. It runs in any case where someone wishes to change master... and to save resources, doesn't run on other branches, unless they are part of a pull request for master. I think that's a good balance.

Yes, right... This is fine, I agree.

But, besides the above, it would be good to have a possibility to trigger builds without a push or a pull request to the master. So I am not sure what is the best way here... I'd at least keep something like the current build on the 'citest' branch (or whatever we name it). For me, this is enough (I do tests in my fork first), but, for instance, if several people will work on different things in the main repo, there may be a conflict...

The simplest way would be to enable builds for all pushes, and normally the best as each set of changes would be tested. But I do not know what is the GitHub's policy on resources... if there is some limit, monthly quota or whatever... If there is nothing special and the jobs would be just queued and executed, then maybe we can do it...

Actually, I've just checked the page https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration It states that there is a time limit for job, for workflow and limits for concurrent jobs. So maybe it would be OK. to enable for all. The jobs would be just queued, I guess, if triggered more than the limit allows.

Anyway, this really should be decided by @lclevy, as the usage is per account.

kyz commented 1 year ago

I'd at least keep something like the current build on the 'citest' branch

We could write on: push: branches: [ master, citest ] to achieve that.

For me, I don't see a functional difference between these two things:

Working on a non-master branch, want to test it, don't want to merge it yet. Merge to a specially named citest branch (taking care to completely overwrite what was last on that branch) and push, that triggers CI by the rule on: push: branches: citest
Working on a non-master branch, want to test it, don't want to merge it yet. Have a PR open, marked as draft if you're not ready to merge. Pushes to your branch will trigger CI by the rule on: pull_request: branches: master while the PR is open, regardless of draft status

The reason I'd like e.g. on: push: branches: + on: pull_request: branches: in comparison to on: [ push, pull_request ] is that I'd like the possibility of not running CI for people who don't want it, and my thinking is that people who want it choose to open PRs, and people who don't, don't open PRs (yet)

I'm not an expert in GitHub Actions, so I took a look at a few projects, it seems all our ideas are OK, and we could do even more if we want. Examples:

So, how about:

on:
    push:
        branches: [ master, citest ]
    pull_request:
        branches: [ master ]

t-w commented 1 year ago

I am not a GitHub expert, either... What you propose is fine for me.

We would have to think more carefully if there was plenty of people working on a project, if there is just a few, we can have something simple, at least start with it. That was my idea for that matter.

t-w commented 1 year ago

There are few other things remaining:

what (eventually) configure for releases (building tags, storing artifacts?)
- for this I would at least keep the debs for Ubuntu and thew "release" DLL build for Windows
- Actions are a bit crappy that building for other OS is not supported (in GitLab, you can use practically anything having a Docker image), unless I over
you mentioned static analysis etc. - that could be also something to add - but really later; otherwise we will never finish 0.8; I am not saying further clean-up is not needed, just not everything at the same time. I have already added a couple of compiler switches catching bad stuff (at spent some time cleaning it up), but a lot more can be done for it.

IMHO, none of these has to be done now, for 0.8, though it would be good to provide something just working (like the utilities, even statically compiled). We can at least exchange couple of thoughts about it.

kyz commented 1 year ago

I pushed 62a2b07ba3c838aabfc542e5da9aaafa22012808 to master to enable what I proposed above.

You're right that we could do with more, you're already doing something with deb packages as artifacts, how would these appear against a release (and what defines a release, just a tag marked vX.X or is something else needed).

What would also be great, is "checks" made automatically when a push in a PR is made, e.g. does it build with autotools, does it build with cmake, does it build debian is a "check", and they show in the PR thread itself. Is this already handled by the current actions?

kyz commented 1 year ago

I noticed there is a github action that can install cygwin: https://github.com/marketplace/actions/install-cygwin

lclevy / ADFlib

CI / GitHub's "Actions" #36