higan-emu / libco

libco is a cooperative multithreading library written in C89.
Other
125 stars 25 forks source link

Add Github Actions CI #40

Closed bjosv closed 1 year ago

bjosv commented 1 year ago

Adds CI builds and test runs for all backends except ppc.c.

The test examples are run with valgrind where possible and the problems found are added as comments for further investigations. #39 might fix some, but at least the described warnings can be seen in the CI logs.

Some additional changes were needed to get the test examples working on x86 and Windows. Maybe there is a good reason to use fixed address in test_serialization that I don't know, but it gives problems for x86 and ASan. I'm open for suggestions.

Example run

bjosv commented 1 year ago

This PR also might close #34 since it gives hints how to build libco with MSVC.

Screwtapello commented 1 year ago

Thank you for putting in all this work! I agree that having automated testing for subtle code like this is a good idea. Also, I'm sorry for taking so long to get around to reviewing your changes.

...all backends except ppc.c.

Why that backend specifically? Is it not supported in QEMU or something?

...at least the described warnings can be seen in the CI logs.

The CI run you link to is a successful one, I'd be interested to see one that reports valgrind failures so I know what to look for.

Screwtapello commented 1 year ago

I'll ask around to see if there's any reason why Near would have used mmap() at a specific address, or why he might have used mmap() at all instead of plain malloc() or similar, which would simplify a lot of these changes I think.

Alcaro commented 1 year ago

co_serializable means copying the stacks around, including a bunch of pointers. Trying to move then resume a stack will work badly.

But in this case, no stack lives longer than that allocation in that process, so it looks completely unnecessary to me. Maybe a leftover from testing or something.

bjosv commented 1 year ago

I have now updated the PR with your proposal to handle unexpected successes, good idea! I have also added Windows x86 which I missed, and changed test_serialization to use malloc() instead. Using malloc enables it to be built on windows as well.

...all backends except ppc.c.

Why that backend specifically? Is it not supported in QEMU or something?

It's not supported out-of-the-box by run-on-arch-action which I use to run QEMU on Github Actions, but it seems that QEMU should be able to run big-endian powerpc's. Debian 9 dropped support for it, but it might be possible to run with older releases somehow.. I can do some attempts but I guess it not that straight on.

The CI run you link to is a successful one, I'd be interested to see one that reports valgrind failures so I know what to look for.

The "problem" to visualize what #39 fixes is that the existing tests only indicates warnings in the log and Valgrind is not including this finding in the errorcode / found errors. We might be able to change the tests a bit so that invalid reads or similar also are found due to the stack switching.

Here is an log for a run where #39 is applied to compare to a log without #39

Like:

==3421== Command: ./test_args
==3421== 
==3421== Warning: client switching stacks?  SP change: 0x1ffefff7e8 --> 0x4de8c90
==3421==          to suppress, use: --max-stackframe=[13](https://github.com/bjosv/libco/actions/runs/4873194062/jobs/8692413899#step:6:14)7340480344 or greater
==3421== Warning: client switching stacks?  SP change: 0x4de8c78 --> 0x1ffefff7e8
==3421==          to suppress, use: --max-stackframe=137340480368 or greater
==3421== Warning: client switching stacks?  SP change: 0x1ffefff7e8 --> 0x4df8cd0
==3421==          to suppress, use: --max-stackframe=1373404[14](https://github.com/bjosv/libco/actions/runs/4873194062/jobs/8692413899#step:6:15)744 or greater
==3421==          further instances of this message will not be shown.
cothread parameterized function example
zuiderkwast commented 1 year ago

To detect the "Warning: client switching stacks?" in CI (for which valgrind just gives a success exit code 0) we'd need to run something like valgrind ./test_args | grep -qvF Warning, which returns a non-zero exit code if Warning is present in the output.

I haven't found any way to make valgrind treat this warning as an error.

Screwtapello commented 1 year ago

I have also added Windows x86 which I missed, and changed test_serialization to use malloc() instead.

Oh, I was going to do that, to save you the effort of messing with code you hadn't written. Thanks for stepping up!

Debian 9 dropped support for it, but it might be possible to run with older releases somehow.. I can do some attempts but I guess it not that straight on.

Debian has an extensive suite of cross-compilers you can install, and QEMU can run static binaries... it'd be nice to have a script to run all these multi-platform tests locally instead of having to push them to CI, but that doesn't have to block this PR merging.

The "problem" to visualize what #39 fixes is that the existing tests only indicates warnings in the log and Valgrind is not including this finding in the errorcode / found errors.

That's frustrating, but it doesn't seem like Valgrind is very amenable to automated test reporting like this.

Perhaps we should have a valgrind-wrapper.sh next to build.sh that does these checks for us, so we don't make the CI scripts even more gnarly. I'll see what I can come up with.

Screwtapello commented 1 year ago

OK, here's a shell-script which you can save as valgrind-wrapper.sh which I think will do the right thing:

#!/bin/sh
FAILURE=99
SUCCESS=0

expect_failure=0
if [ "$1" = "--expect-failure" ]; then
    expect_failure=1
    shift
fi

valgrind --error-exitcode="$FAILURE" --log-file="valgrind.log" "$@"
exitcode=$?
# Report the log messages just in case there was anything interesting
cat "valgrind.log" >&2

if [ "$exitcode" -eq "$SUCCESS" ]; then
    # Valgrind didn't reject our code, but if it warned
    # about stack-switching, we care about that too.
    # None of our tests should use enough stack
    # to trigger this by accident.
    if grep -qlF "Warning: client switching stacks?" "valgrind.log"; then
        echo "Test confused Valgrind by switching stacks" >&2
        exitcode=$FAILURE
    fi
fi

if [ "$expect_failure" -eq 1 ]; then
    if [ "$exitcode" -eq "$SUCCESS" ]; then
        echo "Test passed, but was expected to fail?" >&2
        exitcode=$FAILURE
    else
        echo "Task failed successfully" >&2
        exitcode=$SUCCESS
    fi
fi

exit "$exitcode"

If you run ./valgrind-wrapper.sh ./test_args it prints the Valgrind output then:

Test confused Valgrind by switching stacks

...and exits with code 99. If you run ./valgrind-wrapper.sh --expect-failure ./test_args it prints the Valgrind output then:

Test confused Valgrind by switching stacks
Task failed successfully

...and exits with code 0. That should simplify the CI scripts a bit.

bjosv commented 1 year ago

Looks nice, I will add it.

zuiderkwast commented 1 year ago

"Task failed successfully" sounds funny. How about "Task failed as expected" instead?

bjosv commented 1 year ago

Added the wrapper-script which cleaned up a bit. Now #39 will need to flip some --expect-failure flags which is a helpful indication.

A link to the CI run.

Screwtapello commented 1 year ago

Thanks for all your work on this!