pistacheio / pistache

A high-performance REST toolkit written in C++
https://pistacheio.github.io/pistache/
Apache License 2.0

ci: build on multiple Linux distributions #1070

Closed · Tachi107 closed this 2 years ago

Tachi107 commented 2 years ago

This new CI job builds and tests Pistache on different Linux distros, with different compilers, and under different sanitizers, and also re-adds Codecov coverage integration.

Currently, Pistache is tested on Debian Stable, Debian Testing and Red Hat Enterprise Linux 8, and the code is built with GCC and Clang. Sanitizers are only used on Debian, because as far as I'm aware RHEL doesn't ship the various sanitizers in its base images.
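
For illustration, a distro × compiler matrix along these lines could look roughly like the sketch below. This is not the actual workflow added by this PR; the job layout and the `ci/build-and-test.sh` helper are made up, with the Debian images from Docker Hub and Red Hat's UBI 8 image standing in for the RHEL build.

```yaml
# Rough sketch only, not the workflow from this PR.
name: linux

on: [push, pull_request]

jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        image:
          - debian:stable                          # Debian Stable (Docker Hub)
          - debian:testing                         # Debian Testing (Docker Hub)
          - registry.access.redhat.com/ubi8/ubi    # RHEL 8 UBI (Red Hat's registry)
        compiler: [gcc, clang]
    runs-on: ubuntu-latest
    container: ${{ matrix.image }}
    steps:
      - uses: actions/checkout@v3
      - name: Build and test
        # The helper script is hypothetical; the real job installs the
        # toolchain with apt or dnf depending on the distro, then builds
        # and runs the test suite.
        run: CC=${{ matrix.compiler }} ./ci/build-and-test.sh
```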

ThreadSanitizer is also disabled for now, as it reports a lot of data races in some tests. Some of the bugs are in the unit tests themselves, but I haven't yet looked into solving them. @dennisjenkins75 I seem to remember that you opened a meta-issue about race conditions in Pistache but I haven't found it.
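
(For reference, if we later want to re-enable ThreadSanitizer while the known races in the tests are still being fixed, TSan's standard suppressions mechanism could be used. A sketch below, assuming a Meson-based build; the suppression entry is a placeholder.)

```yaml
# Hypothetical step: run the tests under ThreadSanitizer with a suppressions
# file so that known, test-only races don't fail the job while being fixed.
- name: Run tests under ThreadSanitizer
  run: |
    cat > tsan.supp <<'EOF'
    # Placeholder entry: suppress a known data race in a test helper
    race:SomeTestHelperFunction
    EOF
    TSAN_OPTIONS="suppressions=$(pwd)/tsan.supp" meson test -C build
```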

~I'll add RHEL 9 as soon as Red Hat publishes the image on Docker Hub~ added using Red Hat's own registry

It would be nice to add something like Gentoo too; @dennisjenkins75 you use/used Gentoo, right? Would you be able to help?

~I've also fixed a bug in the streaming_test test that caused lock-ups on old distros (@kiplingw this means that PPA builds for Ubuntu 20.10 and older should be fixed as well)~ pushed as 8ce07256dc220cbf96b3fd6f5991a3c68fd54aba

Lastly, I ~removed~ reduced the scope of the autopkgtest job; you can find my rationale in the commit message.
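
Regarding the Codecov integration mentioned above, the re-added steps boil down to building with coverage instrumentation, producing a report, and uploading it. A hedged sketch, assuming a Meson build and gcovr; not necessarily the exact steps used here:

```yaml
# Hypothetical coverage steps, not necessarily the exact ones from this PR.
- name: Configure with coverage instrumentation
  run: meson setup build -Db_coverage=true
- name: Build and run the tests
  run: meson test -C build
- name: Generate a Cobertura XML report
  run: gcovr --xml --output coverage.xml --root .
- name: Upload to Codecov
  uses: codecov/codecov-action@v3
  with:
    files: coverage.xml
```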

codecov-commenter commented 2 years ago

Codecov Report

Merging #1070 (4003514) into master (cb71611) will increase coverage by 8.74%. The diff coverage is 66.66%.

:exclamation: Current head 4003514 differs from pull request most recent head b02f113. Consider uploading reports for the commit b02f113 to get more accurate results

@@            Coverage Diff             @@
##           master    #1070      +/-   ##
==========================================
+ Coverage   69.54%   78.29%   +8.74%     
==========================================
  Files          53       53              
  Lines        7943     6625    -1318     
==========================================
- Hits         5524     5187     -337     
+ Misses       2419     1438     -981     
| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| include/pistache/async.h | 90.12% <ø> (-1.30%) | :arrow_down: |
| include/pistache/base64.h | 100.00% <ø> (ø) | |
| include/pistache/client.h | 96.87% <ø> (-3.13%) | :arrow_down: |
| include/pistache/common.h | 100.00% <ø> (ø) | |
| include/pistache/cookie.h | 100.00% <ø> (ø) | |
| include/pistache/description.h | 66.03% <ø> (-18.82%) | :arrow_down: |
| include/pistache/endpoint.h | 95.00% <ø> (+0.26%) | :arrow_up: |
| include/pistache/errors.h | 0.00% <ø> (ø) | |
| include/pistache/flags.h | 92.30% <ø> (ø) | |
| include/pistache/http.h | 86.48% <ø> (-0.31%) | :arrow_down: |
| ... and 85 more | | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 127b310...b02f113.

kiplingw commented 2 years ago

Hey @Tachi107. Thanks for your work.

Regarding your notes, "Lastly, this autopkgtest CI is a bit redundant, as the Launchpad PPA does the same exact thing more reliably and on more architectures, but only after merging stuff to master." Launchpad actually doesn't perform Autopkgtest / DEP-8 testing. However, I did request this several years ago.

Tachi107 commented 2 years ago

Launchpad actually doesn't perform Autopkgtest / DEP-8 testing. However, I did request this several years ago.

Oh, ok, thanks for letting me know. In that case, what about running autopkgtest only on GitHub Actions' native architecture?
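
For the sake of discussion, running autopkgtest natively on the amd64 runner could look roughly like this. It's only a sketch using autopkgtest's null virtualization backend; the exact flags may differ, and this is not an existing job:

```yaml
# Hypothetical sketch: run the DEP-8 tests only on the runner's native amd64.
- name: Install autopkgtest
  run: sudo apt-get update && sudo apt-get install -y autopkgtest
- name: Run autopkgtest against the packaging in this tree
  run: |
    # The null backend runs the tests directly on the host instead of
    # booting a (possibly emulated) QEMU VM per architecture.
    sudo autopkgtest . -- null
```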

kiplingw commented 2 years ago

We could do that, but then we lose the benefits of testing on other architectures which, historically, have been quite fruitful for us in finding weird bugs with memory alignment, race conditions, etc. Even if the tests take a few hours to complete, I don't have a problem with that. It's not ideal, but we also don't have substantive PRs every day.

Another thing you could try is checking whether QEMU bails due to a timeout or some other reason that has nothing to do with us, and trying again after a period of time.

Tachi107 commented 2 years ago

We could do that, but then we lose the benefits of testing on other architectures which, historically, have been quite fruitful for us in finding weird bugs with memory alignment, race conditions, etc.

We wouldn't, as Launchpad runs normal tests on multiple architectures. Running autopkgtest on multiple archs specifically doesn't help that much; the only extra thing that gets tested is compilation.

I also believe that running tests under QEMU doesn't provide the same advantages as running them on a real CPU, because emulators often do not have all the quirks of physical processors.

Even if the tests take a few hours to complete, I don't have a problem with that. It's not ideal, but we also don't have substantive PRs every day.

It wouldn't be an issue if you only had to run the tests once, but what if you are debugging an issue that you can't reproduce on your local machine but only in CI (like the Ubuntu 21.10 / RHEL 8 one)? You'd have to wait hours to see if your changes fixed the issue, for every single change/commit. In fact, I've encountered this "annoyance" when fixing the aforementioned issue, and had to temporarily disable the autopkgtest job.

Another thing you could try is checking whether QEMU bails due to a timeout or some other reason that has nothing to do with us, and trying again after a period of time.

The issue is not related to timeouts; it's just that autopkgtest-virt-qemu seems to have serious bugs. How reliable can tests run on a buggy system be?

Overall the autopkgtest job in its current form does more harm than good...

kiplingw commented 2 years ago

We wouldn't, as Launchpad runs normal tests on multiple architectures. Running autopkgtest on multiple archs specifically doesn't help that much; the only extra thing that gets tested is compilation.

Not just compilation, but correct package dependency calculation, packaging, installation, and testing of the installed artifacts. This means verifying that headers were installed correctly and that the code links and executes. There's a lot going on there.
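
To make that concrete, a DEP-8-style run is roughly: build the binary packages, install them the way a user would, and then compile and run something against the installed headers and library. The sketch below is purely illustrative; the package names, paths, and the tiny test program are placeholders, and it assumes the Debian packaging is present in the tree.

```yaml
# Hypothetical clean-room check; package names and the test program are made up.
- name: Build the Debian binary packages
  run: dpkg-buildpackage -us -uc -b
- name: Install the freshly built packages like a user would
  run: sudo apt-get install -y ../libpistache*.deb
- name: Compile and run a tiny program against the installed artifacts
  run: |
    cat > use_installed.cpp <<'EOF'
    // Headers must resolve from /usr/include, not from the source tree.
    #include <pistache/net.h>
    int main() {
        Pistache::Address addr("127.0.0.1", Pistache::Port(9080));
        return 0;
    }
    EOF
    g++ -std=c++17 use_installed.cpp -o use_installed -lpistache -pthread
    ./use_installed
```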

I also believe that running tests under QEMU doesn't provide the same advantages as running them on a real CPU, because emulators often do not have all the quirks of physical processors.

That is true, but in my experience it's better than having neither an emulator nor the actual physical hardware. In the case of POWER9 systems, they are very expensive. For RISC-V, the test boards from SiFive are no longer available.

It wouldn't be an issue if you only had to run the tests once, but what if you are debugging an issue that you can't reproduce on your local machine but only in CI (like the Ubuntu 21.10 / RHEL 8 one)? You'd have to wait hours to see if your changes fixed the issue, for every single change/commit. In fact, I've encountered this "annoyance" when fixing the aforementioned issue, and had to temporarily disable the autopkgtest job.

That's true. But imagine if the test failed, and for good reason. By removing the test, you don't actually solve the problem. You just hide it.

I agree though that the slow performance is an issue. But is there a way to upgrade the shared hosting environment it runs on?

The issue is not related to timeouts; it's just that autopkgtest-virt-qemu seems to have serious bugs. How reliable can tests run on a buggy system be?

This might be something for @pabs3 to chime in on.

Overall the autopkgtest job in its current form does more harm than good...

The biggest problem with it is that it's slow. I think the solution is to find out how we can make it less so.

Tachi107 commented 2 years ago

Not just compilation, but correct package dependency calculation, packaging, installation, and testing of the installed artifacts. This means verifying that headers were installed correctly and that the code links and executes. There's a lot going on there.

All these steps will still be executed, but only on amd64.

I'm not saying that running autopkgtest on multiple architectures is useless, but considering that what it essentially adds over regular unit tests is linking against the installed library, the advantages it provides are far smaller than its costs, especially when you consider that we are talking about emulated architectures.

That is true, but in my experience it's better than having neither an emulator nor the actual physical hardware.

Of course :)

By removing the test, you don't actually solve the problem. You just hide it.

Well, I do solve one problem: I make other CI jobs less painful.

I agree though that the slow performance is an issue. But is there a way to upgrade the shared hosting environment it runs on?

As far as I know that's not possible with GitHub Actions.

The biggest problem with it is that it's slow. I think the solution is to find out how we can make it less so.

Yes... I would really, really like to be able to test Pistache on multiple architectures here (i.e. without depending on Launchpad), but GitHub Actions isn't the right tool for the job, and there's not much I can do about that. Travis was amazing, but unfortunately it is no longer free for open source projects, and as far as I know there aren't any free CI/CD solutions that offer native multi-arch support.

Tachi107 commented 2 years ago

Also, I believe that Debian runs autopkgtest on multiple architectures, so once I manage to package Pistache we'll be able to rely on that.

kiplingw commented 2 years ago

All these steps will still be executed, but only on amd64.

* dependency calculation: this is handled by apt, not us, and dependencies do not change between architectures in our case

I was referring to both build and runtime dependencies. The only way to know that runtime dependencies were calculated properly is to actually install the package in a clean room and test the installed artifacts.

* testing the installed artifacts: pistache gets installed the same way on all supported archs (except of course for the `usr/lib/triplet` directory), and what autopkgtest tests is that the package was installed correctly. Code linkage and execution are already tested on multiple archs in Launchpad (unit tests link to the compiled Pistache library, and that same library object gets installed in `usr/lib`)

Not the installed package. The package is not installed and tested on the PPA builders.

I'm not saying that running autopkgtest on multiple architectures is useless, but considering that what it essentially adds over regular unit tests is linking against the installed library, the advantages it provides are far smaller than its costs, especially when you consider that we are talking about emulated architectures.

There is a reason why the Debian and Ubuntu projects both perform DEP-8 tests on their build servers (not including Canonical's Launchpad). It's because the in-tree unit tests are just that, in-tree unit tests, and not tests of the artifacts installed by the system package manager.

As far as I know that's not possible with GitHub Actions.

It might be worth asking GitHub. They may have a paid offering to improve it. If you find out, let me know.

Yes... I would really, really like to be able to test Pistache on multiple architectures here (i.e. without depending on Launchpad), but GitHub Actions isn't the right tool for the job, and there's not much I can do about that. Travis was amazing, but unfortunately it is no longer free for open source projects, and as far as I know there aren't any free CI/CD solutions that offer native multi-arch support.

How much would it cost to restore Travis CI to accomplish the above?

Tachi107 commented 2 years ago

How much would it cost to restore Travis CI to accomplish the above?

It's not cheap. $69 per month

kiplingw commented 2 years ago

It's not cheap. $69 per month

Yes, that's a rip-off.

Tachi107 commented 2 years ago

Hey @kiplingw, I've been thinking about this a bit more, and while I haven't found a solution I think that we could still improve the current situation.

Problems with the current approach:

* Architectures are emulated, meaning that

  * they take too long to complete, delaying other jobs

  * effectiveness is limited, as we can't test the quirks of true CPUs and might encounter strange quirks of the emulator instead

* Builds are effectively run only on amd64, i386, and sometimes arm64. armhf and ppc64el containers _always_ fail to initialize, so no actual testing is done (you can verify this by looking at the [logs](https://github.com/pistacheio/pistache/actions/runs/2372988044) of the autopkgtest jobs; you'll see a lot of warnings that say "**test (ppc64el)** - _autopkgtest failed, but it's not your fault_")

I've recently discovered that CircleCI offers arm runners, meaning that we could use it to run autopkgtest on arm64. This is not as great as what we could accomplish with Travis, which offered native amd64, arm64, s390x and ppc64el runners, but it is better than the current setup:

* Builds would run at native speed, as no emulation is involved

* We wouldn't have to deal with emulator quirks anymore

* arm64 would get tested consistently (emulated arm64 builds often fail like armhf and ppc64el ones)

What are your opinions on this?
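
Not a finished config, just a rough idea of what a native arm64 job on CircleCI could look like, using their machine executor and an Arm resource class; the image tag and the autopkgtest invocation are placeholders:

```yaml
# Hypothetical .circleci/config.yml fragment for a native arm64 job.
version: 2.1

jobs:
  autopkgtest-arm64:
    machine:
      image: ubuntu-2204:current   # placeholder machine image tag
    resource_class: arm.medium     # CircleCI's native arm64 runners
    steps:
      - checkout
      - run: sudo apt-get update && sudo apt-get install -y autopkgtest
      - run: sudo autopkgtest . -- null   # placeholder invocation

workflows:
  arm64:
    jobs:
      - autopkgtest-arm64
```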

kiplingw commented 2 years ago

Hey @kiplingw, I've been thinking about this a bit more, and while I haven't found a solution I think that we could still improve the current situation.

Problems with the current approach:

* Architectures are emulated, meaning that

  * they take too long to complete, delaying other jobs

It depends on how they are virtualized. But if fully virtualized, then yes, they will be slower than the host's native architecture.

  * effectiveness is limited, as we can't test the quirks of true CPUs and might encounter strange quirks of the emulator instead

It's hard to say how limited they are, but we can probably assume that the issues that might surface on another architecture are more likely to be captured on native hardware.

* Builds are effectively run only on amd64, i386, and sometimes arm64. armhf and ppc64el containers _always_ fail to initialize, so no actual testing is done (you can verify this by looking at the [logs](https://github.com/pistacheio/pistache/actions/runs/2372988044) of the autopkgtest jobs; you'll see a lot of warnings that say "**test (ppc64el)** - _autopkgtest failed, but it's not your fault_")

That is definitely a problem.

I've recently discovered that CircleCI offers arm runners, meaning that we could use it to run autopkgtest on arm64. This is not as great as what we could accomplish with Travis, which offered native amd64, arm64, s390x and ppc64el runners, but it is better than the current setup:

* Builds would run at native speed, as no emulation is involved

* We wouldn't have to deal with emulator quirks anymore

* arm64 would get tested consistently (emulated arm64 builds often fail like armhf and ppc64el ones)

What are your opinions on this?

I think that the best solution is to basically have what the Debian / Ubuntu projects already have for CI infrastructure, but available for third parties like us. Nobody has done that yet, so we have to find a compromise.

I will reach out to Travis CI and see if they can provide us with a discount, or even free credits, to use their platform.

In the meantime, let me leave you with a question. If you had access to any of AWS's offerings, or Oracle Cloud, and money was not an issue, would you be able to find a creative solution?

Tachi107 commented 2 years ago

It depends on how they are virtualized. But if fully virtualized, then yes, they will be slower than the host's native architecture.

Yes... Not only virtualized, but even emulated

I think that the best solution is to basically have what the Debian / Ubuntu projects already have for CI infrastructure, but available for third parties like us. Nobody has done that yet, so we have to find a compromise.

Exactly

I will reach out to Travis CI and see if they can provide us with a discount, or even free credits, to use their platform.

Maybe this is doable: https://blog.travis-ci.com/2021-06-16-ibm-aws

In the meantime, let me leave you with a question. If you had access to any of AWS's offerings, or Oracle Cloud, and money was not an issue, would you be able to find a creative solution?

Unfortunately I don't know how AWS and similar services work, I've never used them... But I guess that setting up webhooks and whatnot that trigger builds on every commit and integrate with PRs is not exactly trivial :/

Edit: this looks interesting: https://docs.travis-ci.com/user/billing-overview/#partner-queue-solution

Edit2: https://docs.travis-ci.com/user/billing-faq/#what-if-i-am-building-open-source

Edit3: sent an email to support@travis-ci

Edit4: I've now received a reply from the support team

kiplingw commented 2 years ago

Unfortunately I don't know how AWS and similar services work, I've never used them... But I guess that setting up webhooks and whatnot that trigger builds on every commit and integrate with PRs is not exactly trivial :/

It's just a virtual or bare metal machine you can shell into via SSH with sudo / root credentials (called "EC2"). It has its own IP address, disk, network interface, etc., as if it was right beside you. You can install anything you want on it. Try to imagine how you could solve all of the above permanently if you had a place to let your imagination run. ;)
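
One way that idea could translate into CI (just a sketch of the concept, nothing that has been set up here) is registering such a machine, say an arm64 EC2 instance, as a self-hosted GitHub Actions runner and targeting it by label:

```yaml
# Hypothetical job targeting a self-hosted runner registered on an arm64 machine.
jobs:
  test-arm64:
    runs-on: [self-hosted, Linux, ARM64]   # labels assigned when registering the runner
    steps:
      - uses: actions/checkout@v3
      - name: Build and test natively on arm64
        run: ./ci/build-and-test.sh        # hypothetical helper script
```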

Tachi107 commented 2 years ago

It's just a virtual or bare metal machine you can shell into via SSH with sudo / root credentials (called "EC2"). It has its own IP address, disk, network interface, etc., as if it was right beside you. You can install anything you want on it.

Oh, nice. I really like to manage small servers every once in a while, so this shouldn't be an issue for me.

Try to imagine how you could solve all of the above permanently if you had a place to let your imagination run. ;)

I'll look into this later :)

Anyway, one member of the Travis support team replied to me again, saying that they approved our request for free OSS credits. There's one last blocker though: we have to add the Travis-CI app to the Pistache organization:

Hello Andrea,

Thank you for your reply.

It seems your GitHub organization @pistacheio is not signed up on the Travis CI platform. Please configure the Travis CI GitHub app https://github.com/apps/travis-ci for your organization. Please reach back when configured and I will gladly add credits for your OSS projects.

Thanks

I'm not a member of the @pistacheio organization, so I can't add the app myself, but I've sent a request that should've been forwarded via email to the organization members; somebody should accept it.

kiplingw commented 2 years ago

Anyway, one member of the Travis support team replied to me again, saying that they approved our request for free OSS credits. There's one last blocker though: we have to add the Travis-CI app to the Pistache organization:

Oh perfect!

Hello Andrea, Thank you for your reply. It seems your GitHub organization @pistacheio is not signed up on the Travis CI platform. Please configure the Travis CI GitHub app https://github.com/apps/travis-ci for your organization. Please reach back when configured and I will gladly add credits for your OSS projects. Thanks

I'm not a member of the @pistacheio organization, so I can't add the app myself, but I've sent a request that should've been forwarded via email to the organization members; somebody should accept it.

I just added Travis CI to our GitHub apps. I tried to also add you to the organization, but I think only @oktal can do it.

Tachi107 commented 2 years ago

I just added Travis CI to our GitHub apps.

Uhm, if I go to the installation page GitHub still tells me that my request is pending... Strange. Maybe it just takes a while to update.

[screenshot: the Travis CI app installation request still shown as pending]

kiplingw commented 2 years ago

Uhm, if I go to the installation page GitHub still tells me that my request is pending... Strange. Maybe it just takes a while to update.

Yeah, it looks like it takes a while to set up. Let's give them more time.

Tachi107 commented 2 years ago

@kiplingw I got a response from Travis support, and it seems that the app wasn't configured correctly.

Hello Andrea,

Thank you for your reply.

Could you please confirm that we installed the app correctly?

I am afraid it seems it's not yet. Can you please ask any of the admin members of the Org to log in to app.travis-ci.com and head over to https://app.travis-ci.com/account/repositories? If the organization doesn't appear on the left under Organizations, clicking on "Review and add your authorized organizations" would configure the organization on the Travis CI platform.

kiplingw commented 2 years ago

I clicked the "Review and add your authorized organizations" button and was taken to this page:

[screenshot of the page, taken 2022-05-30]

Can you let support know that? They might also need my GitHub account ID, @kiplingw.

Tachi107 commented 2 years ago

It seems that you aren't authorised to add the app to the organization either...

Maybe @dennisjenkins75 is? I think you need write access to the organization - write access to the repo is not enough.

kiplingw commented 2 years ago

I've tried removing and re-adding Travis CI, but I see the same thing as above. It just says "Cancel Request" next to it.

Tachi107 commented 2 years ago

It just says "Cancel Request" next to it.

If it says that, it means that you are not authorized to add the app to the organization; otherwise GitHub would've let you install it immediately.

Example:

[screenshot: example of an installation dialog for an authorized user]

kiplingw commented 2 years ago

@Tachi107, I think the only option may be to contact GitHub support and ask them to set our admin bits, since none of us has heard from @oktal in an eternity. I have just sent a request now. I will keep you guys apprised.

dennisjenkins75 commented 2 years ago

Acknowledged. Thank you Kip.

Tachi107 commented 2 years ago

Hey @kiplingw, any news from the GitHub support team? Travis has temporarily closed our request as we haven't been able to install the app in more than two weeks, but I can always re-open it.

kiplingw commented 2 years ago

Hey @kiplingw, any news from the GitHub support team? Travis has temporarily closed our request as we haven't been able to install the app in more than two weeks, but I can always re-open it.

Still nothing. I've just sent GitHub support another follow up note and forwarded you and @dennisjenkins75 the thread.

kiplingw commented 2 years ago

@Tachi107, we should be good to go. I've added it to the organization since @oktal set my owner bit, and I in turn set yours and @dennisjenkins75's.

Tachi107 commented 2 years ago

@Tachi107, we should be good to go. I've added it to the organization since oktal set my owner bit, and I in turn set yours and dennisjenkins75's.

Thanks everyone! I've re-contacted the Travis support team, hope they'll reply soon.

Tachi107 commented 2 years ago

Done!

Hello Andrea,

Thanks so much for your reply! We have successfully converted your account to an OSS account.

Open source has always been and always will be at the core of what Travis CI stands for. After reviewing your open source qualifications, I am pleased to notify you that 25k credits were added to your account allowing you to run OSS builds. When your credits begin running low again, please reach back out to the Support team.

Credits consumption metric:

| OS | Credits per started build minute |
| --- | --- |
| Linux | 10 |
| FreeBSD (experimental) | 10 |
| Windows | 20 |
| macOS | 50 |

Please let us know if you have any questions!

Thanks and happy building!

Your Friends at Travis-CI

Note: OSS credits may be used solely with respect to open source projects that reside on a public repository. In no event shall OSS credits be used for any commercial project or with respect to any project that resides on a private repository. Travis CI shall determine the amount of all OSS Credit allotments.

Tachi107 commented 2 years ago

Thanks, now we only need to re-enable Travis builds. I'm doing it now.

kiplingw commented 2 years ago

Good work @Tachi107!

Tachi107 commented 2 years ago

The Travis job is now fully operational, and I've also made some minor improvements to the GitHub CI linux job :D

I've committed everything to the master branch directly as I had to get feedback from Travis on each commit.
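
For anyone curious, the multi-arch part of a Travis setup is expressed with the `arch` key. Below is a rough sketch, not the actual `.travis.yml` that was re-enabled, and the helper script is hypothetical (note that Travis spells the POWER architecture `ppc64le`):

```yaml
# Hypothetical .travis.yml fragment, not the actual file re-enabled here.
language: cpp
os: linux
dist: focal
arch:
  - amd64
  - arm64
  - ppc64le
  - s390x
script:
  - ./ci/build-and-test.sh   # hypothetical helper script
```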

kiplingw commented 2 years ago

Right on. We're on a roll now!