kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

CI: Proposal: Use Azure Pipelines #621

Open cugu opened 4 years ago

cugu commented 4 years ago

I started to implement an Azure Pipeline for the whole Kaitai CI: https://github.com/cugu/kaitai_struct/tree/azure

The complete pipeline (Build Compiler, Translator Test, Build JS, Build Packages, Target Language Tests; see https://dev.azure.com/cugu/dfir/_build/results?buildId=106&view=results) takes about 5:25 minutes (with caching).

Currently only the Python and Ruby tests are working, but implementing more languages should not increase the total runtime much, as they are tested in parallel.
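For reference, the staged layout described above could be sketched in Azure Pipelines YAML roughly like this (stage, job, and script names are illustrative guesses, not taken from the actual branch):

```yaml
# Hypothetical sketch of the pipeline layout; all names are illustrative.
stages:
  - stage: BuildCompiler
    jobs:
      - job: Compiler
        pool:
          vmImage: 'ubuntu-18.04'
        steps:
          # Placeholder build step; the real pipeline builds
          # kaitai-struct-compiler from the compiler repo.
          - script: ./build-compiler.sh
            displayName: Build compiler

  - stage: TargetLanguageTests
    dependsOn: BuildCompiler
    jobs:
      # One job per target language; jobs in a stage run in parallel,
      # so adding languages widens the stage rather than lengthening it.
      - job: Python
        steps:
          - script: ./run-tests.sh python
      - job: Ruby
        steps:
          - script: ./run-tests.sh ruby
```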

I wanted to get some feedback on this before I continue.

Proposed actions:

GreyCat commented 4 years ago

Sorry for coming so late. Can I ask you to re-run this pipeline, so I can check out the results?

As of now, it says "Build not found".

cugu commented 4 years ago

Here you go https://dev.azure.com/cugu/dfir/_build/results?buildId=229&view=results

Another option might be using GitHub Actions.

okuryu commented 4 years ago

I definitely recommend using GitHub Actions instead.

cugu commented 4 years ago

@okuryu is there any reasoning behind this?

A reason for Azure DevOps would be the large number of preinstalled SDKs (https://github.com/microsoft/azure-pipelines-image-generation/blob/master/images/linux/Ubuntu1804-README.md), while those would first need to be installed or activated in GitHub Actions. This could make GitHub Actions pipelines significantly slower.
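For comparison, on GitHub Actions a per-language toolchain is usually selected via setup actions rather than assumed to be preinstalled; a minimal sketch (the Ruby version and test script are illustrative):

```yaml
jobs:
  ruby-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Activates a Ruby from the runner's tool cache; typically fast,
      # since it selects a preinstalled version rather than downloading one.
      - uses: actions/setup-ruby@v1
        with:
          ruby-version: '2.6'
      - run: ./run-tests.sh ruby  # hypothetical test entry point
```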

okuryu commented 4 years ago

You can use similar (perhaps almost the same?) tools in the GitHub Actions pipeline. I have never found the GitHub Actions build containers to be slow. https://help.github.com/en/github/automating-your-workflow-with-github-actions/software-in-virtual-environments-for-github-actions

GreyCat commented 4 years ago

Thanks for this update!

There are some things which I agree with and some things which I strongly disagree with in this direction.

cugu commented 4 years ago

Thanks for the feedback @GreyCat . I totally agree, and the pipeline above was not meant to be a final solution but a foundation for discussion.

  1. I agree that we should not have a single pipeline for everything. Even so, when tests are fast enough, there is not much point in testing a bit more than required. The whole Kaitai Struct project is quite complex, and we should not introduce further complexity for a few seconds less CI runtime. The question here is how many pipelines we need:

    i. A CI pipeline to test and build the compiler and generate test code for all languages. This pipeline should also trigger all pipelines from ii.
    ii. A CI pipeline for every language to test the runtime and the generated test code.
  2. The runtime repositories currently do not test against the generated ci_targets. Would it be an option to split the kaitai_struct_tests repo across the compiler and the runtime repos?

  3. I'm okay with not storing artifacts in Azure, though I do not really like storing artifacts (and websites) in git repositories either. Especially the mixture of generated code and scripts, as in https://github.com/kaitai-io/ci_targets, feels weird.
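If the split from point i/ii were done on GitHub Actions, the compiler pipeline could fan out to the per-language pipelines via the repository_dispatch API; a rough sketch (repo name, event type, and token variable are illustrative):

```yaml
# In each language runtime repo: a workflow triggered remotely
# by the compiler pipeline.
on:
  repository_dispatch:
    types: [compiler-updated]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./run-tests.sh  # hypothetical test entry point

# The compiler pipeline would send the event after a successful build:
#   curl -X POST \
#     -H "Authorization: token $REPO_TOKEN" \
#     https://api.github.com/repos/kaitai-io/kaitai_struct_ruby_runtime/dispatches \
#     -d '{"event_type": "compiler-updated"}'
```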

GreyCat commented 4 years ago

Even so, when tests are fast enough

I actually doubt that "fast enough" would be scalable in the first place. Travis ci_targets currently runs ~20 jobs, launching ~4 in parallel. Azure DevOps claims that it runs 10 in parallel. So, given that we're growing in that direction, and every major language like C++ or C# runs in 3 to 10 environments, it is still a pain point.

there is not much point in testing a bit more than required.

Definitely agree — given that the majority of changes actually only affect behavior for one particular language/target — it's a waste to rebuild and retest everything from scratch every time.

i. A CI pipeline to test and build the compiler and generate test code for all languages. This pipeline should also trigger all pipelines from ii.
ii. A CI pipeline for every language to test the runtime and the generated test code.

Technically, this is (almost) how it is now. Ideally, the only difference I'd like to have is to have a step in between, to be able to trigger a pipeline on change in tests repo, while compiler remains the same.

  1. The runtime repositories currently do not test against the generated ci_targets. Would it be an option to split the kaitai_struct_tests repo across the compiler and the runtime repos?

Sorry, I don't quite follow. Fresh runtime API repos are currently brought into the picture on every test run in ci_targets. What exactly would you want to split with kaitai_struct_tests?

  1. I'm okay with not storing artifacts in Azure, though I do not really like storing artifacts (and websites) in git repositories either.

Storing code generated by the compiler in ci_targets actually makes a lot of sense to me. It is code, after all, and repos are to store code in the first place. It makes it very natural to do lots of operations, like tracing where exactly a certain change in code generation happened and linking that to changes in the compiler. Most other artifact storages are just dumb blob stores — i.e. you either don't have the access to previously built artifacts at all (after the retention has expired them), or you need to download them all separately and diff them manually.

Storing test run results in ci_artifacts is:

Unfortunately, I haven't yet seen a good solution for storing test results in a hosted manner for free. And as soon as "non-free" factor kicks in, it normally also means that we have to do some kind of authentication.

The way that ci_artifacts are architected today directly influences https://ci.kaitai.io/, so, at the very least I'd like to retain something similar if we're changing it.

Especially the mixture of generated code and scripts, as in https://github.com/kaitai-io/ci_targets, feels weird.

I don't like this particular aspect either. One of the easy ways out would probably be moving these prepare-* scripts to tests, but the core problem is that most current CIs are incredibly repo-centric and require you to have something checked into the repo (like .travis.yml, or .pipelines/*, or .circleci/*) — and that thing would unfortunately also have some logic in it :( So, I don't see any good way around it...
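One way to minimize the checked-in logic, as a sketch (assuming a hypothetical entry-point script that would live in the tests repo):

```yaml
# The per-repo workflow stays a thin shim; all real logic lives in the
# tests repo, so it can change without touching every language repo.
on: [push]

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/checkout@v2
        with:
          repository: kaitai-io/kaitai_struct_tests
          path: tests
      - run: tests/run-ci.sh  # hypothetical entry point
```

The checked-in file cannot be eliminated entirely, but reducing it to checkout-and-delegate keeps the actual CI logic in one place.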

cugu commented 4 years ago

Travis ci_targets currently runs ~20 jobs, launching ~4 in parallel. Azure DevOps claims that it runs 10 in parallel. So, given that we're growing in that direction, and every major language like C++ or C# runs in 3 to 10 environments, it is still a pain point.

Is the limit of 10 parallel runs per repository? I can't find the info for Azure. For GitHub Actions it is: https://help.github.com/en/github/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#usage-limits

If we had the tests for, e.g., Ruby in kaitai_struct_ruby_runtime instead of kaitai_struct_tests/spec/ruby, we would have some benefits: