kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

Better CI: modular build system #62

Closed GreyCat closed 5 years ago

GreyCat commented 7 years ago

I have yet another huge, but pretty helpful proposal. The proposal is to get ourselves a better CI.

Problems with current CI

There is one main huge problem: it's monolithic. The whole run goes like this:

  1. Check out everything
  2. Build compiler
  3. Build test .ksy files => target languages code
  4. For every target language:
     4.1. (If it's a compilable language) Build:
          4.1.1. Compiled target language code from tests
          4.1.2. KS runtime
          4.1.3. Actual test specs (i.e. stuff with "assert equals") + test runner (sometimes)
     4.2. Run tests (doing assertions), generate some sort of report
  5. Aggregate all the reports, generate CI report page
  6. Upload and update CI report page on our website

This leads us to:

etc, etc. So, bottom line: monolithic = bad, modular = good.

KOLANICH commented 7 years ago

#63

GreyCat commented 7 years ago

@KOLANICH Sorry, I don't understand you. Right now I've just described current state of things, there are no "build packages" right now. Besides, GitHub "Releases" stuff is not based upon uploads anyway — they're generated automatically from repo tags and are source-only.

LogicAndTrick commented 7 years ago

FYI you can attach binaries to a release by editing a tag in the releases page.

KOLANICH commented 7 years ago

@KOLANICH Sorry, I don't understand you. Right now I've just described current state of things, there are no "build packages" right now.

I was a bit wrong. I have created another issue, #63, but it is closely related to this one, because you can build the modules separately (with every module having its own Travis build script), and then fetch the results from the Releases pages and reuse them. There are some problems with module dependencies, but they can be solved by putting the dependency description into a separate repo:

  1. The Travis build script fetches the dependencies repo.
  2. The Travis build script builds and tests its targets.
  3. If there were no errors in the previous step, it builds the packages and uploads them.
  4. It makes a dummy push into every repo that depends on the built repo, so that they are rebuilt and retested with Travis.

FYI you can attach binaries to a release by editing a tag in the releases page.

I propose to do it automatically on every successful Travis build.

GreyCat commented 7 years ago

because you can build the modules separately (with every module having own travis build script), and then fetch the results from Releases pages and reuse them.

Sorry, I don't quite understand most of what you're mentioning in this paragraph. What are the "modules" and "dependencies" you're talking about? Why is that a problem in the first place?

FYI you can attach binaries to a release by editing a tag in the releases page. I propose to do it automatically on every successful Travis build.

This is pretty much pointless. Travis does mostly unstable builds and releases are for stable (tagged) builds. While it is possible to attach only "tagged" build files to releases, it is probably pointless anyway, as lots of release artifacts (.deb repo files, Ruby .gem, Python packages, etc), must be published in designated places, and we already do all that.

KOLANICH commented 7 years ago

Travis does mostly unstable builds and releases are for stable

GH Releases are for whatever the repo owner wants.

Why is that a problem in the first place?

The problem is the one stated in the first post of this issue. The solution is to separate the ks compiler from the ks runtimes, put them into separate repos, and build and test them separately.

What are the "modules" and "dependencies" you're talking about?

So a module is a separate git repo with a standalone part of ks, and dependencies describe what depends on what. A runtime library depends on the compiler: if the compiler changes its interface, the runtime library also needs to be changed. So every compiler change requires running the tests for every runtime library using the updated version of the compiler. We don't want to store this data in the compiler repo, so we should create a separate repo for the dependency description. When a runtime is updated, you only need to recheck that runtime. In this case you can take a prebuilt and tested compiler binary and use the runtime with it without retesting the compiler.
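The dependency idea above can be sketched as a small lookup. This is only an illustration, assuming a hypothetical `DEPENDS_ON` map; module names such as `compiler` and `runtime_ruby` are made up for the example and are not actual repository names. It computes which modules would need retesting after a change.

```python
from collections import deque

# Hypothetical dependency description: module -> modules it depends on.
# Names are illustrative only, not actual kaitai-io repository names.
DEPENDS_ON = {
    "compiler": [],
    "runtime_ruby": ["compiler"],
    "runtime_python": ["compiler"],
    "runtime_java": ["compiler"],
}


def modules_to_retest(changed, depends_on=DEPENDS_ON):
    """Return the changed module plus everything that depends on it,
    directly or transitively, in sorted order."""
    # Invert the map: module -> modules that depend on it.
    dependents = {}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk over the inverted map.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)
```

Under these assumptions, a change to `compiler` selects the compiler and every runtime, while a change to a single runtime selects only that runtime.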

ghost commented 7 years ago

I have found an interesting example: the .travis.yml for the ANTLR project. Like Kaitai Struct, the ANTLR project has runtime libraries for different languages (C#, C++, Go, Java, JavaScript, Python 2 and 3, Swift). The ANTLR tool generates parsers and lexers, and then the generated parsers and lexers use the runtime libraries (the same principle as in Kaitai Struct). The .travis.yml calls scripts from the .travis directory. Maybe it will help somehow.

GreyCat commented 7 years ago

The time has (kind of) come: given that we'll need a pretty sophisticated system to test writes (for #27), I've decided to take a few first steps.

Initially I had this idea of the workflow:

[Image: CI flow graph]

I've started from running the actual tests:

To add new languages, the following is needed:

The output is not saved anywhere so far. The next step is, obviously, publishing test artifacts, i.e. JUnit XML-style or whatever reports they can provide.

Even this PoC Travis run uncovered a few problems with our current build:

GreyCat commented 7 years ago

Tried to get Appveyor to build C++ using MSVC compiler: https://ci.appveyor.com/project/GreyCat/ci-targets

Wow, how naive I am. Right now it fails to run due to Boost (and Boost.Test) + zlib being unavailable. Is there a simple way to install boost / zlib on Windows?

@LogicAndTrick Probably running C# on several .NET platforms on Windows would be possible now. Wanna take a look? I can add you to the AppVeyor account.

LogicAndTrick commented 7 years ago

Sure, I'll see if I can get something running when I have time.

GreyCat commented 7 years ago

@LogicAndTrick I've tried to add you by e-mail. Hopefully you'll receive some invitation or something?..

LogicAndTrick commented 7 years ago

Looks like a few versions of Boost are installed in the AppVeyor image: https://www.appveyor.com/docs/build-environment/#boost You might need to set up an environment variable to point to one of those paths. I don't know about zlib though.

koczkatamas commented 7 years ago

Judging from appveyor config files on the internet ( eg. https://github.com/libgd/libgd/blob/master/appveyor.yml ) we may have to install zlib manually.

(It's a bit weird though: a lot of projects are using zlib and it does not change that much, so you don't have to keep N versions. Maybe it's worth asking the AppVeyor guys to put it into the base image?)

GreyCat commented 7 years ago

I guess zlib is not that big of an issue (it's very small anyway), and, besides, we might want to test it on Windows with zlib disabled.

LogicAndTrick commented 7 years ago

Alright I've been experimenting with this and my scripts are not very good but they kind of work. Is this enough to start you off or do you need more info? I'm not really confident with this stuff so I'm probably doing some things wrong:

GreyCat commented 7 years ago

@LogicAndTrick Thanks for all that investigation, it will certainly help!

appveyor.yml file - This seems like a better way to manage the AV script, similar to Travis

run-csharp-dotnet-framework - uses Microsoft's msbuild and csc tools
run-csharp-mono - uses Mono's xbuild and mcs tools

Cool :) The only thing probably worth moving to prepare-* scripts is nuget restore ... stuff, as it is technically an initialization, not test run.

My idea is that run-* scripts should be perfectly usable on normal developers' boxes, not only on CI servers. If it needs any per-installation configuration, we can always do it in something like a config file. "Normal" (i.e. usable by a developer) installation will use one config and CI run will just use another one (for example, to reference specific paths in AV images).

C++ will run too (but doesn't work right now because of the missing cd tests in the install script)

I actually doubt that. Your version doesn't differ much from what I've launched, and it fails, being unable to find Boost and Boost.Test in CMake setup.

I assume you will want to add something to publish these results (AppVeyor artifact maybe?)

Yeah, it's the common next step for all CIs (both Travis and AppVeyor). I was thinking of two obvious choices:

Then yet another Travis job should trigger, pick up these artifacts and aggregate them to update the CI page. Both these choices are actually pretty messy :( Registering yet another dozen repositories just for the sake of storing test results feels a lot like abuse of GitHub (and it's tons of work too). BinTray uses an extremely complex API, both to publish and retrieve files, which is a major turnoff for me.

Any other ideas?

LogicAndTrick commented 7 years ago

Could you use one repo with a branch for each target? A little messy, but it means you don't have to have a separate repo for each language. As for the reference paths, maybe some environment variables? e.g. MONO_INSTALL_DIRECTORY or something?

GreyCat commented 7 years ago

I've tried to do Bintray upload, and, after some experimenting, I'm tempted to say that it's mostly useless for these purposes: https://travis-ci.org/kaitai-io/ci_targets/jobs/220439314

Could you use one repo with a branch for each target?

Yeah, I think that should work! I'll try it next.

As for the reference paths, maybe some environment variables?

Yeah, exactly :) Basically, that's what these config files are doing.

LogicAndTrick commented 7 years ago

I guess in this case these are variables that could change depending on the user's setup. Is that still okay to put in the config file, and expect the user to modify it if they need to? Right now the config variables are well-known (relative) paths, so they don't ever need to be changed.

I was thinking something like this (pseudocode):

# User modifies these if they want
MONO_INSTALL_LOCATION="/c/Program Files (x86)/Mono/bin"
MSBUILD_INSTALL_LOCATION="/c/Program Files (x86)/MSBuild/14.0/Bin"

# Quoted because the paths contain spaces and parentheses
export PATH="$PATH:$MONO_INSTALL_LOCATION:$MSBUILD_INSTALL_LOCATION"
# Scripts reference xbuild/msbuild/etc from the path
# If they are on the path already they'll "just work" even if the install locations are different from above

It feels a bit flimsy, but is there another way to do it? I don't think AppVeyor environment variables (Windows) will flow through to the MSYS environment so I'm not sure if there's a way to do it via the AV config.
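For illustration, the same "tools on the path win, configured locations are a fallback" lookup can be sketched in Python. This is a minimal sketch, not what the scripts actually do: `EXTRA_TOOL_DIRS`, `build_search_path` and `find_tool` are hypothetical names, and the directories are simply the ones from the pseudocode above. The only library call relied on is the standard `shutil.which`.

```python
import os
import shutil

# Hypothetical default install locations, taken from the pseudocode above;
# a user would edit these to match their own setup.
EXTRA_TOOL_DIRS = [
    "/c/Program Files (x86)/Mono/bin",
    "/c/Program Files (x86)/MSBuild/14.0/Bin",
]


def build_search_path(current_path, extra_dirs=EXTRA_TOOL_DIRS):
    """Append the extra directories to a PATH-style string,
    skipping ones that are already present."""
    parts = [p for p in current_path.split(os.pathsep) if p]
    for d in extra_dirs:
        if d not in parts:
            parts.append(d)
    return os.pathsep.join(parts)


def find_tool(name, current_path=None):
    """Return the full path of `name`, or None if it cannot be found.
    Directories already on PATH are searched before the configured defaults."""
    if current_path is None:
        current_path = os.environ.get("PATH", "")
    return shutil.which(name, path=build_search_path(current_path))
```

A run script could then call something like `find_tool("xbuild")` and fail with a clear message when it returns `None`, instead of relying on a hard-coded location.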

GreyCat commented 6 years ago

I've got to do another approach on this issue, and I found out that, actually, there's a whole world of different CIs out there which support modular workflows/pipelines.

We have about a dozen or so repositories, and they all should be built, tested and deployed in a complex manner. This implies an interesting difference: it would be highly beneficial for us to have a CI configuration that is not stored in a repository along with the code (akin to a .travis.yml file), but is instead set up externally.

When orchestrating a complex flow/pipeline, there are a couple of key questions:

I'm currently checking out:

Self-hosted:

Things I've checked out that probably don't satisfy the criteria outlined above:

Ideally, I'd still like to stick to hosted infrastructure that someone else would support. But, if all else fails, I'm probably ok with hosting our own CI at some sort of generic server(s).

arekbulski commented 6 years ago

There is a drawback: you need to build the compiler and the example schemas on each CI server instead of once. So each build gets shorter, but they add up to more in total.

GreyCat commented 6 years ago

Um, you've commented on some sort of earlier plans?

GreyCat commented 5 years ago

Ok, returning back to this, this time trying to complete it.

What's done already

What's left to be done

Can anyone lend a hand with HTML+JS (Vue, jQuery, whatever you prefer) here?

GreyCat commented 5 years ago

Ok, a very rough new CI page, aggregating everything, is implemented at http://kaitai.io/ci/ci.html, and we already support quite a few new & old combinations. Please take a look and tell me what you think of it.

Obviously, missing stuff is:

app.__vue__.gridColumns = ["name", "cpp_stl_11/gcc4.8_linux", "cpp_stl_11/clang7.3_osx", "cpp_stl_11/clang3.5_linux", "ruby/2.3"]
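As a rough illustration of how such column names could be organized, the sketch below groups them by language. Reading each name as "language/environment" is an assumption based only on the names shown above, and `group_columns` is a hypothetical helper, not part of the CI page.

```python
from collections import defaultdict


def group_columns(columns):
    """Group "language/environment" column names by language.
    Columns without a slash (such as "name") are skipped."""
    grouped = defaultdict(list)
    for col in columns:
        language, sep, environment = col.partition("/")
        if sep:
            grouped[language].append(environment)
    return dict(grouped)


# Column names copied from the snippet above.
columns = [
    "name",
    "cpp_stl_11/gcc4.8_linux",
    "cpp_stl_11/clang7.3_osx",
    "cpp_stl_11/clang3.5_linux",
    "ruby/2.3",
]
print(group_columns(columns))
```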

If anyone proficient in (or willing to learn) Vue wants to help, I'd be most grateful ;)

GreyCat commented 5 years ago

Checklist of language conversions to the new CI, for me to track:

GreyCat commented 5 years ago

csharp/mono5.18.0 and lua/5.3 were ported to the new CI system this morning. This also paves the way for more well-rounded testing of C# on other systems (i.e. on Windows, .NET Core, .NET Standard, regular .NET, etc).

Unfortunately, we'll be most likely dropping Construct support eventually, as the project itself seems to be abandoned :(

Need to double-check what's going on with go, and, phew, this looks like it's almost done.

GreyCat commented 5 years ago

Ok, go has successfully joined the company. While cosmetically it's still clumsy, I guess we can consider this task done.