habitat-sh / core-plans

Core Habitat Plan definitions

(Tracking) Refresh base packages 2018-03 #1151

Closed fnichol closed 6 years ago

fnichol commented 6 years ago

The so-called "base plans" I'm referring to are the ones listed in the bin/build-base-plans.sh script. Any relevant issues and pull requests are getting placed into the Base Packages Refresh 2018.03 milestone.
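
For context, that script is essentially an ordered list of plan directories built one after another inside a Studio. A minimal sketch of the pattern (illustrative only -- the real script is longer and its ordering matters):

#!/bin/bash
set -euo pipefail

# Illustrative plan list; the real one covers the whole base set in build order.
base_plans=(linux-headers glibc zlib binutils gcc)

for plan in "${base_plans[@]}"; do
  # `build` is available inside a Habitat Studio; each entry is a plan directory.
  build "$plan"
done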

fnichol commented 6 years ago

Okay, finally through the list! Now I'm going to rebase against master and work through any changes that have been merged in the last 6 weeks or so. Then I'll do another full build run of these in a stage1 Studio to make sure we're all really good.

fnichol commented 6 years ago

Rebase is good; I've repushed and am going to update the Habitat versions and rebuild the set in a stage1 Studio.

fnichol commented 6 years ago

Rebuilding the base packages above in a stage1 Studio looked good, except for these Plans, which download from SourceForge (currently experiencing a large service outage):

In the process I found a missing dependency of core/bats on core/hab-plan-build which is now in build-base-plans.sh.

The timings for the set were broken up between the Plans that stalled (again, SourceForge-related), which I resolved by downloading the source on another system, putting it in the Studio, and continuing the program. Here are the raw timings:

Start -> pkg-config:
real    269m10.666s
user    524m8.489s
sys     45m11.665s

ncurses -> shadow
real    2m0.602s
user    1m35.235s
sys     0m20.543s

psmisc -> psmisc
real    0m5.838s
user    0m4.683s
sys     0m1.003s

procps-ng -> gdbm
real    16m31.664s
user    6m53.641s
sys     2m16.246s

expat -> findutils
real    19m55.995s
user    17m50.555s
sys     3m55.133s

xz -> util-linux
real    10m6.373s
user    9m45.170s
sys     5m58.384s

tcl -> tcl
real    1m50.839s
user    1m43.922s
sys     0m10.733s

expect -> wget
real    10m42.832s
user    3m47.270s
sys     0m36.734s

unzip -> libarchive-musl
real    8m45.865s
user    9m1.078s
sys     1m23.558s
(Note: why did libarchive-musl stop that loop? dunno)

rust -> hab
real    6m7.727s
user    29m46.053s
sys     0m30.719s

bats -> libbsd
real    1m36.106s
user    1m26.842s
sys     0m9.145s

clens -> hab-studio
real    0m11.036s
user    0m8.200s
sys     0m1.057s

This might look long, but keep in mind that the DO_CHECK environment variable was set, meaning that all the do_check() build phases were triggered and were successful.
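
For reference, enabling those check phases is just a matter of having DO_CHECK set in the environment when the build script runs; something along these lines (exact invocation may differ from what I used):

$ env DO_CHECK=true ./bin/build-base-plans.sh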

fnichol commented 6 years ago

I'm going to do one more test run against a current master checkout of habitat-sh/habitat, as the build program was not 100% up to date in the last attempt and was checked out at Habitat circa ~0.52.0.

Still left to do is updating the Plans that are failing the PR linting to ensure they are brought up to date with our standards, so expect at least a few more rebases/pushes.

fnichol commented 6 years ago

The rebuild in a stage1 Studio went well last week/weekend and made it almost all the way through without stopping. To work around the SourceForge outage, I dropped the source tarballs for the packages above into the src cache before running the build (thus skipping those downloads but still verifying the sources).
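
A rough sketch of that workaround, assuming hab-plan-build's standard source cache location (the Studio name and tarball names are placeholders):

$ # fetch the tarball somewhere that can reach SourceForge, then copy it to the build host
$ scp <tarball> <build-host>:/tmp/
$ # drop it into the Studio's source cache; the download is skipped when the
$ # file is already present, but do_verify() still checks the shasum
$ cp /tmp/<tarball> /hab/studios/<studio-name>/hab/cache/src/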

Now for passing the linting on some plans that haven't been updated since I wrote some of the originals.

fnichol commented 6 years ago

Now all the plans are passing linting and appear (hopefully) much more consistent with the other base plans. The branch has been rebased within itself so that the commit order is a combination of the plan build order and which changes were made first (i.e. version bumps, linting updates, etc). Note that some have 2 or even 3 version bumps which I attempted to maintain for historical and git-bisecting reasons, as well as to preserve author history.

I'm going to rebase this branch again against current master to make sure it still is merge-clean.

fnichol commented 6 years ago

Rebase went okay. Will try (most likely) one last stage1 build to ensure that nothing regressed after all the linting fixes. I did manage to break bzip2 for a few hours from a lint fix, so you never know…

fnichol commented 6 years ago

Now I'm looking at a final freshening of several foundational packages, namely glibc, binutils, and gcc. The first thing I've found is that glibc 2.27 now requires bison to build, which means building a new stage1 tarball using the Linux From Scratch project as before (the current tarball does not include bison and therefore has insufficient dependencies). Sometimes one small version bump involves a lot of work--this is really hard to pre-determine.

fnichol commented 6 years ago

Here are the remaining Plan updates that I skipped last pass as I suspected they would be larger and more focused. The above comment alluded to updating the stage1 tarball which is directly related to the Glibc/Binutils/GCC gang.

fnichol commented 6 years ago

Looking good with these updates, now will rebase these updates into the branch and finally rebase against current master.

fnichol commented 6 years ago

Rebasing complete, now running a last stage1 regression build…

fnichol commented 6 years ago

I found another failing test in procps-ng and a glibc-2.27 fix required in make. The branch is rebased and updated as a result of that last stage1 run.

As I'm using an updated stage1 tarball that, to date, only I had, I published it (i.e. uploaded it to our S3 bucket) and updated the stage1 logic in the Studio codebase so that others will be able to replicate this work; that change is habitat-sh/habitat#4766 (it should ship in the next Habitat release, most likely today).

fnichol commented 6 years ago

Well, it's been a week. The stage1 builds have gone great, but I was unable to use the stage1 artifacts to enter a new default-type Studio without needing an intermediate Depot/Builder API.

Instead, I managed to refactor and update the install logic used by hab pkg install so that I could use a local core/hab-backline package that wasn't yet uploaded (instructions to follow), which was the basis of habitat-sh/habitat#4771.
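
I'll write up the exact commands shortly, but the gist is being able to point hab pkg install at a local .hart artifact instead of a Builder API, along the lines of (version/release are placeholders):

$ hab pkg install ./results/core-hab-backline-<version>-<release>-x86_64-linux.hart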

Once I was able to start a "stage2" build, the binutils package failed. After some digging in there, it turns out we needed to empty out the LDFLAGS environment variable a bit, just like the tweaks that were needed for the C*FLAGS variables (rebased branch inbound soon).
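
For the curious, the change is conceptually a do_prepare() tweak in the binutils plan, similar in spirit to the existing C*FLAGS handling. A hedged sketch only -- the actual flags removed are in the branch:

do_prepare() {
  # Illustrative: strip the entry that breaks the binutils bootstrap rather than
  # clearing LDFLAGS wholesale; the real plan names the specific flags.
  export LDFLAGS="${LDFLAGS/<problematic-flag>/}"
  build_line "Updated LDFLAGS=${LDFLAGS}"
}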

Running the full base set in a second-stage Studio with the full test suite should be more than enough to flush out any issues or differences related to stage1 vs. default building (this is mostly related to pkg_build_deps() differences).
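
As a quick refresher on why the Studio type matters here, a plan declares its two kinds of dependencies separately (the package names below are just examples):

pkg_deps=(core/glibc core/zlib)       # runtime deps: present in the final package's dependency graph
pkg_build_deps=(core/gcc core/make)   # build-time only: this is where stage1 vs. default Studios can differ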

fnichol commented 6 years ago

Also related to this work is an update to the Bootstrapping Habitat docs page, which will form the basis of the testing steps another person could take to verify this work. As I was revisiting this workflow, I found that more and more environment variables were needed inside and outside the Studios to correctly prepare the build set, so I set about folding some of these steps into short scripts that the bootstrapping instructions can run. I'll include those in this PR as well for completeness.
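
Conceptually, each helper is just a thin wrapper that exports the required variables and then delegates to hab; a sketch under that assumption (the variable names and values here are illustrative, not the real ones):

#!/bin/sh
# Conceptual bootstrap helper: capture the environment the docs would otherwise
# have you export by hand, then hand off to `hab studio`.
set -e
export HAB_ORIGIN=core
export STUDIO_TYPE=stage1
exec hab studio "$@"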

fnichol commented 6 years ago

The first run through a stage2 build was finally successful. I needed further updates and fixes to binutils, procps-ng, and bc and these are now in the current branch which is rebased against current master.

Next up is a full test run of a stage2 build to ensure that nothing else is missed. As this turns a couple-hour task into a six-hour one, I thought it prudent to go wide and shallow first before going deep and wide.

I'm also in a place where I should be able to test a build in a current default Studio using our existing software to build against. I'm still not sure what to expect here, but would like to know if this is possible. If it is, Builder can help us with building some of the base plans. If not, then we're back to a local Studio build of the set, which will form the basis of a new package set. Anyway, updates to follow.

fnichol commented 6 years ago

Okay, the stage2 build with tests is solid. Hopefully one last rebase of this branch and it's review-ready.

The bummer news is that when I tried a build using the default Studio, it quickly failed to build glibc. I suspect this is because we have pretty old software trying to build the most modern equivalents and there are edge cases that would need to be considered.

As a result, this means we'll most likely need to build these base plans in a Studio out-of-band from Builder, upload them and build from there.

fnichol commented 6 years ago

Hey guess what? I think we're finally ready to move this work forward!

fnichol commented 6 years ago

Okay, so what if you wanted to try and build this base set and verify that it works? Or to put things another way: how did I build and test this set?

Related to this work is an update to the Internals/Bootstrapping Habitat page on our docs site. The PR with this update is at habitat-sh/habitat#4829 and you can take an early look on our acceptance website (please mind the incorrect SSL certificate warning). I'm not 100% happy with the rendering of some of the code snippets, so if you're copy/pasting the steps, it might be safer to do so from the git source rather than from the docs page itself.

I'll add exactly which steps to follow and which to change in this issue in a minute…we're merging some related work in habitat-sh/habitat that may affect what you clone and check out.

fnichol commented 6 years ago

Testing Instructions and Notes

The setup, steps, explanations, etc. are going to shortly be at https://www.habitat.sh/docs/internals/#bootstrap-internals, but until then, we're going to use the version in the PR branch for habitat-sh/habitat#4829 which is:

https://github.com/habitat-sh/habitat/blob/fnichol/update-bootstrapping/www/source/partials/docs/_internals-bootstrapping.html.md.erb

You'll most likely want to get a cloud instance with good compute and reasonable storage; otherwise it takes a lot longer to build.

Start with the Part III: Preparing to build section. However, since we're testing this branch and we need a fix from the habitat-sh/habitat repo, you can run this instead:

$ mkdir habitat-sh
$ cd habitat-sh
$ git clone https://github.com/habitat-sh/habitat.git
$ (cd habitat && git checkout fnichol/studio-new-cleanup)
$ git clone https://github.com/habitat-sh/core-plans.git
$ (cd core-plans && git checkout fnichol/teh-futur)

If you're a core maintainer with a legitimate core origin secret key, then you can install it on your host (you can use hab origin key import and paste the secret key in--make sure you're the non-root user). Otherwise, you can generate a throwaway core key as the docs page suggests--if you aren't going to upload these packages to a real Builder API, it shouldn't matter either way.
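
Either way, the commands look roughly like this (the import reads the key from stdin, so end your pasted input with Ctrl-D):

$ hab origin key import    # paste the core secret key, then Ctrl-D
$ # or, if you don't have the real key:
$ hab origin key generate core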

The Part VI: Remaining packages in world section is still a work in progress; that's where you'd build all the other non-base packages if you want to see that--just know that we're talking 14+ hours to do this in serial. Having said that, we'd likely use some form of this to see which Plans need an update or fix. In fact, I'm going to resume this myself and try to submit standalone PRs in core-plans to fix these as they come up (if one of these PRs works for both current packages and new packages, we can merge it earlier with super low risk).

It's possible that you might run low on or run out of disk, so it's handy to know that these instructions keep all the Studios' root filesystems around. If you need to reclaim space, head to the Part VIII: Cleaning up section and kill some Studios!
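
If you'd rather check and reclaim space by hand, something along these lines works (assuming the default /hab/studios location):

$ du -sh /hab/studios/*    # see which Studio root filesystems are eating the disk
$ hab studio rm            # removes the Studio associated with the current directory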

nellshamrell commented 6 years ago

Got an error when running ./core-plans/bin/bootstrap/stage1-build-base-plans.sh

Here is the error:

      cp/cp-lang.o c-family/stub-objc.o cp/call.o cp/decl.o cp/expr.o cp/pt.o cp/typeck2.o cp/class.o cp/decl2.o cp/error.o cp/lex.o cp/parser.o cp/ptree.o cp/rtti.o cp/typeck.o cp/cvt.o cp/except.o cp/friend.o cp/init.o cp/method.o cp/search.o cp/semantics.o cp/tree.o cp/repo.o cp/dump.o cp/optimize.o cp/mangle.o cp/cp-objcp-common.o cp/name-lookup.o cp/cxx-pretty-print.o cp/cp-cilkplus.o cp/cp-gimplify.o cp/cp-array-notation.o cp/lambda.o cp/vtable-class-hierarchy.o cp/constexpr.o cp/cp-ubsan.o cp/constraint.o cp/logic.o attribs.o incpath.o c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o c-family/c-format.o c-family/c-gimplify.o c-family/c-indentation.o c-family/c-lex.o c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o c-family/c-semantics.o c-family/c-ada-spec.o c-family/c-cilkplus.o c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o c-family/c-attribs.o c-family/c-warn.o i386-c.o glibc-c.o cc1plus-checksum.o libbackend.a main.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a   -L/hab/pkgs/core/gmp/6.1.2/20180402211659/lib -L/hab/pkgs/core/mpfr/4.0.1/20180402211719/lib -L/hab/pkgs/core/libmpc/1.1.0/20180402211736/lib -lmpc -lmpfr -lgmp -rdynamic -ldl  -lz
collect2: error: ld returned 1 exit status
make[3]: *** [../../gcc-7.3.0/gcc/c/Make-lang.in:85: cc1] Error 1
make[3]: *** Waiting for unfinished jobs....
collect2: error: ld returned 1 exit status
make[3]: *** [../../gcc-7.3.0/gcc/lto/Make-lang.in:81: lto1] Error 1
rm gfortran.pod gcc.pod
make[3]: Leaving directory '/hab/cache/src/gcc-build/gcc'
make[2]: *** [Makefile:4706: all-stageprofile-gcc] Error 2
make[2]: Leaving directory '/hab/cache/src/gcc-build'
make[1]: *** [Makefile:23870: stageprofile-bubble] Error 2
make[1]: Leaving directory '/hab/cache/src/gcc-build'
make: *** [Makefile:24007: profiledbootstrap] Error 2
   gcc: Build time: 23m58s
   gcc: Exiting on error

build-plans.sh run time: 44m26s

Exiting on error

fnichol commented 6 years ago

Okay, after some sleuthing and trying to reproduce this, I found that…I ran out of disk space on my compute instance. I think this'll be a good lesson: you want to have a reasonable amount of disk to perform this work--at the moment we're both trying with ~120GB root disk and will report back.
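
For anyone following along, it's worth confirming headroom before kicking off a build, for example:

$ df -h /    # the full base set plus Studio root filesystems needs a good chunk of this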

rsertelon commented 6 years ago

I've successfully built stage1 and stage2 without problems. I'm currently building stage3 (world), it started fine, and I'm confident it'll get to the end ;)

The documentation steps were really clear; I didn't dig into the scripts, just executed them, and it was a nice experience doing this. Only one small glitch found in the doc--I added a comment on the PR.

LGTM ;) :tada:

Edit: Actually, gdb failed with a compilation error:

location.c:527:19: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
       || *argp == '\0'

nellshamrell commented 6 years ago

Looks like @fnichol's Studio work was merged into the master branch of Habitat; here are the revised instructions for getting into the stage1 Studio:

1) Spin up c4.4xlarge instance on AWS with 120 GB disk
2) Start tmux
3) Create new tmux session
$ tmux new -s base_plans
4) Install Habitat
$ curl https://raw.githubusercontent.com/habitat-sh/habitat/master/components/hab/install.sh | sudo bash
5) Import the current secret core key with
$ hab origin key import
$ mkdir habitat-sh
$ cd habitat-sh
$ git clone https://github.com/habitat-sh/habitat.git
$ git clone https://github.com/habitat-sh/core-plans.git
$ (cd core-plans && git checkout fnichol/teh-futur)
$ ./core-plans/bin/bootstrap/stage1-studio.sh enter
fnichol commented 6 years ago

Just rebased the branch against current master.

fnichol commented 6 years ago

Now that #1210 is merged, I'm going to close this out.

PARTY TIME!