lavabit / robox

The tools needed to robotically create/configure/provision a large number of operating systems, for a variety of hypervisors, using packer.

Some fixes for Lineage #212

Closed: timschumi closed this pull request 2 years ago

timschumi commented 3 years ago

@ladar Do you have any opinions on the "for consideration" TODOs (and maybe some background info on why git was built from source)? Also, was passing through USB to the VM ever an intended use-case (judging by the udev rules and adb installation)?

ladar commented 2 years ago

@timschumi this pull request looks pretty good. Thank you for taking the time to make it happen.

some background info on why git was built from source?

When I created this build environment, way back in late 2015/early 2016, there was a bug that would cause the repo sync process to randomly fail. That was incredibly frustrating, because I'd kick off a build, check on it a few hours later, and realize it had failed in one of the very early stages.

As it turns out, the bug was a problem between GnuTLS and the custom TLS implementation used by Google's servers, which caused connections to sporadically fail. And of course Ubuntu shipped with git linked against GnuTLS instead of OpenSSL, so the recompile was there to eliminate these random failures. I don't know if this is still a problem, which is why I never removed the recompile logic, despite it taking quite a bit of time during the box build process. It might be nice to confirm the fix before we remove that.
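
If anyone wants to check, something like this (untested; the helper path is where Ubuntu's git package usually puts it) should reveal which TLS library the stock git is pulling in:

# git's HTTPS support lives in the git-remote-https helper; ldd resolves
# its transitive libcurl dependency, so a GnuTLS link shows up here.
ldd /usr/lib/git-core/git-remote-https | grep -Ei 'gnutls|libssl'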

Also, was passing through USB to the VM ever an intended use-case (judging by the udev rules and adb installation)?

The udev rules were added in part because a) they were recommended by the setup docs for all droid developers, so they can connect test devices without having to use sudo, and b) I thought some folks might need to hook up their devices to the VM during the build process so the blobs could be extracted directly from the device, which was the only officially supported method of getting the blobs back then. I don't know if that has changed at all. And since hooking up a device breaks the automation, I uploaded the blobs for my device ahead of time, so the build could run automatically.

I haven't kept up with this box as much over the last year or so. My Photon Q died, and I decided not to swap it out with one of my spares. I initially purchased the Z Force phone (aka nash) and planned to pair it with the keyboard moto mod, but the mod never shipped, so I never made much use of it. I want to switch to an Fxtec Pro1, but T-Mobile won't activate it for me (because I was a Sprint customer and haven't switched to any of the more expensive T-Mobile plans). Apparently the Pro1 never got added to Sprint's list of approved IMEIs. I'm still waiting on that issue, but it's left me in limbo. I plan on updating the config/default target to build Lineage ROMs for the Pro1, if I'm ever able to switch.

Also, as an FYI, my original plan back in 2016 was to create specific box variants, with each targeting a different device. That way the blobs could be embedded in the box image, and it would be much easier for amateurs to create their own ROMs. Unfortunately I never had the time to collect blobs for different devices, let alone test the build process on them, and nobody else seemed interested in taking the lead there, so we ended up with what we have now.

ladar commented 2 years ago

@timschumi I didn't see any problems with the changes, and I'm ready to merge. How well have you tested them? I'm gonna try to build the boxes and then test the lineage build scripts myself. In theory they'll get uploaded as version 3.5.3 (the 3.5.2 upload should be about ready, hence 3.5.3), but it will take a while, and there are no guarantees I'll finish. Hence the question.

Alternatively, is there anyone else who can possibly review these changes? @danielhoherd are you still using the Lineage boxes, and can you review/test the changes?

danielhoherd commented 2 years ago

@ladar I wasn't ever able to get this working, but IIRC I did have a lead on how I could possibly make it work. I'll give it a go and see if I make any progress.

timschumi commented 2 years ago

It might be nice to confirm the fix before we remove that.

I haven't noticed anything when testing my changes, but unless this is a VM-specific issue, it should have been fixed for quite some time now (and even then, many people build on Ubuntu in a VM, so I think we would have heard about it by now).

some folks might need to hook up their devices to the VM during the build process so the blobs could be extracted directly from the device, which was the only officially supported method of getting the blobs back then. I don't know if that has changed at all.

It is still the officially endorsed method for obtaining blobs, although most devices should support extracting blobs from an unpacked installation ZIP by now. However, that requires quite a bit of setup with regard to mounting the images, and we might not always have that ZIP available for automatic download from the downloads server.

In any case, the one-fits-all blob archive that is currently in use definitely doesn't work, so the solution that is proposed in the PR is probably the next best solution for obtaining blobs (and, as far as I know, should work for every device that was ever supported without any maintenance burden on your side).
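
For reference, the device-based route is roughly this (illustrative only; run from the device tree, and most trees also accept a path to an unpacked dump instead of a live device):

cd device/oneplus/oneplus3   # hypothetical device tree
./extract-files.sh           # pulls the blobs from a connected device via adb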

I plan on updating the config/default target to build Lineage ROMs for the Pro1, if I'm ever able to switch. Also, as an FYI, my original plan back in 2016 was to create specific box variants, with each targeting a different device.

As for the default config, I decided to remove the second box, since we would otherwise generate and upload two 250GB boxes for a three-line change. Now it drops you into a shell, shows an example invocation of the build script in the motd, and stores the xt897-compatible settings as defaults in case the user doesn't override them.
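
A user booting the box would then run something along these lines (the values are the stored xt897 defaults; the branch here is just illustrative):

VENDOR=motorola DEVICE=xt897 BRANCH=lineage-14.1 ./lineage-build.sh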

How well have you tested them?

I ran a few test builds for my own device (because I'm sure that it isn't in a defunct state) on the newest LineageOS version, to test the changes that I made and to confirm the necessary disk size. The other stuff for older branches (mainly file name and ccache compatibility) I changed purely by looking at it, without running test builds afterwards. This also means that the default configs are currently untested, but I'm reasonably confident that I haven't broken anything there.

danielhoherd commented 2 years ago

It looks like maybe there's a bug in the build?

$ packer build lineage-virtualbox.json
...lots of stuff...
Build 'lineage-virtualbox' finished after 35 minutes 59 seconds.

==> Wait completed after 35 minutes 59 seconds

==> Builds finished. The artifacts of successful builds are:
--> lineage-virtualbox: 'virtualbox' provider box: output/lineage-virtualbox-.box
--> lineage-virtualbox: Created artifact from files: output/lineage-virtualbox-.box, output/lineage-virtualbox-.box.sha256
--> lineage-virtualbox: 'virtualbox' provider box: output/lineageos-virtualbox-.box
--> lineage-virtualbox: Created artifact from files: output/lineageos-virtualbox-.box, output/lineageos-virtualbox-.box.sha256

Is the filename supposed to have a -.box or is some interpolated variable missing its value?

https://github.com/timschumi/robox/blob/3023206d6d/lineage-virtualbox.json#L137

My build env is:

Edit: looks like this comes from the robox.sh script, which I did not run.

ladar commented 2 years ago

Is the filename supposed to have a -.box or is some interpolated variable missing its value?

@danielhoherd this one's easy. Yes, it's missing the "VERSION" environment variable. Generally speaking, you should run:

./robox.sh lineage-virtualbox

That combines the org (in this case lineage) with the provider (in this case virtualbox), building all the boxes for that org+provider pair. Likewise, to build all the lineage boxes, run:

./robox.sh lineage

Or all for a provider:

./robox.sh virtualbox

To build a specific box, you should use:

./robox.sh box lineage-virtualbox

This takes a comma-separated list of box names, so you could also use it like so:

./robox.sh box lineage-virtualbox,lineage-libvirt,generic-ubuntu1604-virtualbox,generic-ubuntu1604-libvirt

Running the robox script without any arguments tries to convey all of the above, plus some of the other available functionality, but it's only really clear if you happen to be an expert in CLI help syntax. In theory, what I just wrote should be added to the README in a more generic fashion, but I haven't found the time yet.

Doing it this way sets all the requisite environment variables. It also uses the .credentialsrc file, if one is present in the directory; otherwise it creates a stub version of that file which you can view and edit. Inside that file you can override the version string. The stub template sets the version to 1.0.0, since someone forking the project would presumably start with version 1. You can remove that line, and the script will use the version string embedded in the robox script (which I update with each release). If you remove that line, you can also override the version string from the command line, like so:

export VERSION="100.200.300" ; ./robox.sh lineage-virtualbox
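
As a side note, the stub is just sourced as shell, so the version line in it looks something like this (illustrative; the generated stub is the authoritative reference):

# Inside .credentialsrc (stub): delete this line to fall back to the
# version embedded in robox.sh, or edit it to pin your own.
export VERSION="1.0.0"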

I should probably update the stub template at some point so that, like the robox script, it doesn't override environment variables.

Also, a technical note for you @danielhoherd, since I just noticed you're using macOS: robox.sh assumes in several places that if you're on macOS, you want to build Parallels boxes. Likewise, on Windows it assumes you want Hyper-V. I don't recall if it simply includes Parallels alongside the others on macOS, or excludes everything but Parallels, but it's something to be aware of if you run into issues.

ladar commented 2 years ago

@danielhoherd I should probably also mention that the vagrant-libvirt plugin still has a number of issues running under macOS that require special tweaks... to work around. I assume you're aware?

ladar commented 2 years ago

The 3.5.3 images with the changes proposed by @timschumi are available. My first compilation attempt failed at 38%. I don't know if it was because the xt897 device is no longer supported, or something else, like the typos caught by @danielhoherd. Gonna try again, and capture a log this time.

In the meantime, @timschumi can you suggest a build target that you know should work?

timschumi commented 2 years ago

VENDOR=oneplus DEVICE=oneplus3 BRANCH=lineage-18.1 worked the last time I worked on these patches. I'll fire off an xt897 build in the meantime.

ladar commented 2 years ago

@timschumi it failed at a different spot this time, with an error about the Jack server. So it could be that something isn't being prepped right. I'll tweak the build script using the changes above and see what happens. And I'll try the device you suggested as well.

[ 43% 14927/34632] Ensure Jack server is installed and started
FAILED: /bin/bash -c "(prebuilts/sdk/tools/jack-admin install-server prebuilts/sdk/tools/jack-launcher.jar prebuilts/sdk/tools/jack-server-4.8.ALPHA.jar  2>&1 || (exit 0) ) && (JACK_SERVER_VM_ARGUMENTS=\"-Dfile.encoding=UTF-8 -XX:+TieredCompilation -Xmx8G\" prebuilts/sdk/tools/jack-admin start-server 2>&1 || exit 0 ) && (prebuilts/sdk/tools/jack-admin update server prebuilts/sdk/tools/jack-server-4.8.ALPHA.jar 4.8.ALPHA 2>&1 || exit 0 ) && (prebuilts/sdk/tools/jack-admin update jack prebuilts/sdk/tools/jacks/jack-2.28.RELEASE.jar 2.28.RELEASE || exit 47; prebuilts/sdk/tools/jack-admin update jack prebuilts/sdk/tools/jacks/jack-3.36.CANDIDATE.jar 3.36.CANDIDATE || exit 47; prebuilts/sdk/tools/jack-admin update jack prebuilts/sdk/tools/jacks/jack-4.7.BETA.jar 4.7.BETA || exit 47 )"
Writing client settings in /home/vagrant/.jack-settings
Installing jack server in "/home/vagrant/.jack-server"

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /home/vagrant/.jack-server/server.jks -destkeystore /home/vagrant/.jack-server/server.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /home/vagrant/.jack-server/client.jks -destkeystore /home/vagrant/.jack-server/client.jks -deststoretype pkcs12".
Launching Jack server java -XX:MaxJavaStackTraceDepth=-1 -Djava.io.tmpdir=/home/vagrant/temp -Dfile.encoding=UTF-8 -XX:+TieredCompilation -Xmx8G -cp /home/vagrant/.jack-server/launcher.jar com.android.jack.launcher.ServerLauncher
Jack server failed to (re)start, try 'jack-diagnose' or see Jack server log
SSL error when connecting to the Jack server. Try 'jack-diagnose'
SSL error when connecting to the Jack server. Try 'jack-diagnose'
[ 43% 14927/34632] target thumb C++: libicui18n <= external/icu/icu4c/source/i18n/rematch.cpp
ninja: build stopped: subcommand failed.
build/core/ninja.mk:151: recipe for target 'ninja_wrapper' failed
make: *** [ninja_wrapper] Error 1
make: Leaving directory '/home/vagrant/android/lineage'

#### make failed to build some targets (01:19:24 (hh:mm:ss)) ####

timschumi commented 2 years ago

Ah, this is a known issue. In a recent update, Oracle disabled TLSv1 support in Java 8, but Jack relies on it being enabled. I'll push a fix for that shortly, once I've confirmed that it works.

ladar commented 2 years ago

I took a closer look at the suggestions above. I thought they were adding backslashes to escape $ characters, etc., but that isn't the issue.

timschumi commented 2 years ago

The suggestions just replace the old style of command substitution (backticks) with the newer variant ($(...)). They are functionally equivalent, but the newer one is easier to work with, since it nests cleanly.

EDIT: The other suggestion replaces the manual counting of cpuinfo entries with a predefined command, but those two are also functionally equivalent.
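
To illustrate both (made-up snippets, not lines lifted from the script):

# Old style: backticks, which need awkward escaping to nest.
CORES=`grep -c processor /proc/cpuinfo`
# New style: $(...) nests cleanly, and nproc does the counting for us.
CORES=$(nproc)
DIR=$(dirname "$(readlink -f "$0")")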

ladar commented 2 years ago

@timschumi I realized they were functionally equivalent when I looked closer. I wrote that script 6 years ago, and I've gotten a little bit better since then, so the cleanup is certainly welcome.

I tried the params you suggested, just to see what would happen and got this:

vagrant@lineage:~$ VENDOR=oneplus DEVICE=oneplus3 BRANCH=lineage-18 ./lineage-build.sh 
DEVICE=oneplus3
BRANCH=lineage-18
VENDOR=oneplus

NAME=Ladar Levison
EMAIL=ladar@lavabit.com

Override the above environment variables to alter the build configuration.

Downloading Repo source from https://gerrit.googlesource.com/git-repo
repo: Updating release signing keys to keyset ver 2.3
Downloading manifest from https://github.com/LineageOS/android.git
fatal: Couldn't find remote ref refs/heads/lineage-18
manifests: sleeping 4.0 seconds before retrying
fatal: Couldn't find remote ref refs/heads/lineage-18
fatal: cannot obtain manifest https://github.com/LineageOS/android.git
Traceback (most recent call last):
  File "/home/vagrant/android/lineage/.repo/repo/main.py", line 651, in <module>
    _Main(sys.argv[1:])
  File "/home/vagrant/android/lineage/.repo/repo/main.py", line 627, in _Main
    result = run()
  File "/home/vagrant/android/lineage/.repo/repo/main.py", line 620, in <lambda>
    run = lambda: repo._Run(name, gopts, argv) or 0
  File "/home/vagrant/android/lineage/.repo/repo/main.py", line 286, in _Run
    result = cmd.Execute(copts, cargs)
  File "/home/vagrant/android/lineage/.repo/repo/subcmds/sync.py", line 982, in Execute
    self._UpdateManifestProject(opt, mp, manifest_name)
  File "/home/vagrant/android/lineage/.repo/repo/subcmds/sync.py", line 896, in _UpdateManifestProject
    partial_clone_exclude=self.manifest.PartialCloneExclude)
  File "/home/vagrant/android/lineage/.repo/repo/project.py", line 1139, in Sync_NetworkHalf
    and self._ApplyCloneBundle(initial=is_new, quiet=quiet, verbose=verbose)):
  File "/home/vagrant/android/lineage/.repo/repo/project.py", line 2279, in _ApplyCloneBundle
    bundle_url = remote.url + '/clone.bundle'
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
./lineage-build.sh: line 71: build/envsetup.sh: No such file or directory
sed: can't read /home/vagrant/android/lineage/build/tools/releasetools/common.py: No such file or directory
./lineage-build.sh: line 77: breakfast: command not found

timschumi commented 2 years ago

The .1 at the end of the branch name is important; notice the fatal: Couldn't find remote ref refs/heads/lineage-18.
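
In other words, this invocation should work:

VENDOR=oneplus DEVICE=oneplus3 BRANCH=lineage-18.1 ./lineage-build.sh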

EDIT: Also, we should really make the script exit on error instead of mindlessly continuing to run.

ladar commented 2 years ago

Gotcha, my fault. It was a cut and paste error.

ladar commented 2 years ago

EDIT: Also, we should really make the script exit on error instead of mindlessly continuing to run.

I certainly agree. I've also noticed that if you re-run the script, it also fails because some of the directories/repos it tries to set up already exist. We should either check for and remove them (the easy solution, but potentially harmful), or try to detect whether that step completed properly and skip ahead (complicated).
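
For the second approach, an untested guard along these lines might be enough (the repo URL and branch are just placeholders):

# Skip repo init when a previous run already created the checkout.
if [ ! -d "$HOME/android/lineage/.repo" ]; then
  repo init -u https://github.com/LineageOS/android.git -b lineage-18.1
fi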

Like I said, I wrote this a very long time ago.

timschumi commented 2 years ago

I've also noticed that if you re-run the script, it also fails because some of the directories/repos it tries to set up already exist.

Do you have any examples? As far as I can tell, none of those commands should have a problem with being run twice, unless something broke in the middle (for example, a failed repo init).

ladar commented 2 years ago

I actually think it was repo init that failed, because the directory was already there. I aborted before it synced, since I realized I wasn't recording the output, and re-ran it without thinking to clean up first.

ladar commented 2 years ago

@timschumi the device target you provided passed, probably because it was used to compile a very recent branch.

Beyond the cleanup and error handling, I noticed two things.

At the end of the build, the script assumes an md5sum was created, because that's how it used to work. As I learned, newer branches generate sha256sums. The reason it currently checks/prints the md5sum path is that older TWRP versions only supported md5sum checks during updates. I'm guessing newer TWRP versions support sha256. Either way, I think we should ensure both files are available, to make it easy on the user.

We'll just have to test whether an md5sum exists. If it's missing, we simply generate it, and then move on to verifying the sha256sum file the same way the script now verifies md5sum files. Of course, if the md5sum file is present, we do the opposite: verify the md5sum file and generate the sha256sum file.

And of course, we still need a workaround for the Jack server issue.

ladar commented 2 years ago

@timschumi as an FYI, I've added Alpine 3.15 and Fedora 35, which means the next release will have a minor version bump (to 3.6.0). Since this is such a major change, it would be good to include these changes with that version bump as well (one could even argue that these changes warrant a major bump).

I was planning to kick off the next build sometime next week. Mid-week is a decent goal. But that gives us a bit of a deadline to merge this pull request into the repo, along with any other items of pressing concern.

ladar commented 2 years ago

@timschumi can we enable TLSv1 by setting the com.sun.management.jmxremote.ssl.enabled.protocols property in the /etc/java-8-openjdk/management/management.properties file?

timschumi commented 2 years ago

I already have a fix queued up locally (it just removes the appropriate algorithms from a disabledAlgorithms setting in a config file), but I'm currently still test-building and fixing a few things along the way.
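
For the curious, the gist is something like this (untested sketch; the exact path and algorithm list depend on the JDK build, and the property value often spans continuation lines, so check the file first):

# Drop TLSv1/TLSv1.1 from the JDK's disabled list so Jack can connect.
sudo sed -i '/^jdk.tls.disabledAlgorithms=/s/TLSv1, TLSv1.1, //' /etc/java-8-openjdk/security/java.security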

timschumi commented 2 years ago

xt897 builds again, I'll look into the sha256sum once I have the time.

ladar commented 2 years ago

xt897 builds again, I'll look into the sha256sum once I have the time.

Thanks for making these changes. I'll try rebuilding them and pushing fresh 3.5.3 test boxes when I get some time.

As for the hash logic, it's pretty easy; I'm just not sure if/how I can push the change to your branch/PR. Off the top of my head, I was thinking something like:

# Verify an md5sum, or generate it.
MD5IMAGESUM="\$SYSIMAGE.md5sum"
if [ -f "\$DIRIMAGE/\$MD5IMAGESUM" ]; then
  (cd "\$DIRIMAGE" && md5sum -c "\$MD5IMAGESUM") || ( printf "\n\n\nThe MD5 hash failed to validate.\n\n\n"; exit 1 )
else
  (cd "\$DIRIMAGE" && md5sum "\$SYSIMAGESUM" > "\$MD5IMAGESUM")
fi

# Verify a sha256sum, or generate it.
SHAIMAGESUM="\$SYSIMAGE.sha256sum"
if [ -f "\$DIRIMAGE/\$SHAIMAGESUM" ]; then
  (cd "\$DIRIMAGE" && sha256sum -c "\$SHAIMAGESUM") || ( printf "\n\n\nThe SHA256 hash failed to validate.\n\n\n"; exit 1 )
else
  (cd "\$DIRIMAGE" && sha256sum "\$SYSIMAGESUM" > "\$SHAIMAGESUM")
fi

# See what the output directory holds.
ls -alh "\$SYSIMAGE" "\$MD5IMAGESUM" "\$SHAIMAGESUM"

But you'll want to check that closely before adding it to the pull request, since I haven't tested it, and I could have missed something obvious like an escape character. Think of it as scribbles on a napkin.

I'm also not sure the exit 1 logic works like I intend. That is to say, if the hash check fails, it should exit the build script. But I think that syntax will just cause the subshell to exit with an error status, and so the build script will keep going. I can figure out the right syntax if you need it; I'm just running out of time tonight.

timschumi commented 2 years ago

As for the hash logic, it's pretty easy; I'm just not sure if/how I can push the change to your branch/PR.

I checked the "Allow edits by maintainers" checkbox (I always do), so you should just be able to push to git@github.com:timschumi/robox.git (or whatever transport method fits you best).

I'm also not sure the exit 1 logic works like I intend. That is to say, if the hash check fails, it should exit the build script. But I think that syntax will just cause the subshell to exit with an error status, and so the build script will keep going.

We now have -e enabled, so bash will notice that the subshell (or any failing command, for that matter) returned an error code and will exit by itself.
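
A minimal demonstration of that behavior:

#!/bin/bash
set -e
( false ) || ( printf "hash check failed\n" ; exit 1 )
echo "never reached: the failing || list aborts the script under set -e"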

ladar commented 2 years ago

Both of our test devices build correctly now on the 3.5.3 dev release, so I'm going to merge. I'm hoping to start the build process for the next release mid-week. Note the md5sum check passed, but the typo in the script caused the sha256 generation to fail. I was able to update the script before the second build reached that point, so it worked as intended.

using prebuilt boot.img from BOOTABLE_IMAGES...
   boot size (5861376) is 55.90% of limit (10485760)
  running:  openssl pkcs8 -in build/target/product/security/testkey.pk8 -inform DER -nocrypt
  running:  java -Xmx4096m -Djava.library.path=/home/vagrant/android/lineage/out/host/linux-x86/lib64 -jar /home/vagrant/android/lineage/out/host/linux-x86/framework/signapk.jar -w build/target/product/security/testkey.x509.pem build/target/product/security/testkey.pk8 /home/vagrant/temp/tmpwLxxug /home/vagrant/android/lineage/out/target/product/xt897/lineage_xt897-ota-d93f0ccf1f.zip
done.
[100% 34632/34632] build bacon
Package Complete: /home/vagrant/android/lineage/out/target/product/xt897/lineage-14.1-20211127-UNOFFICIAL-xt897.zip
make: Leaving directory '/home/vagrant/android/lineage'

#### make completed successfully (02:09:56 (hh:mm:ss)) ####
2021-11-27 11:48:54 - common.py - INFO    :   Running: "java -Xmx4096m -Djava.library.path=out/host/linux-x86/lib64 -jar out/host/linux-x86/framework/signapk.jar -w build/make/target/product/security/testkey.x509.pem build/make/target/product/security/testkey.pk8 /home/vagrant/android/lineage/out/soong/.temp/tmpsU0ldM.zip out/target/product/oneplus3/lineage_oneplus3-ota-eng.vagrant.zip"
2021-11-27 11:49:17 - ota_from_target_files.py - INFO    : done.
Compressing system.new.dat with brotli
[100% 100847/100847] build bacon
Package Complete: out/target/product/oneplus3/lineage-18.1-20211127-UNOFFICIAL-oneplus3.zip

#### build completed successfully (08:16:44 (hh:mm:ss)) ####