robur-coop / albatross

Albatross: orchestrate and manage MirageOS unikernels with Solo5
ISC License

query manifest failure #101

Closed · palainp closed this 2 years ago

palainp commented 2 years ago

I tried to send a unikernel to an albatross endpoint but did not succeed. Tracking down the problem leads me to the following situation: from the remote host I can create the .req and sign it, but starting the VM fails (journalctl says it fails to query the ABI), and locally albatross-provision-request does not create the .req file and fails with a query manifest error message.

Strangely, solo5-elftool seems to read the .hvt kernel correctly. (The kernel is the static website TLS example from mirage-skeleton; I have mirage 3.10.8 and my OCaml switch is 4.13.1.) I installed solo5-elftool and albatross with mirage, but the system may still have some remnants of an old installation (I tried to remove them manually, and which albatross-provision-request tells me that I am launching the correct binary).

$ eval $(opam env)
$ albatross-provision-request create --net=service:br0 --arg=--ipv4-gateway=10.0.0.1 --arg=--ipv4=10.0.0.3/24 https https.hvt
albatross-provision-request: query manifest failure: Owee_buf.Invalid_format("No ELF magic number")
$ albatross-provision-request --version
version v1.4.1 protocol version 4
$ solo5-elftool --version
usage: solo5-elftool COMMAND ...
solo5-elftool version v0.6.9

COMMAND is:
    gen-manifest SOURCE OUTPUT:
        Generate application manifest from SOURCE, writing to OUTPUT.
    query-abi BINARY:
        Display the ABI target and version from BINARY.
    query-manifest BINARY:
        Display the application manifest from BINARY.
$ solo5-elftool query-manifest https.hvt
{
  "type": "solo5.manifest",
  "version": 1,
  "devices": [
    { "name": "service", "type": "NET_BASIC" }
  ]
}
hannesm commented 2 years ago

Thanks for your report. Indeed, since 1.4.0 albatross uses the OCaml implementation of solo5-elftool (https://git.robur.io/robur/ocaml-solo5-elftool, based on https://github.com/let-def/owee) and not the C version of solo5-elftool (as provided by https://github.com/solo5/solo5); the reasoning is that this cuts down on runtime dependencies (and avoids spawning external processes).
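
For illustration, an in-process manifest query looks roughly like this (a minimal sketch; Owee_buf.map_binary is owee's helper to memory-map a file, while the exact Solo5_elftool.query_manifest / pp_mft signatures are assumptions based on the library's documentation):

(* sketch: query the Solo5 manifest of a unikernel image in-process,
   instead of forking the C solo5-elftool *)
let () =
  (* memory-map the binary into an Owee_buf.t (a char Bigarray) *)
  let buf = Owee_buf.map_binary Sys.argv.(1) in
  match Solo5_elftool.query_manifest buf with
  | Ok mft -> Format.printf "%a@." Solo5_elftool.pp_mft mft
  | Error (`Msg e) -> prerr_endline ("query manifest failure: " ^ e)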

Now, the issue seems to be that Owee doesn't like your hvt binary. You should have osolo5-elftool installed in your opam switch's bin directory; could you execute osolo5-elftool query-manifest https.hvt (I expect Owee_buf.Invalid_format("No ELF magic number"))?

Which version of solo5-elftool do you have installed (opam info -f installed-version solo5-elftool)?

If your unikernel (https.hvt) does not include any private material, would you mind sharing it with us?

//cc @reynir

palainp commented 2 years ago

Thanks for your reply; here are the results (unfortunately not the expected ones :( ):

$ osolo5-elftool query-manifest https.hvt
{ "type": "solo5.manifest", "version": 1,
  "devices": [ { "name": "service", "type": "NET_BASIC" } ]
}
$  opam info -f installed-version solo5-elftool
0.3.0

My unikernel is the stock mirage-skeleton/applications/static_website_tls. My https.hvt has the 0x7f 'E' 'L' 'F' header.
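
A quick standalone way to double-check that header (not albatross code, just plain OCaml):

(* print the first four bytes of a file and check for the ELF magic *)
let () =
  let ic = open_in_bin Sys.argv.(1) in
  let magic = really_input_string ic 4 in
  close_in ic;
  String.iter (fun c -> Printf.printf "%02x " (Char.code c)) magic;
  print_newline ();
  print_endline
    (if String.equal magic "\x7fELF" then "ELF magic present" else "no ELF magic")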

palainp commented 2 years ago

And here is the link to the https.hvt file

hannesm commented 2 years ago

Thanks, unfortunately we passed the compressed image to the ELF parser, which complained about exactly that. Would you mind testing #102 (using opam pin add albatross https://github.com/roburio/albatross.git#fix-provision) and reporting back whether that fixes your issue?

Thanks a lot again, and if it fixes the issue at hand, I'll merge and cut a 1.4.2 release soon.
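
The gist of the fix, as a sketch rather than albatross's actual code (uncompress and to_owee_buf are hypothetical stand-ins for the real helpers): decompress first, parse second.

(* the compressed payload must be inflated before it reaches the ELF parser *)
let query_manifest ~uncompress ~to_owee_buf ~compressed image =
  let raw = if compressed then uncompress image else Ok image in
  Result.bind raw (fun raw ->
      (* only now hand the raw ELF bytes to the parser *)
      Solo5_elftool.query_manifest (to_owee_buf raw))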

palainp commented 2 years ago

This fix does work for the request creation, but I now get the same error message in the albatross systemd logs when I use albatross-client-remote-tls:

févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [DEBUG] connection from unix domain socket
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [DEBUG] now reading
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [DEBUG] read host [vm: https]: unikernel create typ solo5
févr. 04 17:15:58 vmsul3bis albatrossd[766009]:                     compression true image 3852553 bytes fail behaviour quit
févr. 04 17:15:58 vmsul3bis albatrossd[766009]:                     cpu 0 32 MB memory block devices  bridge service -> br0
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [DEBUG] now checking resource policies
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [EXEC:766035] ['ip' 'link' 'show' 'br0']
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [EXEC:766036] ['ip' 'tuntap' 'show']
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [EXEC:766037] ['ip' 'tuntap' 'add' 'vmmtap0' 'mode' 'tap']
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [EXEC:766039] ['ip' 'link' 'set' 'dev' 'vmmtap0' 'up']
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [EXEC:766041] ['ip' 'link' 'set' 'dev' 'vmmtap0' 'master' 'br0']
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [DEBUG] prepared vm with taps service -> vmmtap0
févr. 04 17:15:58 vmsul3bis albatross-console[718426]: albatross-console: [DEBUG] opening /run/albatross/fifo/https for reading
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [DEBUG] create: received valid reply from cons host [vm: https]: success (request host [vm: https]: console add)
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [ERROR] create (exec) failed query abi failure: Owee_buf.Invalid_format("No ELF magic number")
févr. 04 17:15:58 vmsul3bis albatrossd[766009]: albatrossd: [EXEC:766058] ['ip' 'tuntap' 'del' 'dev' 'vmmtap0' 'mode' 'tap']

I may have forgotten to update something (I just copied the albatross* binaries from my ~/.opam/4.13.1/bin directory to /usr/libexec/albatross and did a systemctl reload + restart of albatross_daemon).

hannesm commented 2 years ago

Thanks, you did everything right. I just pushed eefe9c0 (to the same branch, fix-provision), which fixes your other issue. This is in albatrossd, so indeed you need to copy that daemon around and systemctl reload + restart albatross_daemon.

Sorry again for the trouble, and thanks for your detailed report :)

palainp commented 2 years ago

host [vm: https]: success: created VM

Thank you so much for your time and your quick help!