princeton-sns / firecracker-tools

Replace vsock with pipes for communication b/w firerunner and firecracker and integrate pieces together #16

Closed tan-yue closed 5 years ago

tan-yue commented 5 years ago
  1. Replace vsock with pipes for communication between firerunner and firecracker.
  2. Add outl.c to do port writes.
  3. For outl to work, add a customized init and change the default kernel command line argument from console=none to console=ttyS0.
  4. Since we are using a second serial device, ttyS1, for host-guest communication, change the runtime wrappers accordingly.
  5. Snapshot is not working yet. When booting from a snapshot, writing to ttyS1 works, but reading from ttyS1 does not.
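
For readers unfamiliar with pipe-based request/response, here is a minimal sketch of the idea in item 1. It is written in Python rather than firerunner's Rust, the child process is a hypothetical stand-in for the guest side, and the framing (one JSON object per line) is an assumption, not the protocol firerunner actually uses.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the guest side: it echoes each JSON request back
# with a "status" field added. The real runtime wrapper talks over ttyS1.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    request["status"] = "ok"
    sys.stdout.write(json.dumps(request) + "\n")
    sys.stdout.flush()
"""


def round_trip(requests):
    """Send each request to a child over a pipe and collect one reply per request."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    try:
        for request in requests:
            # Assumed framing: one JSON document per line, flushed immediately.
            child.stdin.write(json.dumps(request) + "\n")
            child.stdin.flush()
            replies.append(json.loads(child.stdout.readline()))
    finally:
        # Closing stdin signals end-of-input so the child exits cleanly.
        child.stdin.close()
        child.stdout.close()
        child.wait()
    return replies
```

Flushing after every write matters here: without it, either side can block waiting for data that is still sitting in the other side's buffer.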
LedgeDash commented 5 years ago

It seems the integrate branch isn't based on the most up-to-date version of master, so the controller code is outdated. It'd be worth pulling in the latest master and testing the controller with the new feature as well.

LedgeDash commented 5 years ago

Could you give an example of how to test this? Here's what I did that didn't work:

  1. rebuild runtime rootfs, i.e., run mk_rtimage.sh
  2. make application fs
  3. invoke firerunner: $ cat single_req.json | ./target/debug/firerunner --kernel vmlinux --rootfs python2.ext4 --appfs lorempy2.ext4

But I got:

[    0.016452] dmi: Firmware registration failed.
[    0.049428] zswap: default zpool zbud not available
[    0.050193] zswap: pool creation failed
OpenRC init version 0.41.2.6fc2696f3e starting
Starting sysinit runlevel

   OpenRC 0.41.2.6fc2696f3e is starting up Linux 4.14.55-84.37.amzn2.x86_64 (x86_64)

 * Mounting /proc ...
 [ ok ]
 * Mounting /run ...
 * /run/openrc: creating directory
 * /run/lock: creating directory
 * /run/lock: correcting owner
 * Caching service dependencies ...
Service `hwdrivers' needs non existent service `dev'
 [ ok ]
Starting boot runlevel
 * Remounting devtmpfs on /dev ...
 [ ok ]
 * Mounting /dev/mqueue ...
 [ ok ]
 * Mounting /dev/pts ...
 [ ok ]
 * Mounting /dev/shm ...
 [ ok ]
 * Loading modules ...
modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
 [ ok ]
 * Mounting misc binary format filesystem ...
 [ ok ]
Starting default runlevel
 * Starting serverless-workload ...
stty: can't open '/dev/ttyS1': No such file or directory
thread 'main' panicked at 'Failed to kill child: Sys(ESRCH)', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
tan-yue commented 5 years ago

I've merged master into this branch. @LedgeDash Your error occurs because the kernel binary we've been using supports only one runtime serial device. I replaced the vsock section with a ttyS1 section in the firerunner README. The new section explains the two parameters that we need to set to use ttyS1.

Other than needing a new kernel binary, the way you invoke firerunner/controller should be the same as before.

LedgeDash commented 5 years ago

@tan-yue Thanks for the pointer. Just to make sure I'm doing this right: in order to run and test this, I need to

  1. Build a new vmlinux image
  2. then invoke with the same cmdline that I used

Is this correct? Also, you mentioned microvm-kernel-config. That is just a config file that a kernel build could be based on, right? Meaning, if I just modify microvm-kernel-config and run cargo build, it won't automatically produce the right vmlinux image, since the Firecracker source doesn't include the Linux source code.

tan-yue commented 5 years ago

@LedgeDash

To "Is this correct?": basically yes. One thing to be aware of: if you want to override the default cmd_line argument of firerunner/controller, make sure it still includes console=ttyS0.

And yes to your last question.
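
The console=ttyS0 requirement could be checked before launching with a small helper like the one below. This is only a sketch: ensure_console is a hypothetical name, not part of firerunner or the controller, and the sample arguments in the usage note are illustrative.

```python
def ensure_console(cmdline, console="ttyS0"):
    """Return cmdline with console=<console> present and console=none removed."""
    # Drop the old default so it cannot conflict with the serial console.
    args = [arg for arg in cmdline.split() if arg != "console=none"]
    wanted = "console=" + console
    if wanted not in args:
        args.append(wanted)
    return " ".join(args)
```

For example, ensure_console("console=none panic=1") returns "panic=1 console=ttyS0", and a command line that already contains console=ttyS0 is returned unchanged apart from whitespace normalization.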

tan-yue commented 5 years ago

Good news: snapshot is now almost working. It is "almost" because, when we boot from a snapshot, the guest currently fails to wake up from sleep().

LedgeDash commented 5 years ago

I built a new kernel with the specified config. This fixed the previous can't open '/dev/ttyS1' issue. However, I'm still not seeing the function's output. Here's what I see:

[luzhuo@Jasper] cat single_req.json | ./target/debug/firerunner --kernel vmlinux-2 --rootfs python2.ext4 --appfs lorempy2.ext4
[    0.000567] ACPI BIOS Error (bug): A valid RSDP was not found (20190703/tbxfroot-210)
[    0.074798] zswap: default zpool zbud not available
[    0.075522] zswap: pool creation failed
OpenRC init version 0.41.2.6fc2696f3e starting
Starting sysinit runlevel

   OpenRC 0.41.2.6fc2696f3e is starting up Linux 5.3.0-rc5 (x86_64)

 * Mounting /proc ...
 [ ok ]
 * Mounting /run ...
 * /run/openrc: creating directory
 * /run/lock: creating directory
 * /run/lock: correcting owner
 * Caching service dependencies ...
Service `hwdrivers' needs non existent service `dev'
 [ ok ]
Starting boot runlevel
 * Remounting devtmpfs on /dev ...
 [ ok ]
 * Mounting /dev/mqueue ...
 [ ok ]
 * Mounting /dev/pts ...
 [ ok ]
 * Mounting /dev/shm ...
 [ ok ]
 * Loading modules ...
modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
 [ ok ]
 * Mounting misc binary format filesystem ...
 [ ok ]
Starting default runlevel
 * Starting serverless-workload ...
thread 'main' panicked at 'Failed to kill child: Sys(ESRCH)', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

Am I invoking the function/firerunner correctly?

tan-yue commented 5 years ago

Try ./target/debug/firerunner --kernel vmlinux-2 --rootfs python2.ext4 --appfs lorempy2.ext4 < single_req.json. Using redirection should allow you to see the output.

I just tried using cat and | on my end, and I also didn't see the output. I am not sure why.

LedgeDash commented 5 years ago

< redirection worked for me. Is this because firerunner or firecracker expects stdin to be backed by a file rather than a pipe?
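
One way to see the difference between the two invocations is to inspect what the stdin file descriptor is backed by. The sketch below uses Python's standard os and stat modules; describe_fd is a hypothetical helper name, and this only shows how the two cases differ, not why firerunner behaves differently.

```python
import os
import stat


def describe_fd(fd):
    """Classify what an open file descriptor is backed by."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISCHR(mode):
        return "character device"
    return "other"
```

Calling describe_fd(0) in a script run as `cat single_req.json | script` reports a pipe, while `script < single_req.json` reports a file. A regular file supports seeking and always reports as readable, whereas a pipe does neither, which is a common source of behavior differences like this one.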

LedgeDash commented 5 years ago

I've done the testing on my end. firerunner works with both python and node lorem. Just a couple of things.

  1. How difficult would it be to remove the boot messages (I guess from ttyS0)? Currently, when invoking a function, boot messages are printed to the console because (I think) the default kernel arg has console=ttyS0. It would be nice to have this removed. But, as @tan-yue pointed out, this is less of a performance concern when booting from a snapshot. So we could punt on it until we measure the performance and see whether this is an issue.
  2. It currently doesn't work with the controller. I'll open another issue to track this. But now that I'm more comfortable with Rust, I'll track down the issue and fix it on my end.

From a dev perspective, we could merge this in, knowing that it's not currently compatible with the controller and that snapshot doesn't fully work yet, and then fix those as separate issues. Or we can continue development on the integrate branch to fix these issues before merging. I personally prefer the latter because it keeps master working and feels cleaner, but I think both are good paths forward because we're a small team and dev is active.

What do you think @alevy ?

LedgeDash commented 5 years ago

Issue for controller: #17

alevy commented 5 years ago

I don't mind us fixing issues on this branch as long as it's quick. We don't want the controller to get too out of sync.

tan-yue commented 5 years ago

I just pushed a patch where runtime wrappers are updated to support multi-vcpu snapshot.

tan-yue commented 5 years ago

@LedgeDash I just realized the controller is built around vsock. My changes didn't touch the controller code at all. Sorry about that. Do you want me to take over fixing the controller so that you can focus on the scheduler?

tan-yue commented 5 years ago

Since @LedgeDash is busy working on the scheduler, I am stepping in to take over integrating the controller. My plan is to have the patch ready tonight.

tan-yue commented 5 years ago

The controller on this branch now also works with pipes. Tested with example_func_config.yaml and example_requests.json.

The only integration left is to allow the controller to boot functions from snapshots. I plan to do that later in another PR and merge this one first. What are your thoughts, @alevy?