Closed: LedgeDash closed this 5 years ago.
It looks like the panic comes from firecracker/sys_util/src/terminal.rs:33, where tcsetattr gets called. Based on this page, EIO means:

"The process group of the writing process is orphaned, the calling thread is not blocking SIGTTOU, and the process is not ignoring SIGTTOU."

@alevy Help! =)
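For context, here is a minimal standalone sketch (not Firecracker's actual terminal.rs code) of how a tcsetattr call can surface this EIO case, along with the usual mitigation of ignoring SIGTTOU before touching the terminal. The libc calls are real, but the function and the raw-mode setting are purely illustrative:

```rust
use std::io;
use std::mem::MaybeUninit;
use std::os::unix::io::RawFd;

// Illustrative only: shows where EIO from tcsetattr would appear and the
// common workaround of ignoring SIGTTOU first.
fn set_raw_mode(fd: RawFd) -> io::Result<()> {
    unsafe {
        // Ignoring SIGTTOU avoids the "orphaned process group" EIO case
        // described in the tcsetattr man page.
        libc::signal(libc::SIGTTOU, libc::SIG_IGN);

        let mut termios = MaybeUninit::<libc::termios>::uninit();
        if libc::tcgetattr(fd, termios.as_mut_ptr()) != 0 {
            return Err(io::Error::last_os_error());
        }
        let mut termios = termios.assume_init();
        libc::cfmakeraw(&mut termios);
        if libc::tcsetattr(fd, libc::TCSANOW, &termios) != 0 {
            // This is where EIO shows up if SIGTTOU is not blocked/ignored.
            return Err(io::Error::last_os_error());
        }
    }
    Ok(())
}

fn main() {
    // Demonstration only; this really does switch stdout's terminal to raw mode.
    if let Err(e) = set_raw_mode(libc::STDOUT_FILENO) {
        eprintln!("tcsetattr failed: {}", e);
    }
}
```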
Yue and I spent around an hour trying to debug the issue but couldn't figure it out. The error seems to be around the terminal device. I checked the VmAppConfig and VmApp instances against the previous working version and they look exactly the same. I'll keep digging into it, but could you please also take a look?
I’ll take a look today.
The controller with the new scheduler is working now. Tested with a concurrency limit of 1000, i.e., multiple running VMs for each function. Resource allocation tracking and run queue/idle queue management all seem to work. The problem was just that the controller main thread was sometimes exiting too soon, even before the vsock connection managed to establish, so I added logic to wait until all requests finish. Next step (should be able to finish tomorrow).
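As a rough illustration of the "wait until all requests finish" fix, here is a minimal sketch (channel and worker names are hypothetical, not the controller's actual code): each worker reports completion on a channel, and the main thread blocks until every report has arrived instead of exiting early.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (done_tx, done_rx) = mpsc::channel::<u64>();
    let num_requests = 3;

    for id in 0..num_requests {
        let done_tx = done_tx.clone();
        thread::spawn(move || {
            // ... send the request over the VM connection and wait for the response ...
            done_tx.send(id).expect("main thread gone");
        });
    }
    drop(done_tx); // so the loop below ends once all workers have finished

    // Block instead of exiting: the main thread only returns after every
    // worker has reported completion (or hung up).
    for id in done_rx {
        println!("request {} finished", id);
    }
}
```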
@LedgeDash I would integrate with tty (i.e. rebase from master) first, then merge this, then do the rest.
Booting from snapshot is tested with the controller: controller branch scheduler, commit 79603aba; Firecracker commit 76fed4bb. The workload includes only 2 functions, lorempy and loremjs. Concurrency limits were tested with 1, 10, and 100.
Command line options added (a sketch of their intended semantics follows below):

- --debug: controls whether to close the VMs' stdout.
- --snap: controls whether to boot from snapshots. This allows us to have only one function config YAML file, with the load_dir field specified.

I think this is ready to merge. I'm adding workloads with inter-arrival times while waiting for a final quick review.
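A minimal, std-only sketch of what the two flags are meant to control (the real controller presumably uses a proper argument parser; the handling here is only illustrative):

```rust
// Illustrative only: shows the intended meaning of --debug and --snap,
// not the controller's actual CLI handling.
fn main() {
    let args: Vec<String> = std::env::args().collect();

    // --debug: keep the VMs' stdout attached (i.e., don't close it).
    let debug = args.iter().any(|a| a == "--debug");
    // --snap: boot each VM from a snapshot; the function config YAML is
    // expected to specify a load_dir in that case.
    let snap = args.iter().any(|a| a == "--snap");

    println!("debug = {}, snap = {}", debug, snap);
}
```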
WARNING =) The current code as it stands at commit ce6a20dd does NOT work T_T. The problem is that vmm.start_instance().expect("start") panics with:

Now about the code/implementation.

What hasn't changed: I haven't removed anything from the previous version of the controller because I wanted to have some reference to test against, so you'll still see old data structures such as active_functions and warm_functions. They'll be removed once we have a fully functional controller. The overall design stays largely the same: we have a ConnectionManager thread per VM that sits in between the controller thread and the actual VM. The ConnectionManager thread consumes requests from the controller thread via mpsc::channel and forwards them via vsock to the VM. For responses, the ConnectionManager thread gets responses from a VM and forwards them to a response receiver thread, which prints them to the console. The current implementation still uses vsock, which we need to change soon, but that code is mostly hidden inside the listener module. So from a controller/scheduler perspective, the controller just holds a Sender<Request> per VM and is oblivious to how the ConnectionManager thread actually communicates with the VM.
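To make the data flow concrete, here is a minimal sketch of the per-VM forwarding loop (the Request type and function name are hypothetical, and a generic writer stands in for the vsock connection):

```rust
use std::io::Write;
use std::sync::mpsc::Receiver;

// Placeholder request type; the real controller's Request carries more.
struct Request {
    payload: Vec<u8>,
}

// One such loop runs per VM: it drains the controller's channel and writes
// each request to the VM's connection (vsock in the current code).
fn connection_manager<W: Write>(requests: Receiver<Request>, mut vm_conn: W) {
    // Ends when the controller drops its Sender<Request>.
    for req in requests {
        if vm_conn.write_all(&req.payload).is_err() {
            eprintln!("VM connection closed; stopping ConnectionManager");
            break;
        }
    }
}
```

In the real code a separate path carries responses back to the response receiver thread; this sketch only shows the request direction.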
What's changed: In the new code, the cluster type represents the physical cluster and is used to keep track of hardware resource limits. Currently it only supports one machine, so it just reads the local /proc/meminfo and /proc/cpuinfo.
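A rough sketch of that single-machine case (not the actual cluster type): read total memory from /proc/meminfo and count CPUs from /proc/cpuinfo.

```rust
use std::fs;

// Returns (MemTotal in kB, number of CPUs) for the local machine.
fn local_resources() -> (u64, usize) {
    let meminfo = fs::read_to_string("/proc/meminfo").unwrap_or_default();
    let total_kb = meminfo
        .lines()
        .find(|l| l.starts_with("MemTotal:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(0);

    let cpuinfo = fs::read_to_string("/proc/cpuinfo").unwrap_or_default();
    let num_cpus = cpuinfo
        .lines()
        .filter(|l| l.starts_with("processor"))
        .count();

    (total_kb, num_cpus)
}

fn main() {
    let (mem_kb, cpus) = local_resources();
    println!("MemTotal: {} kB, CPUs: {}", mem_kb, cpus);
}
```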
The Vm struct represents a VM from a management perspective. It holds a vm_handle (currently just using the cid; this will change after the move to tty), a request sender, and the VmApp. So with a Vm struct instance, we can send requests to a VM and kill/evict a VM. The vm_handle is also sent back in the response so that we know which VM finished running.

For each function, there's a running list and an idle list, both Vec<Vm>. All running lists are inside running_functions, which is a BTreeMap<String, Vec<Vm>>, and all idle lists are in idle_functions.
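Put together, the bookkeeping looks roughly like this; field and struct names other than cid, running_functions, and idle_functions are guesses, not the actual code.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::Sender;

struct Request; // placeholder for the real request type

// A VM from the management perspective: a handle (currently the vsock cid),
// a sender for forwarding requests to its ConnectionManager, and the VmApp
// (left as a comment since its definition lives elsewhere).
struct Vm {
    cid: u32,
    sender: Sender<Request>,
    // app: VmApp,
}

// Hypothetical container for the per-function lists described above.
struct VmPools {
    running_functions: BTreeMap<String, Vec<Vm>>, // per-function running VMs
    idle_functions: BTreeMap<String, Vec<Vm>>,    // per-function idle VMs
}
```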
Now, the aws_schedule() function is the lambda scheduling algorithm; its logic is in this basecamp post. I haven't implemented the evict step yet because I need to make sure a VM can actually run =)
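Since the algorithm itself is only referenced (the basecamp post), the following is just a guess at its general shape, inferred from the structures above: reuse an idle VM if one exists, otherwise launch a new VM if the cluster has room, otherwise fall through to the not-yet-implemented evict step. It is generic over the VM type and does not claim to match the real aws_schedule().

```rust
use std::collections::BTreeMap;

// Guessed shape only; the real aws_schedule() logic is in the basecamp post.
fn aws_schedule<Vm>(
    function: &str,
    idle: &mut BTreeMap<String, Vec<Vm>>,
    running: &mut BTreeMap<String, Vec<Vm>>,
    free_mem_kb: &mut u64,
    vm_mem_kb: u64,
) -> bool {
    // 1. Reuse an idle VM for this function if one exists.
    if let Some(vm) = idle.get_mut(function).and_then(|v| v.pop()) {
        running.entry(function.to_string()).or_default().push(vm);
        return true;
    }
    // 2. Otherwise launch a new VM if the cluster has enough free memory
    //    (the launch itself is elided here).
    if *free_mem_kb >= vm_mem_kb {
        *free_mem_kb -= vm_mem_kb;
        return true;
    }
    // 3. Otherwise the evict step would run; it is not implemented yet.
    false
}
```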