gwu-cs-iot / collaboration

Spring '20 IoT - systems and security class. This is the collaborative half of the class.
https://www2.seas.gwu.edu/~gparmer/classes/2020-01-01-Internet-of-Things-Systems-Security.html
MIT License

Paper Discussion 15a: Chaos: a System for Criticality-Aware, Multi-core Coordination #111

Open Others opened 4 years ago

Others commented 4 years ago

paper link

Have fun :)

pcodes commented 4 years ago

Reviewer Name: Pat Cody Review Type: Critical

Problem Being Solved

Keeping mixed-criticality systems secure when running on a single multicore device is challenging due to the complexities of inter-core coordination. A frequent problem is interference, where lower-assurance code generates too many interrupts and degrades the performance of the high-assurance code. However, running each assurance level in its own VM carries high overhead, making that approach impractical.

Main Contributions

CHAOS provides a separate, minimal runtime environment that devirtualizes high-assurance/high-criticality code so that it is not tied to the execution of the lower-assurance code. Instead, the two runtimes communicate via proxies: if the code running in ChaosRT (the high-assurance code) needs data available from the low-assurance code, it makes a request through a proxy. Chaos was implemented within NASA's Core Flight System (cFS), and it relies on a handful of key features provided by the Composite OS.

Questions

Critiques

samfrey99 commented 4 years ago

Reviewer: Sam Frey Review Type: Skim

Problem To reduce size and power consumption, many embedded systems requiring multiple execution streams have transitioned away from multiple single-core processors in favor of a single multi-core processor. Having a multi-core processor adds complexity for systems that run software with differing levels of criticality. Interrupts from low criticality software can interfere with high criticality software.

Contributions Chaos lowers interference by removing high criticality software from the primary subsystem and allowing it to run in a minimal ChaosRT environment without interruption. Chaos improved processing latency by a factor of 2.7 compared to a standard Linux real-time environment while also improving isolation of critical code.

jacobcannizzaro commented 4 years ago

Reviewer: Jacob Cannizzaro Type: Comprehend/Skim

Problem:

With the trend of switching from multiple single-core processors to a single multicore processor, less-critical tasks can cause significant interference by taking up processing time and introducing more contention for shared resources. Running all of this code, regardless of assurance level, in one place adds a lot of overhead.

Main Contributions:

This paper uses Chaos to devirtualize some tasks. By taking highly critical tasks and running them in a minimal ChaosRT environment, VMs incur less overhead, and communication is handled with proxies. Communication now flows from lower-priority tasks to the higher-functionality VM with the help of proxies that are able to bound interference and latency. This reduces the worst-case latency of the system by a factor of 3.5 compared to the Linux equivalent.

anguyen0204 commented 4 years ago

Reviewer: Andrew Nguyen Review Type: Skim

Problem There is a need to minimize the size and power of embedded systems. Many systems move to a single multi-core processor to achieve this, making it difficult to coordinate high- and low-criticality tasks. The paper introduces Chaos, which devirtualizes high-criticality tasks.

Contributions Chaos reduces interference through the devirtualization process mentioned above. Tasks are isolated and moved to the ChaosRT environment, which enables high-assurance and high-criticality execution; rate-limiting servers bound the remaining interference. The paper then covers the design, the scenarios of interference, synchronous communication, and the implementation.

Questions

  1. How exactly do TCaps work?
  2. Why was synchronous communications used in thread migration for Composite and not asynchronous?
hjaensch7 commented 4 years ago

Reviewer: Henry Jaensch Review Type: Skim

Problem Being Solved

Embedded systems are moving toward using one chip for all of the tasks on the system. This paper proposes a way to maintain criticality and high resource efficiency when mixing many components and functions on the same processor. The priority here is to maintain efficient feature rich user applications while also providing isolation guarantees for high criticality processes.

Main Contributions

The paper introduces CHAOS, a system for devirtualizing high-criticality systems so that deadlines can be met without interference from other applications of lower criticality. This is achieved with CHAOS RT, a bare-bones runtime that allows predictable execution of tasks. To support communication between mixed-criticality tasks, proxies are used to maintain feature richness.

Question

  1. What's the difference between bounded asynchronous communication and synchronous communication?

  2. Why was Linux one of the choices for comparison here? Mixed assurance and criticality yes, but Linux doesn't make any real-time guarantees.

themarcusyoung commented 4 years ago

Reviewer: Marcus Young Review Type: Skim

Problem Modern embedded systems are increasingly using single multi-core processors that are asked to process extremely complex tasks with different criticality levels. Since these systems are using a single multi-core processor instead of multiple single-core processors, there is a need to extract high criticality tasks to run them in a minimal runtime environment in order to improve human or equipment safety.

Contributions Chaos removes high-criticality software from the primary subsystem and puts it in a ChaosRT minimal runtime environment. Chaos improved processing latency for a sensor/actuation loop in satellite software experiencing inter-core interference by a factor of 2.7, while reducing worst-case latency by a factor of 3.5 over a real-time Linux variant.

rachellkm commented 4 years ago

Reviewer: Rachell Kim Review Type: Skim

Problem Being Solved:

Embedded systems using multi-core processors to support mixed-criticality and multi-assurance levels often face difficulty in enforcing strict isolation between subsystems. Because shared abstractions between cores may trigger interference, it is important to protect high-criticality tasks from faults caused by subsystems of low-assurance and low-criticality. Moreover, systems must also maintain high-confidence in correctness while supporting feature-rich software, and this condition is considered to be difficult to maintain with current technology.

Main Contributions:

The authors of this paper propose a system called Chaos, which aims to remove interference caused by inter-core coordination in multi-core systems via devirtualization of high-criticality tasks. High-criticality tasks are moved into an execution environment called ChaosRT, thereby allowing predictable execution with minimal interference from shared subsystems. This paper also outlines example situations in which shared memory and inter-core coordination may impact the execution of high-criticality code.

Questions:

  1. Could the proxies and ChaosRT environment introduce more points of failure in place of removing interference?
rebeccc commented 4 years ago

Reviewer Name: Becky Shanley Review Type: Critical

Problem Being Solved

Embedded systems struggle to balance minimizing SWaP (Size, Weight, and Power) against providing timing guarantees and resources to the highest-priority processes. In embedded systems this problem is much more severe because failures of high-priority tasks can have detrimental impacts on the physical world and on human safety. It is a difficult problem to solve because high-priority tasks must still work with lower-priority tasks to provide many real-time functionalities, so the two cannot be completely separated.

Main Contributions

CHAOS is a devirtualization system that guarantees high-priority tasks the resources and low latency they require. It achieves this by extracting high-priority tasks into CHAOSrt, a real-time environment that is separated from the interference of potentially low-assurance tasks. In all, this paper contributes to the problem domain by:

  1. evaluating the impact low-assurance tasks can have on high-priority tasks in the same runtime
  2. introducing the devirtualization runtime CHAOS
  3. an IPI rate-limiting technique that bounds both the interference from this communication method and its latency
  4. an evaluation of CHAOS on reliability-focused systems

Questions

  1. Why is there an evaluation of CHAOS on Linux at all? Since most embedded devices aren't running Linux, how meaningful is the comparison of CHAOS on a real-time OS versus CHAOS on Linux?
  2. Since IPI and Shared Memory both have their pros and cons, why was it decided to go with IPI? Rate-limiting reads like a complex process and I'm curious to know if it was ever explored to try to solve the problems of shared memory in the same way that it was explored to solve the problems with IPI.

Critiques

  1. As mentioned before, this paper is dense, to the point of being extremely difficult to get anything meaningful out of on a first skim. It got to the point where I was checking references and googling terms at least once per paragraph.
  2. The use of the word "devirtualization": it is explained in a footnote, and after a deep read I understand why it is called this, but it still feels unnecessarily confusing. As a person who tends to google everything I don't understand, on my first skim I missed the footnote and was trying to apply compiler devirtualization to this paper, which is not helpful.
bushidocodes commented 4 years ago

Reviewer: Sean McBride

Review Type: Simple Skim

Problem Being Solved:

How can one consolidate mixed criticality workloads onto shared multi-core systems? Also, how can one leverage the more differentiated QOS attributes of a modern RTOS to provide better assurance guarantees to industry-standard software systems that run mixed criticality subsystems on a shared POSIX backend?

Main Contributions

  1. Modifies Composite, an existing RTOS, to provide ChaosRT, a minimal sandbox that can run high-assurance code and transparently communicate with a low-assurance sandbox via software proxies.
  2. Uses devirtualization to extract high-criticality sections from legacy systems, running on nonspecialized operating systems that combine mixed-criticality workloads, into ChaosRT.
  3. Applies a rate-limiting technique to preemptive messages passed via inter-processor interrupts from rump software in a low-criticality sandbox to the sandbox running high-criticality devirtualized work, limiting interference.
  4. Uses TCaps (delegation of time) to coordinate the different user-level schedulers in the different sandboxes.
  5. Demonstrates the application of this technique to the cFS software stack used in NASA missions (which depends on a POSIX backend) using a NetBSD rump kernel.
ericwendt commented 4 years ago

Reviewer Name: Eric Wendt Review Type: Critical

Problem Being Solved: Some of the fundamental problems that need addressing are size, weight, and power for IoT devices. Finding a good balance among these requirements is exceedingly difficult, combining techniques from both hardware and software. This paper dives into software solutions, focusing on cutting down interference between high-criticality tasks and lower-criticality ones.

Main Contributions Fortunately, many of the main contributions are laid out in a distinct sub-header in the paper.

  1. Outlining how low-criticality tasks can interfere with high-criticality tasks. Useful graphs are included to demonstrate the impact.
  2. Detailing a technique called devirtualization, which cuts down on interference between high- and low-criticality tasks without compromising the dependencies between the two.
  3. An IPI rate-limiting technique which allows CHAOS to bound IPI latency.
  4. An evaluation of CHAOS on multiple operating systems.

Questions:

Critiques:

Sorry for the late post, lost power for a few hours.

RyanFisk2 commented 4 years ago

Reviewer: Ryan Fisk

Review Type: Critical

Problem Being Solved

Embedded systems are increasingly required to run many different processes at varying degrees of criticality. High-criticality systems need to run at a high priority to protect human or equipment safety, whereas lower-criticality systems are nice to have, but not as important. Due to resource constraints on IoT devices, these processes have to be scheduled by the same processor, and the underlying hardware overhead for deciding which processes run can cause interference with high priority tasks.

Contributions

The paper introduces ChaosRT, a minimal runtime environment that removes high-criticality tasks from the management system of the VM they would normally run on, thereby minimizing or eliminating interference from lower-priority tasks. The tasks remaining in the VM can still communicate with the devirtualized, higher-criticality tasks through proxies that handle communication between them and the rest of the system.

Questions

1) If a devirtualized system required sensor readings or some other data that was gathered in a lower priority task, wouldn't it still have to wait for that task to complete and for the proxy to get the information?

2) Why does devirtualization work so well for this? I'm confused as to how this reduces interference from other tasks.

3) What happens if there is more IPI interference than allowed messages for a certain task?

Critiques

1) The example of the NASA cFS system they used to explain the problems with virtualization was really helpful, I would've liked to see an example using ChaosRT when they talked about the implementation.

2) What security concerns are there with the proxy? Can it be spoofed to send bad data to a safety critical system?

huachuan commented 4 years ago

Reviewer: Huachuan Wang Review Type: Skim

Problem being solved

Embedded systems are increasingly required to provide both complicated feature sets and high confidence in the correctness of mission-critical computations. Functionality traditionally performed by separate processors is being consolidated onto less expensive, more capable multi-core commodity processors, which makes coordination very complicated. Supporting feature-rich, general computation alongside high-confidence physical control is difficult.

Contributions

This paper presents Chaos, which effectively uses the increased throughput of multi-core machines while ensuring the necessary isolation between tasks of different criticalities and assurance levels. Chaos also devirtualizes high-criticality tasks to remove overhead.

  1. Devirtualization to extract high-criticality subsystems from lower-assurance legacy systems while maintaining functional dependencies and predictable inter-core message passing.
  2. An IPI rate-limiting technique that enables Chaos to bound IPI interference and the latency of notifications for inter-core coordination.