
Polymorphic interfaces for resource creation and usage #377


gparmer commented 6 years ago

This is a work in progress, and mainly meant to start a thought process.

The low-level APIs of the system should be polymorphic so that they can be implemented in one of N ways, selected by the context (see #353). Thus, we need a set of APIs that work in each of the contexts. This issue is meant to start the discussion about which resources we're talking about, which contexts they should span, and the shape of those contexts' APIs.

The resources include at least:

Implementations

We'd like to support at least a few different implementations of each of these including:

This is a challenge as the API must be equally applicable to scenarios that include multiple components as well as library-based solutions. Correspondingly, the main challenges involve namespacing, and polymorphic representation.

Threads

Straightforward thread manipulation functions:

typedef word_t thd_t;
#define THD_NULL 0
thd_t thd_create(fn_t f, void *data);
int thd_exit(void);
typedef word_t thdid_t;
thdid_t thd_id(void);
int thd_param_set(thd_t t, thd_param_t p); // use existing functions to configure sched params
int thd_periodic_wait(void);  // relies on a periodicity being set via `thd_param_set`
int thd_sleep(microsecond_t how_long);
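
To make the intended usage concrete, here's a minimal sketch of a periodic task built on this API. It assumes a periodicity has already been attached via thd_param_set, and that fn_t takes a single void * argument (both are assumptions at this point):

void periodic_task(void *data)
{
    while (1) {
        /* do one activation's worth of work... */
        thd_periodic_wait(); /* block until the start of the next period */
    }
}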

A subtle problem with this API is that it both creates the thread, and executes it in a single function. This might be OK in many cases, but in others, it is not appropriate. For example, perhaps we don't want to run the thread until its parameters are set. Perhaps we want to create a clone of a thread for a checkpoint, and don't want to run it yet (see the component API below).

thd_t thd_create_wait(fn_t f, void *data);
// used in conjunction with thd_wakeup below
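
A hedged sketch of how this might be used: create the thread without running it, configure it, then start it. The construction of the thd_param_t, and the assumption that a thread's blockpoint id can be derived from its thd_t, are open questions rather than settled API:

void start_worker(fn_t worker_fn, thd_param_t param)
{
    thd_t t = thd_create_wait(worker_fn, NULL);

    thd_param_set(t, param);      /* configure scheduling before first execution */
    thd_wakeup((blkpt_id_t)t, 0); /* assumed: a thread's blockpoint id == its thd_t */
}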

Synchronization

A lower-level API for coordinating synchronization between threads is blockpoints. These simply enable threads to block waiting on a specific blockpoint, and to wake up threads that have previously blocked (either one, or all).

typedef word_t blkpt_id_t;
typedef word_t blkpt_epoch_t;
int thd_block(blkpt_id_t id, blkpt_epoch_t e, thdid_t dep);
int thd_wakeup(blkpt_id_t id, blkpt_epoch_t e);

I believe this API enables at least three separate libraries for locking and synchronization. First, a blockpoint abstraction that goes with the low-level support and abstracts away the epoch.

// blockpoints: an abstraction for sleeping and waking
struct blkpt {
    blkpt_id_t id;
    blkpt_epoch_t epoch_contention; // track the epoch, and a bit for contention
};
int blkpt_init(struct blkpt *blkpt); // always embedded in other abstractions
void blkpt_wakeup(struct blkpt *blkpt);
void blkpt_block(struct blkpt *blkpt);
int blkpt_teardown(struct blkpt *blkpt);
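
A minimal sketch of how the blockpoint library might wrap the low-level calls. It assumes the epoch lives in the high bits of epoch_contention with a contention flag in the low bit, and that thd_block returns without blocking if the epoch has since advanced; real code would need atomic updates:

#define BLKPT_CONTENDED 1

void blkpt_block(struct blkpt *blkpt)
{
    /* checkpoint the epoch before blocking; if a wakeup races with us
     * and bumps the epoch, thd_block fails harmlessly (no lost wakeup) */
    blkpt_epoch_t e = blkpt->epoch_contention >> 1;

    blkpt->epoch_contention |= BLKPT_CONTENDED; /* someone may now be blocked */
    thd_block(blkpt->id, e, 0);
}

void blkpt_wakeup(struct blkpt *blkpt)
{
    blkpt_epoch_t e = blkpt->epoch_contention >> 1;

    if (!(blkpt->epoch_contention & BLKPT_CONTENDED)) return;
    blkpt->epoch_contention = (e + 1) << 1; /* advance epoch, clear contention */
    thd_wakeup(blkpt->id, e);
}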

Of course, locks. We'll likely need to rethink these and make additions in light of multicore.

struct lock {
    thdid_t holder;  // lock holder thread id
    struct blkpt blockpoint;
};
struct lock *lock_create();
int lock_delete(struct lock *l);
int lock_init(struct lock *l); // don't allocate memory
void lock_teardown(struct lock *l); // don't deallocate memory

void lock_take(struct lock *l);
void lock_release(struct lock *l);
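
As a sketch of how the pieces compose, here is a single-core lock_take/lock_release over the low-level blockpoint calls. The CAS needed for multicore is elided, and passing the holder as the dep argument (for, e.g., priority inheritance) is an assumption about thd_block's semantics:

void lock_take(struct lock *l)
{
    while (1) {
        /* checkpoint the epoch *before* testing the lock, so a release
         * racing with us advances the epoch and aborts our block */
        blkpt_epoch_t e = l->blockpoint.epoch_contention >> 1;

        if (l->holder == 0) { /* would be a CAS on multicore */
            l->holder = thd_id();
            return;
        }
        l->blockpoint.epoch_contention |= 1; /* mark contention */
        thd_block(l->blockpoint.id, e, l->holder);
    }
}

void lock_release(struct lock *l)
{
    l->holder = 0;
    blkpt_wakeup(&l->blockpoint);
}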

Condition variables, though not that desirable, are common. For backwards compatibility (should be updated to mimic POSIX):

struct cv;
struct cv *cv_create();
void cv_delete(struct cv *cv);
int cv_init(struct cv *cv);
void cv_teardown(struct cv *cv);

int cv_wait(struct cv *cv, struct lock *l);
int cv_signal(struct cv *cv);
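
A short sketch to pin down the intended semantics (cv_wait atomically releases the lock and blocks, re-taking it before returning; waiters must re-check their condition since wakeups can be spurious). All names here are illustrative:

struct lock qlock; /* initialized elsewhere with lock_init */
struct cv   qcv;   /* initialized elsewhere with cv_init */
int         qlen;

void consume(void)
{
    lock_take(&qlock);
    while (qlen == 0) cv_wait(&qcv, &qlock); /* loop: spurious wakeups */
    qlen--;
    lock_release(&qlock);
}

void produce(void)
{
    lock_take(&qlock);
    qlen++;
    cv_signal(&qcv);
    lock_release(&qlock);
}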

Channels

We want a channel interface to enable controlled asynchronous communication between different components. @phanikishoreg has been using this in Chaos to communicate between subsystems. Each channel has two major aspects:

  1. A shared memory region used as a ring buffer to pass data.
  2. A rcv/asnd pair in each component.

Ideally, they would be uni-directional, and not enable information to pass in the opposite direction. However, strictly uni-directional information flow isn't possible with shared memory due to cache timing channels. Thus, the interface must allow either shared memory, or invocations to another component that mediates data-transfer.

Further, it should allow channels to be created and used within a single component, as well as across multiple components.

typedef word_t chan_t;
chan_t chan_create(unsigned int objsz, unsigned int nobjs, fn_t f, void *data);
chan_t chan_init_snder(unsigned int *objsz);
chan_t chan_init_rcver(fn_t f, void *data, unsigned int *objsz);
int chan_close(chan_t c);

typedef enum { CHAN_NONBLOCK = 1 } chan_flags_t;
int chan_snd(chan_t c, void *obj, chan_flags_t flags);
int chan_rcv(chan_t c, void *obj, chan_flags_t flags);
void *chan_peek(chan_t c, int offset);
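
To make one possible reading of this API concrete: chan_create allocates the channel, while a receiver (perhaps in another component) attaches with chan_init_rcver. The message type, the unused f/data arguments, and the return conventions are all illustrative assumptions:

struct msg { int type; int payload; };

void rcv_side(void)
{
    unsigned int objsz;
    chan_t c = chan_init_rcver(NULL, NULL, &objsz); /* assumed: f/data optional */
    struct msg m;

    while (chan_rcv(c, &m, (chan_flags_t)0) == 0) { /* flags = 0: block when empty */
        /* process m... */
    }
}

void snd_side(void)
{
    struct msg m = { .type = 1, .payload = 42 };
    chan_t c = chan_create(sizeof(struct msg), 16, NULL, NULL);

    if (chan_snd(c, &m, CHAN_NONBLOCK) < 0) {
        /* ring full: retry later, or pass flags = 0 to block */
    }
}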

In addition to communicating binary data, channels can be used to pass system objects, such as other channels, between component end-points. The following example shows an expanded API that isn't very inspired. Additional variants might be used to pass other resources (components, memory, threads, etc...).

// light typing around the channels; are we sending data, or channels?
chan_t chan_chancreate(unsigned int nchan, fn_t f, void *data);
int chan_chansnd(chan_t c, chan_t to_snd, chan_flags_t flags);
int chan_chanrcv(chan_t c, chan_t *to_rcv, chan_flags_t flags);

We'll likely need modifiers on channels to allow synchronous communication with amortization of interaction costs (like call and reply_and_wait in L4), and the ability to decouple the channel's shared memory from the event notification.

Some notes:

Some problems and further work TODO:

Event Notification

The ability to get a notification when any of the channels of interest becomes available (non-full, so we can snd to it, or non-empty, with data to rcv from it). This API centers around evt_wait.

typedef word_t evt_t;
evt_t evt_create(unsigned int max_chan);
int evt_add(evt_t g, chan_t c);
void evt_remove(evt_t g, chan_t c);
void evt_delete(evt_t g);

chan_t evt_wait(evt_t g);
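
A hedged event-loop sketch over this API, reusing the illustrative struct msg from the channel example: wait for any channel in the group to have data, then drain it without blocking so no events are missed:

void evt_loop(chan_t *chans, unsigned int nchans)
{
    evt_t g = evt_create(nchans);
    struct msg m;
    unsigned int i;

    for (i = 0; i < nchans; i++) evt_add(g, chans[i]);
    while (1) {
        chan_t c = evt_wait(g); /* block until some channel has an event */

        /* drain non-blocking; a new event fires if more data arrives */
        while (chan_rcv(c, &m, CHAN_NONBLOCK) == 0) { /* process m... */ }
    }
}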

Concerns and TODO

The main concern about this design is a lack of coherency across abstractions.

Memory Allocation and Sharing

I'm not sure what the API should be here. On the one hand, we have the cbuf API that allows very efficient dynamic memory sharing, and on the other, we have eos which has a static region of shared memory, tracked efficiently with the equivalent of DMA ring buffers. For the former, see the cbuf API.

EOS-like channels. The API for this can either be a slightly modified version of the channel API, or hidden completely behind it, if we add some protection information (who should be able to read which memory).

However, we need to have an API for decoupling the async properties of the default implementation from the shared memory (channel-like) properties. This leads me to believe we want channels that provide synchronous and bi-directional communication, likely with call and reply_and_wait amortizing calls.

General shared memory. Independent of communication, we want the ability to map pages. This might be in response to page-faults, to create components, or simply to alias text segments. The core question is one of access control: which components should be allowed to map their pages into other components? At the lowest level, we, of course, have the resource-table-based nested access control. We want the API to apply in that scenario, but we'd also like it to be able to be backed by invocations to the capmgr.

Device memory. How can we integrate device memory into this system? How can the system management abstractions be portable across x86 and arm?

Notes on the co-design of channels, event notification, memory management, and scheduling

Channels are the API where the entire world of resource management comes to play. They can span from self-contained, simple and fast (spsc, async-only, pre-defined size message passing) to complex, bringing in most system mechanisms and policies (zero-copy, possibly-synchronous, with event notification, without redundant events). Given that we're taking the cost of scheduling to zero, we need to very intentionally plan all of this to maintain end-to-end performance.

This section is here only to start a discussion about this.

I believe that all channels should share a number of properties:

A few different channel implementations that we'd like to accommodate:

Note that some of these are quite composable. The rings in the last can be implemented using the first.

Components

Of course, the management of components is foremost in the system as the isolation properties are a significant portion of our system's value. We'll likely want to expand on this, but to start:

typedef word_t comp_t;
typedef enum { COMP_FLAG_FREEZE = 1 } comp_flags_t;

// API for creating the memory and capability namespaces for a component
comp_t comp_create(elfobj_t obj, comp_flags_t flags, /* dependencies...*/);
comp_t comp_copy(comp_t, comp_flags_t flags);
comp_t comp_restore(comp_t target, comp_t chkpt); // `target` was previously `comp_copy(chkpt)`ed
int comp_delete(comp_t c);

// API for thread execution in the component.
thd_t comp_boot(comp_t c);
thd_t comp_boot_chan(comp_t c, chan_t to_rcv); // draft...many options here
thd_t comp_thd_create(comp_t c, thdclosure_index_t idx); // used only for hierarchical thread mgmt, perhaps shouldn't list it here.
// copy the register state from a thread to another where both components are related.
int comp_thd_restore(comp_t dest, thd_t clone_to, comp_t src, thd_t clone_from);

This should be able to support fork semantics (paired with a thread function to modify thread register state), component creation as we currently define it, and checkpoint-restore as in EOS. It should have implementation-specific options for library-based efficient implementations, and cross-library, process-manager implementations.
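
A hedged sketch of the EOS-style checkpoint-restore flow over this API (dependencies and error handling elided; the pairing with comp_thd_restore for register state is as described above):

void chkpt_example(elfobj_t obj)
{
    comp_t live  = comp_create(obj, 0 /* no flags; deps elided */);
    comp_t chkpt = comp_copy(live, COMP_FLAG_FREEZE); /* frozen snapshot, never runs */
    thd_t  t     = comp_boot(live);

    /* ...live executes, then faults or must be rolled back... */

    comp_restore(live, chkpt); /* roll memory back to the snapshot */
    /* register state would be rolled back with comp_thd_restore */
}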

phanikishoreg commented 5 years ago

Thank you for creating this issue; the information included is great!

I have a few things to talk about, and I'm going to read this again and again and think about whether we need anything else in the abstractions or API.

  1. Threads: I'm confused about the thd_t and thdid_t here. They're both word_t, and it looks like thd_t is only used for param_set. Where would the state data for a thread be kept? I'd think that we'd need some sort of struct thd here to hold the thdid_t, etc. Perhaps I'm missing something; is this abstraction only for resource management, and not involved in scheduling?

I'd think that this API is for both resource management and as a scheduling interface, so I'd modify the API slightly:

thd_t thd_curr(void);
thdid_t thd_id(thd_t);

int thd_yield(void);
  2. Synchronization

Similar confusion here, blkpt_id_t and blkpt_t. I can see the difference in usage and API, but I'm still not clear how these abstractions interplay.

  3. Channels

My main problem with this API is that there is no "shared" key or namespace identifying the two communicating parties, like we have with channelkey_t in the current API. I think we'd need some sort of shared key that is statically laid out so that either party knows whom they're talking to. Without that, these are more like dynamic channels, which is fine if there is a coordinator that tells each party which channel to use. I think dynamic channels and static channels have different use-cases. For example, I'd think that dynamic channels could be used for inter-VM communication, whereas static channels are more useful for peer-to-peer communication where both parties know each other.


I think this is a great start for a discussion, and I'm willing to meet any day to talk about these. We could perhaps set up a meeting for Tuesday (tomorrow?) or sometime this week to discuss this, maybe use the Wednesday (5-6) timeslot and talk?

This is a good discussion to have especially because I think this could form a good foundation to the RTOS layer we were talking about recently.

hungry-foolish commented 5 years ago

The main concern about this design is a lack of coherency across abstractions.

We have two synchronization primitives: blockpoints and rcv capabilities. Can these be unified?

The event notification mechanisms are focused on channels. Can they accommodate locks?

Do we require a pairing between channels and threads? If not, how can rcvs be integrated? If so, then how do we allow events to span multiple channels? Does this require intermediate threads and multiplexing?

How does this all integrate with IPI rate-limiting and polling?

What is the synchronization hierarchy? Should locks be implemented with channels? A few options ("->" is a dependency, and "(a, b)" means that "a" and "b" are parallel -- not dependent on each other):

  1. (scheduler tokens, sched events) -> atomic instructions -> sched locks -> block points -> (locks, channels)
  2. (scheduler tokens, sched events) -> atomic instructions -> (sched locks, block points -> channels -> locks)
  3. (scheduler tokens, sched events) -> atomic instructions -> (sched locks, block points -> (channels, locks))

Extra comments:

  1. Consider implementing thd_create as a combination of thd_create_wait and thd_wakeup: thd_create { thd_create_wait; thd_wakeup; }. Would this be better?

  2. Regarding channels: we now have endpoints, so maybe we can consider implementing the channel and event group with endpoints (rather than implementing them from scratch).

  3. For pairing between threads & endpoints, maybe we can consider some sort of thread mailboxes.

  4. I also have the same confusion as Phani. What are these types for?

  5. What is thd_periodic_wait for?

  6. Decoupling memory allocation from object creation might be a better design in this API. This would simplify the API.

  7. These chan_chancreate functions are confusing... If users want them, let them implement them themselves. The channels can pass anything, and that's it.

  8. There might be real-time issues regarding event notifications.

This is a great issue to discuss, I think, and we can meet to talk about it tomorrow (Tuesday).

gparmer commented 5 years ago

I added some thoughts above about the types of channels we need to support. The hardest part of this is going to be coordination between shared memory allocation and channels.

Inspired by Rust's channel API, I think it would be good to separate the sender's data-structure from the receiver's. I also now see it as inevitable that we'll have to distinguish between channels that are guaranteed-local to a component, and those that are not. Something like:

typedef word_t chan_snd_t;
typedef word_t chan_rcv_t;
typedef word_t chan_t; // polymorphic representation of either of the previous

typedef enum {
    CHAN_FLAG_SND     = 1,
    CHAN_FLAG_RCV     = 2,
    CHAN_FLAG_CHAN    = 4,      // channel to send channels
    CHAN_FLAG_LOCAL   = 8,      // channel only accessed locally within this component
    CHAN_FLAG_MP      = 16      // possibly multiple producers
} chan_flags_t;

int chan_create(unsigned int objsz, unsigned int nobjs, chan_flags_t flags, chan_snd_t *snd, chan_rcv_t *rcv);
chan_snd_t chan_snd_dup(chan_snd_t snd);  // multisender channels
void chan_rcv_close(chan_rcv_t c);
void chan_snd_close(chan_snd_t c);

int chan_snd(chan_snd_t snd, void *obj);
int chan_rcv(chan_rcv_t rcv, void *obj);
// non-blocking variants
int chan_snd_nb(chan_snd_t snd, void *obj);
int chan_rcv_nb(chan_rcv_t rcv, void *obj);

typedef word_t chanchan_snd_t;
typedef word_t chanchan_rcv_t;

// flags determine if send or receive channels are sent through this channel for channels
int chanchan_create(unsigned int nchan, chan_flags_t flags, chanchan_snd_t *snd, chanchan_rcv_t *rcv);
int chanchan_snd(chanchan_snd_t c, chan_t to_snd);
int chanchan_rcv(chanchan_rcv_t c, chan_t *to_rcv);
void chanchan_snd_close(chan_snd_t snd);
void chanchan_rcv_close(chan_rcv_t rcv);

// A version of reinterpret_cast...does type checking and only casts if the proper flags are set.
int chan_to_snd(chan_t c, chan_snd_t *snd); // cast the received channel to a snd (if the types match)
int chan_to_rcv(chan_t c, chan_rcv_t *rcv);
int chan_to_chan_snd(chan_t c, chanchan_snd_t *snd);
int chan_to_chan_rcv(chan_t c, chanchan_rcv_t *rcv);
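
A hedged usage sketch of the split API: create a multi-producer channel, dup a second sender endpoint, and hand it to a peer through a channel-of-channels; the peer casts the polymorphic chan_t back with the checked casts above. Return conventions (0 on success) and the message type are assumed:

void split_chan_example(void)
{
    chan_snd_t snd, snd2, peer_snd;
    chan_rcv_t rcv;
    chanchan_snd_t csnd;
    chanchan_rcv_t crcv;
    chan_t received;
    struct msg m = { 0 }; /* illustrative fixed-size message type, as earlier */

    chan_create(sizeof(struct msg), 16, CHAN_FLAG_MP, &snd, &rcv);
    snd2 = chan_snd_dup(snd); /* second producer endpoint */

    /* a channel that carries send endpoints */
    chanchan_create(4, CHAN_FLAG_SND, &csnd, &crcv);
    chanchan_snd(csnd, (chan_t)snd2);

    /* the peer's side: receive a chan_t, then checked-cast it back */
    chanchan_rcv(crcv, &received);
    if (chan_to_snd(received, &peer_snd) == 0) chan_snd(peer_snd, &m);
}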

Instead of the chanchan namespace, I like chan2 (read as chan^2 = chanchan) more. Thoughts?