Open ormsbee opened 1 month ago
The first concrete use case for a more generalized Container capability is to model Units for Libraries and eventually Courses.
We must support permutations of content in containers that is customized for a group of people or even an individual learner. Examples of this:
Plugin authors must be able to create their own container types, as well as their own dynamic content types.
Courses currently have a hierarchy of Course → Section → Subsection → Unit, and we must be able to represent this in Learning Core. This representation does not have to be represented as nested Containers as it is today in ModuleStore, but we must have some plan for how these larger groupings will work.
A drawback of XBlock-based dynamic children has been poor performance. The system should be created so that queries to get the content for a given user are predictable and fast. While invoking plugin code may be unavoidable for assigning a user to a particular variant of dynamic content, the system should be able to read these assignments without invoking plugin code.
We must support current XBlocks that have dynamic children (LibraryContentBlock, SplitTestBlock, and RandomizeBlock), but nothing in Learning Core should require or assume XBlocks. Learning Core should hold the primitives that are needed to build these XBlocks on top of, but should be more broadly usable for Components that are not XBlocks.
Containers should be able to differentiate between changes in the metadata of the container itself (e.g. title, ordering of contents, adding or removing content), vs. changes to the contents (e.g. a component in a Unit is updated with new text).
The Split ModuleStore system for storing course content has the unfortunate property that even minor changes require large writes of the entire structure, resulting in a lot of wasted space. Changes to the content of a Container should ideally not require rewriting a new version of the Container.
An individual Container must scale to at least 100 items, though ideally it should work for up to a 1000 (TBD how much of a requirement this is). The median size of Containers will be much smaller, say ~10, give or take.
These are the models that I've sketched out to try to capture:
containers
app–lowest level of these)selectors
app, builds on containers
).units
app, builds on containers
and selectors
)–basically empty shells at the moment. Note that we could implement containers
and units
first, and add selectors
later, but I would like to validate selectors
because it's the riskiest part of this design.
openedx_learning/apps/authoring/containers/models.py
from ..publishing.model_mixins import PublishableEntityMixin, PublishableEntityVersionMixin
class EntityList(models.Model):
"""
EntityLists are a common structure to hold parent-child relations.
EntityLists are not PublishableEntities in and of themselves. That's because
sometimes we'll want the same kind of data structure for things that we
dynamically generate for individual students (e.g. Variants). EntityLists are
anonymous in a sense–they're pointed to by ContainerEntityVersions and
other models, rather than being looked up by their own identifers.
"""
pass
class EntityListRow(models.Model):
"""
Each EntityListRow points to a PublishableEntity, optionally at a specific
version.
"""
entity_list = models.ForeignKey(EntityList, on_delete=models.CASCADE)
# This ordering should be treated as immutable–if the ordering needs to
# change, we create a new EntityList and copy things over.
order_num = models.PositiveIntegerField()
# Simple case would use these fields with our convention that null versions
# means "get the latest draft or published as appropriate". These entities
# could be Selectors, in which case we'd need to do more work to find the right
# variant. The publishing app itself doesn't know anything about Selectors
# however, and just treats it as another PublishableEntity.
entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT)
# The version references point to the specific PublishableEntityVersion that
# this EntityList has for this PublishableEntity for both the draft and
# published states. However, we don't want to have to create new EntityList
# every time that a member is updated, because that would waste a lot of
# space and make it difficult to figure out when the metadata of something
# like a Unit *actually* changed, vs. when its child members were being
# updated. Doing so could also potentially lead to race conditions when
# updating multiple layers of containers.
#
# So our approach to this is to use a value of None (null) to represent an
# unpinned reference to a PublishableEntity. It's shorthand for "just use
# the latest draft or published version of this, as appropriate".
draft_version = models.ForeignKey(
PublishableEntityVersion,
on_delete=models.RESTRICT,
null=True,
)
published_version = models.ForeignKey(
PublishableEntityVersion,
on_delete=models.RESTRICT,
null=True,
)
class ContainerEntity(PublishableEntityMixin):
"""
NOTE: We're going to want to eventually have some association between the
PublishLog and Containers that were affected in a publish because their
child elements were published.
"""
pass
class ContainerEntityVersion(PublishableEntityVersionMixin):
"""
A version of a ContainerEntity.
By convention, we would only want to create new versions when the Container
itself changes, and not when the Container's child elements change. For
example:
* Something was added to the Container.
* We re-ordered the rows in the container.
* Something was removed to the container.
* The Container's metadata changed, e.g. the title.
* We pin to different versions of the Container.
The last looks a bit odd, but it's because *how we've defined the Unit* has
changed if we decide to explicitly pin a set of versions for the children,
and then later change our minds and move to a different set. It also just
makes things easier to reason about if we say that defined_list never
changes for a given ContainerEntityVersion.
"""
# This is the EntityList that the author defines. This should never change,
# even if the things it references get soft-deleted (because we'll need to
# maintain it for reverts).
defined_list = models.ForeignKey(
EntityList,
on_delete=models.RESTRICT,
null=False,
)
# inital_list is an EntityList where all the versions are pinned, to show
# what the exact versions of the children were at the time that the
# Container was created. We could technically derive this, but it would be
# awkward to query.
#
# If the Container was defined so that all references were pinned, then this
# can point to the exact same EntityList as defined_list.
initial_list = models.ForeignKey(
EntityList,
on_delete=models.RESTRICT,
null=False,
)
# This is the EntityList that's created when the next ContainerEntityVersion
# is created. All references in this list should be pinned, and it serves as
# "the last state the children were in for this version of the Container".
# If defined_list has only pinned references, this should point to the same
# EntityList as defined_list and initial_list.
#
# This value is mutable if and only if there are unpinned references in
# defined_list. In that case, frozen_list should start as None, and be
# updated to pin references when another version of this Container becomes
# the Draft version. But if this version ever becomes the Draft *again*
# (e.g. the user hits "discard changes" or some kind of revert happens),
# then we need to clear this back to None.
frozen_list = models.ForeignKey(
'Container',
on_delete=models.RESTRICT,
null=True,
default=None
)
openedx_learning/apps/authoring/selectors/models.py
from django.db import models
from ..publishing.model_mixins import PublishableEntity, PublishableEntityVersionMixin
from ..containers.models import EntityList
class Selector(PublishableEntityMixin):
"""
A Selector represents a placeholder for 0-N PublishableEntities
A Selector is a PublishableEntity.
We'll probably want some kind of SelectorType hierarchy like we have
for ComponentType. But it's not necessarily exclusive–I haven't really
decided whether something can be both a Component *and* a
Selector, and doing so _might_ be convenient for shoving in XBlock
UI. In any case, I don't want to close the door on it.
A Selector has versions.
"""
pass
class SelectorVersion(PublishableEntityVersionMixin):
"""
A SelectorVersion doesn't have to define any particular metadata.
Something like a SplitTestSelectorVersion might decide to model its children
as Variants, but that's up to individual models. The only thing that this
must have is a foreign key to Selector, and Variants that point to it.
"""
selector = models.ForeignKey(Selector, on_delete=models.RESTRICT)
class Variant(models.Model):
"""
A SelectorVersion should have one or more Variants that could apply to it.
Variants could be created and stored as part of content (e.g. two different
A/B test options), or a Variant could be created on a per-user basis–e.g. a
randomly ordered grouping of ten problems from a set of 100.
We are going to assume that a single user is only mapped to one Variant per
Selector, and that mapping will happen via a model in the ``learning``
package).
"""
entity_list = models.OneToOneField(EntityList, on_delete=models.RESTRICT, primary_key=True)
selector_version = models.ForeignKey(SelectorVersion, on_delete=models.RESTRICT)
openedx_learning/apps/authoring/units/models.py
from django.db import models
from ..publishing.model_mixins import PublishableEntity, PublishableEntityVersionMixin
class Unit(models.Model):
"""
A Unit is Container, which is a PublishableEntity.
We'll probably have a Mixin for this.
"""
class UnitVersion(models.Model):
"""
A UnitVersion is a ContainerVersion, which is a PublishableEntityVersion.
We'll probably have a Mixin for this.
"""
# Not sure what other metadata goes here, but we want to try to separate things
# like scheduling information and such into different models.
Right now, this is sketched out with containers
being downstream of publishing, i.e. containers
has the publishing
app as a dependency, but publishing
is unaware of containers
. There is a complication with this in that containers
will want to hook into the publishing process in two ways:
frozen_list
of a ContainerVersion
when the draft version of a container is updated. This is largely to accommodate (1), but can happen in other situations as well. This can happen if we reset everything to the published version–i.e. "discard changes", which is a publishing operation and not one that we'd call into containers
code for directly.PublishLog
so that we can record the affected Containers, and possibly follow this relationship up recursively.Most of these edge cases are to avoid the case where we have an unpinned reference from a Container and expect a published version of a child element to exist, but find that it does not.
The thing is that we could try to accommodate this by moving containers entirely into publishing. And it's possible that would be clearer/more concise. But in general we've tried to make apps really narrowly scoped, and I think that's been generally good for reasoning about things. I also suspect that publishing
and containers
with both grow quickly to accommodate new use cases, and that other applications will be interested in having publishing hooks anyway.
But in any case, something to talk about in the ADR (or multiple ADRs if we want to split this out).
We're going to eventually need a model that maps Learners to a unique(Selector + Variant), maybe with version information. That doesn't belong in the authoring
package at all, so it likely implies a couple more apps like:
openedx_learning.apps.learning.learner
# for the base Learner
model 1:1 with Useropenedx_learning.apps.learning.compositor
# for composing things using selectorsThings we discussed at various points...
Modulestore content uses the idea of content groups and user partitions in order to limit what the user sees from a superset of content. For instance, a Component might be marked to be in the "licensed" content group that an institution may legally only show to its own on-campus students. The LMS checks to see if the user is in a user partition that has access to the "licensed" content group and filters it out if they are not.
While this approach works well when certain groups see a little more or less content than other groups, it requires us to encode every possible permutation of the components into a Unit. This becomes prohibitively expensive when there is a very large space of possibilities, e.g. "randomly pick 10 problems from a set of 100, in random order". The expectation is that we will see more dynamic behavior like this as learning content presentations built on Learning Core grow more sophisticated over time.
We still intend to use content groups for things like enrollment tracks.
We could have a much simpler structure to read from if we wrote out the container parent-child relationships for each user. In this scenario, every user would have their own customized set of rows for every Unit that had dynamic content.
The disadvantage of this approach is that it can use substantially more space, and it scales poorly when content changes. For instance, let's say that a Unit has a SplitTest with two different sets of Components. Then 100K view this content, resulting in half the users getting components [A1, A2, A3, A4, A5]
; and half getting [B1, B2, B3, B4, B5]
. We've now created 500K rows to encode that data, but there are only really two permutations. Furthermore, if the content authors then append A6
to the first list of Components, then we must either accept that the first group of 50K people is going to see outdated content, or we need to do some lazy or batch process to add those extra 50K rows that will add A6
to each Unit in the A group.
Instead of having a general model for capturing parent-child relationships, we could have different models for each relation type (Units, Subsections, Sections, etc.) and utilize some mixin for shared logic.
Having a centralized model makes it easier to write and maintain tricky common logic (like handling member deletion and publishing events), and makes it easier to make recursive queries.
One of the properties of this design that I would really like to keep and explore is the idea that Selectors can be both generalizable and performant. The backend code to run a SplitTestSelector
against two groups of Components and two groups of Units should be nearly the same, which could potentially give us a lot of power.
I'm not sure if this generalizes all the way up to truly adaptive, "show me the next thing" functionality. You could emulate that by constantly updating the Variant that a student sees and making new versions of Units, but it seems like it would waste a lot of space.
I described an early attempt at sketching pruning functionality here:
One of the assumptions of the pruning plan I had is that we could delete all content associated with a PublishableEntityVersion
(though cleaning up the underlying Content
would be harder). But the data model I show above complicates things because EntityLists
will have references to specific PublishableEntityVersions
. It probably means that cleanups would have to go top-down, i.e. start with the larger container types and gradually work their way down.
I hope there's a simpler way to think about this.
@ormsbee: I can't attend Friday's sync meeting, so I'll leave my updates here.
Here are some high-level decisions for the container capability based on what I've compiled from #38, this issue, and Dave's input in these last few days. I'm working on the container definitions first and then being more specific with units. The ADRs are far from being done, but I hope to add more details while implementing a POC. I'll let you know once I make further progress.
Thanks.
Here's where I've been implementing the model sketches and some API methods to test some of the operations on containers and units: https://github.com/openedx/openedx-learning/pull/254
I'll make sure to add a better cover letter to the draft PRs, both 254 and 251. Also, I'll continue working on the ADRs WIP sections :)
I'm still working on the implementation in #254 and have left a few comments there where I need some guidance. I've also been updating the ADR with a few comments for you to review: PR #251. Also, a review of the content is very welcome so we know we're on the right path!
While that moves forward, I'll start a discussion on publishing to gather input for the ADRs.
We currently have 3 ADRs addressing what was discussed #251:
Over the past couple of weeks, we've moved forward in discussing these high-level decisions, focusing on the proposal and its implications. In the meantime, I've opened the PR for review as we continue these discussions.
We need to write an ADR (or multiple ADRs) describing Learning Core's approach to modeling Units in particular and Containers in general.
A lot of this is informed by discussions in: