Closed: ormsbee closed this issue 1 month ago.
We could model this sort of relationship explicitly in a Unit by making foreign key references to both the versioned and unversioned model and having a null value for the versioned field mean that we always grab the latest one via a join on PublishedComponent.
I'm leaning more towards this one. More specifically, something like:
```python
class UnitVersionComponentVersion(models.Model):
    unit_version = models.ForeignKey(UnitVersion, on_delete=models.CASCADE)
    component = models.ForeignKey(Component, on_delete=models.RESTRICT)
    component_version = models.ForeignKey(ComponentVersion, on_delete=models.RESTRICT, null=True)
    order_num = models.PositiveIntegerField(null=False)
```
This would mean that we don't usually create new UnitVersions when Components are updated–only when the Unit itself changes. That will reduce the noise a lot when we're talking about containers (Units, Subsections, Sections, etc.) within the same LearningPackage. At the same time, we can fix to specific versions when using external resources. But I think this also lets us do the CCX use case where we want to reference external things that are constantly updating.
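A minimal sketch of the "null means latest" convention described above, using plain dataclasses as illustrative stand-ins rather than the actual Django models:

```python
# Illustrative stand-ins for the proposed models -- not real Learning Core
# code. A None component_version means "follow the latest published
# version", found via a PublishedComponent-style lookup.
from dataclasses import dataclass
from typing import Optional

@dataclass
class UnitRow:
    component_id: int
    component_version_id: Optional[int]  # None => unpinned

def resolve_version(row: UnitRow, published: dict) -> int:
    """Return the ComponentVersion id this row should display."""
    if row.component_version_id is not None:
        return row.component_version_id  # pinned to a specific version
    return published[row.component_id]   # unpinned: join to latest published
```

The pinning policy then lives entirely in the data: whoever writes the row decides whether to pin, per container.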
In other words, in the past I think we've been talking about who controls updates–the people creating library-style content or the people using that content in their courses. But I think if we have this pattern for composition, we can determine that policy independently for every container's contents.
@kdmccormick, @bradenmacdonald, @feanil
Wanted to capture some things that were discussed in an in-person whiteboarding session:
In the past, we've talked about creating a Unit in a Library, using it in a Course, but then making modifications in that course. In this use case, the course team probably doesn't want to leave an unpinned reference to the Unit because they'd have their own changes mixed in, and they'd want to control when changes made it out to their students.
One way we can do this is to make a shallow clone of the Unit, with some sort of back pointer to the original.
So let's say there's a Library Unit LU that a Course is adding to one of its subsections.
The Course creates a new Unit CU that has:
Modifying CU then becomes straightforward. New Components that are local to the Course can be added to CU in an unpinned way, while keeping the references to Library Components pinned to specific versions and updating them only when the author decides to do so. This also lets people remove Components from CU and replace them–for instance, an introductory text supplied by the library that is inappropriate in the context of the course.
One way in which the current data model is incomplete is that it doesn't provide an adequate way to represent a Unit where the content is user-dependent, for example:
We didn't really discuss possible solutions in any kind of detail. @jmakowski1123 suggested the terminology of "Unit Template" for the abstract concept of how the Unit is defined with those slots.
A more recent data model thought I had was that we could try to flatten these things out so that every UnitVersionComponentVersion join table row has:

```
unit_version
component_version
order_num                # ordering of this thing within the Unit
content_group            # content groups are both top level ones defined by authors, as well as implicit (e.g. randomization)
content_group_value
content_group_order_num
```
One interesting property this has is that you can mix the content group content in different places in the Unit... I'm not sure if that's useful or just terribly confusing. I like that this can potentially be very fast to query. Things I don't like about it are:
Another approach is to have that level of indirection where UnitVersions have Slots, and there is a separate 1:M table that has ComponentVersions and group information. (I'll expand on that in another comment later tonight.)
The Slots approach might look like:
```python
class UnitVersion(models.Model):
    uuid = immutable_uuid_field()
    unit = models.ForeignKey(Unit, on_delete=models.CASCADE)
    version_num = models.PositiveIntegerField(null=False)

class UnitVersionSlot(models.Model):
    uuid = immutable_uuid_field()
    unit_version = models.ForeignKey(UnitVersion, on_delete=models.CASCADE)
    order_num = models.PositiveIntegerField(null=False)

class UnitVersionSlotComponentVersion(models.Model):
    uuid = immutable_uuid_field()
    unit_version_slot = models.ForeignKey(UnitVersionSlot, on_delete=models.CASCADE)
    variant = models.ForeignKey(Variant, on_delete=models.RESTRICT)
    order_num = models.PositiveIntegerField(null=False)
    component = models.ForeignKey(Component, on_delete=models.RESTRICT)
    component_version = models.ForeignKey(ComponentVersion, on_delete=models.RESTRICT, null=True)
```
Slots can have 0, 1, or many ComponentVersions for an individual student.
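A toy illustration of that point, with invented dict shapes: the student's assigned Variant selects which rows of the slot apply, which can yield zero, one, or many components.

```python
# Hypothetical helper (not proposed API): filter a slot's rows down to the
# variant a particular student was assigned, in order.
def components_for_student(slot_rows, assigned_variant):
    return [
        row["component"]
        for row in sorted(slot_rows, key=lambda r: r["order_num"])
        if row["variant"] == assigned_variant
    ]
```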
Some quick (translation: disorganized) thoughts before I put this down for another day or two:
A more recent data model thought I had was that we could try to flatten these things out so that every UnitVersionComponentVersion join table row has ... content_group # content groups are both top level ones defined by authors, as well as implicit (e.g. randomization)
How does that data model work in the context of randomization? For the sake of argument, if I have a library with 100,000 components, and I want to randomly include three of them into the unit, how many UnitVersionComponentVersion entries would need to exist in the unit? 1? 3? 100,000? 300,000? ~1e15 (100,000 × 99,999 × 99,998)?
Same question with the slots model.
I personally still prefer conceptually a composited outline approach where the "Unit" object (and higher objects) doesn't always directly store pointers to components but instead has a list of "rules" for how to build the unit - include this component version in position 0, then A/B test componentversion A and componentversion B in position 1, then randomly select 3 entries from LibraryVersion LV54 matching tag "difficult". So what we store at the database level is a list of rules, some of which have references to componentversions or libraryversions (learningpackageversion?). But the actual componentversions don't get resolved until the learner actually views the unit. Or perhaps they are different every time the learner views the unit if you allow dynamic rules for a duolingo style experience or something more adaptive. This also adapts really really well to the CCX case as explained in the link above, making it trivial to insert or delete components or units from the "template" course by simply appending course-specific rules to the rule list.
I believe this can be very performant if the "resolved" list for each learner is cached in the database, and that's only needed when there's anything learner-specific; where the rules are all simple, the resolved list can be cached once for all learners.
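To make the rules idea concrete, here is a hedged sketch of such a rule interpreter. The rule kinds and field names are invented for illustration, and a learner-specific seed makes resolution deterministic (and therefore cacheable):

```python
# Sketch only -- not a proposed schema. A unit is stored as an ordered list
# of rules; resolution to concrete component versions happens per-learner.
import random

def resolve_unit(rules, seed):
    """Expand a rule list into an ordered list of component-version ids."""
    rng = random.Random(seed)  # learner-specific seed => stable selection
    resolved = []
    for rule in rules:
        if rule["kind"] == "static":
            resolved.append(rule["component_version"])
        elif rule["kind"] == "ab_test":
            resolved.append(rng.choice(rule["branches"]))
        elif rule["kind"] == "random_pool":
            resolved.extend(rng.sample(rule["pool"], rule["count"]))
    return resolved
```

The CCX case then becomes appending course-specific rules to the list, as described above.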
BTW where we do want explicit references to componentversions, I do really like having a version field that can be null for "use latest" or filled in for "use specific". That also solves the product question with library content, if we allow authors to choose to pin the version or not at the time they use the content.
I haven't considered this space in as much detail as you folks, but at first pass, I like @bradenmacdonald 's rules-based approach. Back during BD-14, it's what I had originally assumed we would do when I heard of "unit composition" as an idea.
How does that data model work in the context of randomization? For the sake of argument, if I have a library with 100,000 components, and I want to randomly include three of them into the unit, how many UnitVersionComponentVersion entries would need to exist in the unit? 1? 3? 100,000? 300,000? ~1e15 (100,000 × 99,999 × 99,998)?
If the entire pool of possible options is 3, then the UnitVersionComponentVersion has 3 entries.
If the entire pool really is 100,000 and the students seeing this Unit can randomly see any of those 100,000 components in this unit, then the data model explodes–I can think of plausible encodings for either 100K or 300K per UnitVersion, but that's too much regardless. I'm not convinced this is a realistic use case though.
I do like the rules-based approach in principle. My main concerns with it are:
So for example, if we have a kind of Unit where the contents are randomly generated per-student by pluggable ruleset, I'd want to have some centralized model in the Learning Core to store the materialized view of this Unit for a given student. Something that will be guaranteed to be fast and not break if that particular ruleset is deprecated and removed (which is part of what motivated me down this rabbit hole).
The other question I have is how much dynamism we need in the Unit->Component relationship, and whether that sort of really-wide-open adaptive use case is more about the relationship between Sequence->Unit. We certainly need to cover current Unit->Component use cases as they relate to content library use in courses for the sake of backwards compatibility, but I wonder if it's okay to leave it relatively more constrained and let Sequences be completely wide open for rule manipulation (or creating multiple Sequence types, some of which are).
So for example, if we have a kind of Unit where the contents are randomly generated per-student by pluggable ruleset, I'd want to have some centralized model in the Learning Core to store the materialized view of this Unit for a given student. Something that will be guaranteed to be fast and not break if that particular ruleset is deprecated and removed (which is part of what motivated me down this rabbit hole).
+1
The other question I have is how much dynamism we need in the Unit->Component relationship, and whether that sort of really-wide-open adaptive use case is more about the relationship between Sequence->Unit. We certainly need to cover current Unit->Component use cases as they relate to content library use in courses for the sake of backwards compatibility, but I wonder if it's okay to leave it relatively more constrained and let Sequences be completely wide open for rule manipulation (or creating multiple Sequence types, some of which are).
I think this is a very good point. If Unit->Component dynamism isn't important, then certainly we should keep the unit compositor simple & static, instead pushing that complexity up to the Sequence->Unit level.
I feel like we've asked product about this a few times, and IIRC we've heard back each time that units like this will be a common use case:
Which leads me to some questions:
have some centralized model in the Learning Core to store the materialized view of this Unit
+1, that's what I meant by
the "resolved" list for each learner is cached in the database
As for
I'm not convinced this is a realistic use case though.
I believe some problem libraries like Mastering Physics have on the order of tens of thousands of problems, and though instructors would likely never want to pull "three random problems" from the whole set, I could see them accidentally mis-configuring it and forgetting to apply a tag filter, so that there is some temporary state where such a huge number of problems is configured. We could definitely say that randomization is limited to a pool of 1,000 entries or something like that, to avoid the issue.
pluggable ruleset
It doesn't necessarily have to be pluggable. We could have only fixed core rulesets - static, A/B, random, library sourced, and adaptive, where adaptive is pluggable but with constraints and stability and materialization. But I guess having a whole bunch of core rulesets isn't that different from a pluggable API.
+1 to what Kyle's saying - I'm pretty sure we need dynamic randomization within units, perhaps with a limit on the number of options. But I do think that something like 100 is too low a limit for a MOOC; these days there are some huge open problem banks available and instructors may want to keep the number of students likely to have been assigned a similar problem quite low. If one had a MOOC with 5,000 students and a limit of only 100, that would mean for each problem there are 50 other students with the same problem, and cheating/copying/answer-sharing could easily occur. (Though maybe in the world where ChatGPT exists, none of this matters anymore... :/ )
I wonder if it's okay to leave it relatively more constrained and let Sequences be completely wide open for rule manipulation (or creating multiple Sequence types, some of which are).
I think that makes sense, if we say that the totally open crazy adaptive cases I mentioned are kept to the sequence level, and at the Unit level we only support limited randomization.
But I do still feel like it's sub-optimal to record all the potential random options into the Unit via UnitVersionComponentVersion and materialize the learner-specific assignments, rather than just materialize the learner-specific assignments.
@kdmccormick:
I feel like we've asked product about this a few times, and IIRC we've heard back each time that units like this will be a common use case:
- a particular text or video component, followed by
- an interactive component, like a problem, selected from a random pool.
Which leads me to some questions:
- Should we push back on that, and assert that 1 and 2 should be spread across two units?
I don't think we should push back on that, especially since we can't do so without breaking backwards compatibility. Besides that, I think that it's entirely reasonable for authors to think that way and keep the two associated, particularly if they're choosing from 3-4 problems that were specifically made to fit into this Unit (which is a common use case at the moment).
At some point in the future, it might make sense to give an option to present the Units differently, like displaying one Component at a time, but even in that case, authoring them in the same conceptual Unit makes sense to me.
- Or, in the other direction, are there more complex use cases we want, which would warrant a unit-level rules system (rather than just a sequence-level rules system)?
None that I can think of, though @jmakowski1123 might be able to chime in better here.
- What's the largest pool size we'd be comfortable supporting if we were to go with the UnitVersionComponentVersion model here? 100,000 might feel unrealistic, but would even 1,000 or 100 be OK?
Definitely not 1000, probably not even 100. This would be the data model when there might be a potentially massive library you're borrowing from (e.g. millions of Components), but you as the author have decided that it needs to be one of ten or twenty.
Maybe the fundamental difference is between the author trusting the system to put something relevant to a tag/topic for the student, vs. having the author manually curate (and often author) the specific content that can appear in a place to teach or reinforce that specific concept.
@bradenmacdonald
I believe some problem libraries like Mastering Physics have on the order of tens of thousands of problems, and though instructors would likely never want to pull "three random problems" from the whole set, I could see them accidentally mis-configuring it and forgetting to apply a tag filter, so that there is some temporary state where such a huge number of problems is configured. We could definitely say that randomization is limited to a pool of 1,000 entries or something like that, to avoid the issue.
I'd probably cap it much more conservatively to start, like 20.
+1 to what Kyle's saying - I'm pretty sure we need dynamic randomization within units, perhaps with a limit on the number of options. But I do think that something like 100 is too low a limit for a MOOC; these days there are some huge open problem banks available and instructors may want to keep the number of students likely to have been assigned a similar problem quite low. If one had a MOOC with 5,000 students and a limit of only 100, that would mean for each problem there are 50 other students with the same problem, and cheating/copying/answer-sharing could easily occur. (Though maybe in the world where ChatGPT exists, none of this matters anymore... :/ )
This is another question for @jmakowski1123, but even with massive problem banks, I don't think course authors are expecting to have 100+ problems that fit exactly into each particular Unit. There are hundreds of places for these in a decent sized course, meaning that we'd be talking about tens of thousands of source problems, and that's a lot of content to author. Never mind ensuring the fairness of grading when the pool of questions becomes too large to be practically reviewable by the course team.
In terms of too many folks getting the same problems, courses would also lean a bit on in-problem randomization to help mitigate that.
I wonder if it's okay to leave it relatively more constrained and let Sequences be completely wide open for rule manipulation (or creating multiple Sequence types, some of which are).
I think that makes sense, if we say that the totally open crazy adaptive cases I mentioned are kept to the sequence level, and at the Unit level we only support limited randomization.
Yeah, that's what I'm thinking. The Sequences need that kind of craziness to support adaptive use cases, but I want to keep the Units relatively simpler/static (while still addressing current use cases) if possible.
But I do still feel like it's sub-optimal to record all the potential random options into the Unit via UnitVersionComponentVersion and materialize the learner-specific assignments, rather than just materialize the learner-specific assignments.
There's still the export use case. I also want to think through the materialized thing a bit more because a lot of in-Unit content visibility is a function of content group assignments, which can change either because of content changes or user reassignment. Which we can re-check each time, but if we're doing that, there's not much gained by materializing that data for individual students.
At a higher level, I suspect this is an issue where we have at least two very different families of use cases and using the same words might be tripping us up. Course Units have to have a certain base level of dynamic behavior in order to support features currently used in edx-platform. We also can't stop people from making Course Units that have a dozen different problems in them and act more like we'd expect subsections to.
But when we're considering Units for other Modular Learning use cases, I think that we can craft Units that are more constrained and more easily stand alone. Maybe not as hard constraints, but in terms of guidelines for how we think they should be used.
FWIW, I was mulling this over this past weekend and I've come around to the idea of having a more dynamic compositor rather than my initial proposal of having all the possibilities encoded and selecting a subset of them. The thing that finally tipped me over was that randomization doesn't just give you a random item, but a potentially reordered subset, meaning that it wouldn't make sense to statically encode the ordering and show a few of them, even in that simple use case.
I do still have a lot of concerns about how we specifically encode these in the data model so that the representation is compact, versioned, performant, and so that content changes propagate reasonably to saved user state (e.g. the list of components in the A/B test branch for this user were modified). I'm also still not convinced that "any one of 100,000 items could end up in this slot" is a use case that we should worry about at this layer, and that doing so would make it much harder to version efficiently.
The thing that finally tipped me over was that randomization doesn't just give you a random item, but a potentially reordered subset, meaning that it wouldn't make sense to statically encode the ordering and show a few of them, even in that simple use case.
Yeah, even just a single UnitVersionSlot with an ordered set of 3 components from a pool of 20 would be 20P3 = 20 × 19 × 18, i.e. 6840 potential UnitVariants.
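For reference, these counts can be checked with the standard library:

```python
# Ordered selections ("k from n, order matters") via math.perm.
import math

pool_20_pick_3 = math.perm(20, 3)          # 20 * 19 * 18
pool_100k_pick_3 = math.perm(100_000, 3)   # the "library-sized" worst case

print(pool_20_pick_3)     # 6840 potential UnitVariants
print(pool_100k_pick_3)   # 999,970,000,200,000 -- on the order of 1e15
```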
(e.g. the list of components in the A/B test branch for this user were modified)
Agreed that we need to carefully think about how changes to live content (however ill-advised they are) are handled in our model. If you haven't already, I recommend skimming LibraryContentBlock.make_selection, which meticulously steps through all the ways that the pool of components can change.
@ormsbee I did some timeboxed whiteboarding on this. Here's what I came up with so far:
```python
##### AUTHORING-SIDE MODELS.
##### Note that there is no direct Unit<->Component connection on this side;
##### it's always Unit<->Slot<->Component.

class Unit(PubEntity):
    ...

class UnitVersion(PubEntityVersionMixin):
    ...

class Slot(Model):
    """
    A Unit is made of (version-agnostic) Slots.

    Certain student state may hang off of a Slot: random seed, bucket #, etc.

    The slot_kind tells the unit compositor how to "fill" the slot with components, e.g.:
    * 'static' -> By far the most common case -- just a single component. Could raise an
      error if there are multiple components mapped to this.
    * 'random_pool'
    * 'split_test'
    * 'conditional'
    * (plugins could register their own slot_kinds)
    """
    unit = ForeignKey(Unit)
    key = SlugField()  # used to build the usage key for student state
    slot_kind = CharField()

class SlotVersion(Model):
    """
    Puts a Slot into a version of a unit, with a position.

    Particular slot_kinds may hang content information off of this.
    For example, a RandomSlotVersion would define the num_components_to_pick.
    """
    slot = ForeignKey(Slot)
    unit_version = ForeignKey(UnitVersion)
    order_num = Integer()

class ComponentVersionSlotVersion(Model):
    """
    Map a version of a component to a version of a slot.

    For slot_kind=='static', we expect exactly 1 of these to exist per SlotVersion.
    For other slot_kinds, there may be 0-N, for some reasonable max N.
    """
    slot_version = ForeignKey(SlotVersion)
    component_version = ForeignKey(ComponentVersion)

##### LEARNING-SIDE MODELS.

class RenderedUnit(Model):
    """
    A realized UnitVersion with all slots filled.

    Upon publish, the unit compositor will generate as many of these as possible.
    For fully static units, that's one RenderedUnit per UnitVersion.
    For units with only low-permutation slots (eg, split_test), we could pre-render
    all RenderedUnits per UnitVersion.
    For units with high-permutation slots (eg, random_pool), we would allow RenderedUnits
    to be generated on-demand at learning time.
    """
    unit_version = ForeignKey(UnitVersion)

class RenderedUnitForUser(Model):
    user = ForeignKey(User)
    rendered_unit = ForeignKey(RenderedUnit)  # we could allow NULL to mean "all users", for static units

class ComponentVersionInRenderedUnit(Model):
    """
    This ComponentVersion belongs to this RenderedUnit, with a position.
    """
    rendered_unit = ForeignKey(RenderedUnit)
    component_version = ForeignKey(ComponentVersion)
    order_num = Integer()
```
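The pre-render-vs-on-demand decision in RenderedUnit's docstring could be sketched like this. The threshold and dict shapes are invented for illustration only:

```python
# Sketch: multiply out per-slot possibilities at publish time, and only
# pre-render every RenderedUnit when the total stays small.
import math

PRERENDER_LIMIT = 32  # illustrative cutoff, not a real setting

def variant_count(slots):
    """Number of distinct RenderedUnits a UnitVersion could produce."""
    total = 1
    for slot in slots:
        if slot["kind"] == "static":
            continue  # exactly one possibility
        elif slot["kind"] == "split_test":
            total *= len(slot["components"])
        elif slot["kind"] == "random_pool":
            # ordered selection of `pick` from the pool
            total *= math.perm(len(slot["components"]), slot["pick"])
    return total

def should_prerender(slots):
    return variant_count(slots) <= PRERENDER_LIMIT
```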
Do we need to support randomization, split test, conditional, [and library content?] at the section/subsection level?
Straw man alternative proposal. I don't think this is better but it demonstrates how to model each level of the hierarchy using similar mechanisms and uses a JSON field to reduce the number of JOINs required. I believe it's possible to make the database verify the JSON field reference constraints specified at the time of transaction commit, but I'm not sure.
```python
class OutlineLevel(PubEntity):
    """A single level (e.g. a subsection) of a course outline."""

class OutlineLevelVersion(PubEntityVersionMixin):
    """A particular version of a single level (e.g. unit) of the course outline."""
    title = CharField()
    type = CharField()  # section, subsection, unit
    structure = JSONField(example="""
        [
            {"child_type": "static", "refs": ["unit1_ref"]},
            {"child_type": "static", "refs": ["unit2_ref"]},
            {
                "child_type": "randomization",
                "refs": ["unit3a_ref", "unit3b_ref"],
                "state_uuid": "...",
                "num_components_to_pick": 1
            }
        ]
    """)

class OutlineEntityRef:
    """
    A reference to a particular child PublishableEntity (Component or OutlineLevel
    [unit/subsection/section]) used in the given OutlineLevelVersion. If the JSON
    structure field references a child, this relationship MUST also exist. Conversely,
    it is forbidden to create this relationship if the entity in question is not
    referenced in that version of the JSON structure field.
    """
    entity_id = ForeignKey(PubEntity)
    used_in = ForeignKey(OutlineLevelVersion)
```
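The MUST/forbidden invariant in OutlineEntityRef's docstring amounts to a set-equality check between the refs named in the JSON and the rows in the ref table. A sketch of that check (field names follow the example above; the helper itself is hypothetical, e.g. something an application-level validator would run since the database can't easily enforce it):

```python
# Compare the refs used in the JSON structure against the ref-table rows.
def check_refs(structure, ref_rows):
    """Return (refs_missing_a_row, rows_not_used_in_structure)."""
    used = {ref for entry in structure for ref in entry["refs"]}
    rows = set(ref_rows)
    return used - rows, rows - used
```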
(I updated my proposal to move the bulk of the work from render-time up to publish time)
@bradenmacdonald:
Do we need to support randomization, split test, conditional, [and library content?] at the section/subsection level?
I think that would be ideal, if we can preserve all our other requirements and not add too much complexity. I'm not sure if it's feasible.
I believe it's possible to make the database verify the JSON field reference constraints specified at the time of transaction commit, but I'm not sure.
I'm not aware of anything in Django to do this, and while PostgreSQL has fancy JSON tooling, I don't think MySQL gives any more than schema validation.
@kdmccormick: I like where you're going with your models. I think the relationship between Slots and Units is especially tricky, and I have a bunch of questions in my head about how that should play out. Like:
Could Slots even be independent of Components? SlotPublishableEntity? A typed model per thingy that can go in a Slot? Would it make any sense for a Slot to give you a heterogenous list of things (Seq -> Unit -> Seq)?

Okay, a few more thoughts after having slept on it...
Briefly ignoring versioning models, the hierarchy would look something like: Unit -> Slots -> SlotVariants -> Components
So a split test defines two SlotVariants, one for each possibility. Just like in @kdmccormick's example, items with a low number of variations would generate their SlotVariants as part of the authoring process. But some things like Randomize would generate a SlotVariant on-the-fly, and map a specific user to it.
Using SlotVariants could potentially help us localize changes better–so that we don't have to re-bake a bunch of Units for students when making changes to a static piece, just because there's also a randomized slot in there somewhere that forced the whole Unit to be rendered per-user. It might also just be a convenient way for these types of modules to model their data anyway.
I'll try to sketch some proper models and relations for this later today.
Using SlotVariants could potentially help us localize changes better–so that we don't have to re-bake a bunch of Units for students when making changes to a static piece, just because there's also a randomized slot in there somewhere that forced the whole Unit to be rendered per-user. It might also just be a convenient way for these types of modules to model their data anyway.
Good call 👍🏻
I've been conflating dynamic-as-in-child-selection with dynamic-as-in-content-groups, and it might be simpler to model those separately, since content groups can overlap in combinations, while child selection does not.
Good call-out.
Do we need to support randomization, split test, conditional, [and library content?] at the section/subsection level?
Could Slots even be independent of Components (going to @bradenmacdonald's question earlier about it applying to other structures)? SlotPublishableEntity? Typed model per thingy that can go in a Slot? Would it make any sense for a Slot to give you a heterogenous list of things (Seq -> Unit -> Seq)?
I'm hung up on these questions currently. At risk of falling into everything-is-an-XBlock trap, I am intrigued by the idea of a "Slot" being a sort of universal connector between any two publishable entities.
The question I keep coming back to is this: Is there something special about the Unit<->Component level of the hierarchy that makes it so Unit composition should be separate from the general "Outline" composition system? The three things I can think of are:
Okay, took a rough stab at it. Please see comments for stream-of-consciousness on this stuff.
```python
class Unit(PublishableEntityMixin):
    pass

class UnitVersion(PublishableEntityVersionMixin):
    unit = models.ForeignKey(Unit, on_delete=models.RESTRICT)

class Slot(PublishableEntityMixin):
    # Some kind of type information here for dispatch purposes.
    # Maybe helpful to build out an example of a type of Slot, e.g. a
    # SplitTestSlot that is 1:1 to this and has specific metadata related to
    # SplitTests? Or is it enough to just make SplitTestSlotVersion?
    pass

class SlotVersion(PublishableEntityVersionMixin):
    slot = models.ForeignKey(Slot, on_delete=models.RESTRICT)

class SlotVariant(models.Model):
    """
    Should a SlotVariant always be tied to a specific SlotVersion? Or maybe
    decoupled into a M:M relationship like how Components and Content work?
    Going M:M probably gives us more flexibility in the long term to do Slots
    that don't necessarily use Components...?
    """
    slot_version = models.ForeignKey(SlotVersion, on_delete=models.RESTRICT)

class SlotVariantComponentVersion(models.Model):
    slot_variant = models.ForeignKey(SlotVariant, on_delete=models.RESTRICT)
    order_num = models.PositiveIntegerField()
    component = models.ForeignKey(Component, on_delete=models.RESTRICT, null=True)
    component_version = models.ForeignKey(ComponentVersion, on_delete=models.RESTRICT, null=True)

class UnitVersionRow(models.Model):
    """
    A row in a Unit can be either a single Component or a Slot that could expand
    to an arbitrary number of Components (or zero).

    This means that we don't have to create a separate, versioned Slot with its
    own identifier when we're just adding Components statically–which is going
    to be by far the most common mode.
    """
    unit_version = models.ForeignKey(UnitVersion, on_delete=models.RESTRICT)
    order_num = models.PositiveIntegerField()

    # Simple case would use these fields with our convention that a null
    # version means "get the latest draft or published as appropriate".
    component = models.ForeignKey(Component, on_delete=models.RESTRICT, null=True)
    component_version = models.ForeignKey(ComponentVersion, on_delete=models.RESTRICT, null=True)

    # More complex case would use these two fields.
    slot = models.ForeignKey(Slot, on_delete=models.RESTRICT, null=True)
    slot_version = models.ForeignKey(SlotVersion, on_delete=models.RESTRICT, null=True)
```
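Reading a UnitVersion back under this model could look roughly like the following sketch, where plain dicts stand in for UnitVersionRow rows and expand_slot stands in for the per-slot-type logic:

```python
# Sketch of UnitVersionRow resolution: each row is either a direct
# Component reference (pinned or "latest") or a Slot to expand.
def resolve_rows(rows, latest, expand_slot):
    out = []
    for row in sorted(rows, key=lambda r: r["order_num"]):
        if row.get("component") is not None:
            # Simple case: pinned version, or "latest" when version is None.
            out.append(row["component_version"] or latest[row["component"]])
        else:
            # Complex case: a Slot that may expand to 0..N components.
            out.extend(expand_slot(row["slot"], row["slot_version"]))
    return out
```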
@ormsbee That all looks reasonable to me.
How do users get mapped to slot variants? A UserSlotVariant model? Or do we leave it to individual implementations (SplitTest, Randomized, etc) to save that user state somewhere? Either way, when a new SlotVersion is published, we need some way to recover the old user->variant mapping, adapt it, and save the new mapping.

Say I have a "select 3 random components" Slot, with 10 components in the pool. I add an 11th. That creates a new SlotVersion, right?
Yes.
How do users get mapped to slot variants? A UserSlotVariant model? Or do we leave it to individual implementations (SplitTest, Randomized, etc) to save that user state somewhere?
I was thinking a UserSlotVariant model that's centrally controlled, and that individual implementations get to write to.
Either way, when a new SlotVersion is published, we need some way to recover the old user->variant mapping, adapt it, and save the new mapping.
I'm not sure. If the SlotVersion is not pinned to a specific version of a Component, but is instead always giving the latest published, then editing any individual Component (the most common use case) will work fine. In the scenario where we add an 11th item into the pool where the user has already selected 3, with the following steps:
1. SlotVersion 1 is made with 10 items to randomly select from. The RandomizedSlotVersion extension figures out how the list of choices is represented, which we'll ignore for now.
2. RandomizedSlotVersion creates a SlotVariant A specifically for this user with the three randomly chosen items in some randomly shuffled order, and adds an entry linking the two together with UserSlotVariant (via some API).
3. Editing individual Components does not create new SlotVersions, because we're saying "just use the latest version" when we defined SlotVersion 1.
4. An 11th item is added to the pool, and SlotVersion 2 is created.
I'd argue that keeping the UserSlotVariant pointing at SlotVariant A (which in turn points to the SlotVersion 1 it was derived from) is actually the right thing to do, and it makes things much easier to reason about.
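The keep-the-existing-variant behavior described above can be sketched with plain-Python stand-ins for the proposed models. Everything here is illustrative: `UserSlotVariantMap` and `get_or_create_variant` are hypothetical names, not a real API.

```python
import random
from dataclasses import dataclass


@dataclass
class SlotVersion:
    version_num: int
    pool: list  # component keys to randomly select from


@dataclass
class SlotVariant:
    # A variant remembers which SlotVersion it was derived from.
    slot_version: SlotVersion
    selected: list


class UserSlotVariantMap:
    """Stand-in for the proposed centrally controlled UserSlotVariant model."""

    def __init__(self):
        self._by_user = {}

    def get_or_create_variant(self, user_id, latest, num_choices=3, rng=None):
        # Key point: if the user already has a variant, keep it, even if a
        # newer SlotVersion exists. New versions only affect new users.
        if user_id in self._by_user:
            return self._by_user[user_id]
        rng = rng or random.Random()
        variant = SlotVariant(latest, rng.sample(latest.pool, num_choices))
        self._by_user[user_id] = variant
        return variant


# SlotVersion 1: a pool of 10 items.
v1 = SlotVersion(1, [f"c{i}" for i in range(10)])
mapping = UserSlotVariantMap()
variant_a = mapping.get_or_create_variant("alice", v1, rng=random.Random(42))

# An 11th item is added, creating SlotVersion 2...
v2 = SlotVersion(2, v1.pool + ["c10"])
# ...but alice keeps the variant derived from SlotVersion 1.
assert mapping.get_or_create_variant("alice", v2) is variant_a
assert variant_a.slot_version.version_num == 1
```

New users who hit the slot after the publish would get a variant derived from SlotVersion 2, while existing users stay on the variant they already saw.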
Though one thing that I don't have a good answer for in this scenario is "What happens if you delete a component?".
What does it look like if we generalize this to the whole outline?
I'll spin my wheels on that this evening.
@kdmccormick: FWIW, I think you were right when you were highlighting how this fundamentally differs from other potential modes of dynamic content at the sequence/section level because we display things all at once in the Unit. I tried sketching a couple of things that used PublishableEntities
directly, but I think the data model for Units is more predictable and sane if it's strongly typed to Components
specifically, and not a special case.
Another thing I was thinking about over the weekend was that Slots can present their own titles and UIs to the user. I was wondering if that means we should think of them as a type of Component, just one that has Slots (which could then map onto children() for those types of XBlocks). Then we'd have two separate ways to query the Unit: by top level Components, or flattened out to all Components–with the understanding that we don't allow nesting beyond that.
It's really half-baked, and I'm still leaning against it (i.e. to keep Slots as a first class concept at the Unit level). But it's a possibility that I thought I should mention in case it leads to anything.
I think the data model for Units is more predictable and sane if it's strongly typed to Components specifically, and not a special case.
Though I will note that in the proposed model, Slots
are pretty much a standalone concept that could go in its own app (SlotType, Slot, SlotVersion, SlotVariant), with SlotVersionComponentVersion
being a concept that lives in the components
app. That gives us some flexibility to declare Slots-of-other-things later, if that turns out to be a reasonable thing to do.
One other useful aspect of this data model is that the Slots stuff is supplemental–if we remove all of the Slots-related models and references to it, then we end up in a place where UnitVersionRow
is just mapping UnitVersion
and ComponentVersion
with ordering. So we wouldn't have to block on the slots stuff for basic Unit composition functionality, and then add them later in a migration with default null values (which is what they would be most of the time anyway).
@ormsbee
Another thing I was thinking about over the weekend was that Slots can present their own titles and UIs to the user. I was wondering if that means we should think of them as a type of Component, just one that has Slots (which could then map onto children() for those types of XBlocks). Then we'd have two separate ways to query the Unit: by top level Components, or flattened out to all Components–with the understanding that we don't allow nesting beyond that.
Good food for thought. I also lean against it because I'm somewhat attached to the Components-Are-Always-The-Leaf-Nodes idea, but maybe that's worth rethinking.
EDIT: One nice thing about the two-ways-to-query-the-Unit idea is that it maps more closely to how authors will experience the platform. Studio won't show them that their Units are made of "Slots"... they'll be made of "components". It's just that some of those "components" (the slotty ones) will flatten out into more components when presented in the LMS.
with the understanding that we don't allow nesting beyond that.
...unless we decide one day that we want CAPA responses to be components within the ProblemBlock component. But we'd never do that, right?
Though I will note that in the proposed model, Slots are pretty much a standalone concept that could go in its own app (SlotType, Slot, SlotVersion, SlotVariant), with SlotVersionComponentVersion being a concept that lives in the components app. That gives us some flexibility to declare Slots-of-other-things later, if that turns out to be a reasonable thing to do.
Good point. This would be nice for iterative development.
Here's another riff of the data model, which (I think) would allow it to model a flexible tree outline. I know we're talking about having a more restrictive Unit compositor, but I figured I'd post this as a strawman.
class Container(models.Model):
    """
    This model essentially just marks a PublishableEntity as a container which can have members (below).
    We could also hang any version-agnostic, generic access control settings off of it.
    I did not see a need to make a ContainerVersion model, as it seemed redundant with UnitVersion,
    SequenceVersion, etc.
    """

# Types of containers...
class Unit(PublishableEntityMixin):
    container = models.OneToOneField(Container, on_delete=models.RESTRICT)

class UnitVersion(PublishableEntityVersionMixin):
    unit = models.ForeignKey(Unit, on_delete=models.CASCADE)

class Sequence(PublishableEntityMixin):
    container = models.OneToOneField(Container, on_delete=models.RESTRICT)

class SequenceVersion(PublishableEntityVersionMixin):
    sequence = models.ForeignKey(Sequence, on_delete=models.CASCADE)

# ... and so on

class Slot(PublishableEntityMixin):
    # Some kind of type information here for dispatch purposes.
    pass

class SlotVersion(PublishableEntityVersionMixin):
    slot = models.ForeignKey(Slot, on_delete=models.RESTRICT)

class SlotVariant(models.Model):
    slot_version = models.ForeignKey(SlotVersion, on_delete=models.RESTRICT)

class SlotVariantMember(models.Model):
    slot_variant = models.ForeignKey(SlotVariant, on_delete=models.RESTRICT)
    order_num = models.PositiveIntegerField()
    member = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT, null=True)
    member_version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT, null=True)

class ContainerMember(models.Model):
    """
    A row in a container can be either a single PublishableEntity or a Slot that could expand
    to an arbitrary number of PublishableEntities (or zero).
    This means that we don't have to create a separate, versioned Slot with its
    own identifier when we're just adding PublishableEntities statically–which is going
    to be by far the most common mode.
    """
    container = models.ForeignKey(Container, on_delete=models.RESTRICT)
    container_version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT)
    order_num = models.PositiveIntegerField()
    # Simple case would use these fields with our convention that null versions
    # means "get the latest draft or published as appropriate".
    member = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT, null=True)
    member_version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT, null=True)
    # More complex case would use these two fields.
    slot = models.ForeignKey(Slot, on_delete=models.RESTRICT, null=True)
    slot_version = models.ForeignKey(SlotVersion, on_delete=models.RESTRICT, null=True)
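One way to read ContainerMember is as a small resolution step at render time: a row is either a single member (pinned or "latest published") or a slot that expands per-user. A minimal sketch, using plain dataclasses instead of the Django models and a hypothetical expand_slot callback:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    """Stand-in for a ContainerMember row: exactly one of the two cases."""
    order_num: int
    member: Optional[str] = None          # PublishableEntity key
    member_version: Optional[int] = None  # pinned version, or None = latest
    slot: Optional[str] = None            # Slot key


def resolve(row, latest_published, expand_slot):
    """Resolve one row into a list of (entity, version) pairs."""
    if row.slot is not None:
        # Complex case: a Slot may expand to 0-N entities for this user.
        return expand_slot(row.slot)
    # Simple case: a null member_version means "latest published".
    version = row.member_version
    if version is None:
        version = latest_published[row.member]
    return [(row.member, version)]


latest = {"problem_1": 7, "video_1": 2}
rows = [
    Member(0, member="problem_1"),                  # unpinned reference
    Member(1, member="video_1", member_version=1),  # pinned reference
    Member(2, slot="random_3"),                     # slot expansion
]
resolved = [
    pair
    for row in sorted(rows, key=lambda r: r.order_num)
    for pair in resolve(row, latest, lambda s: [("problem_2", 4)])
]
assert resolved == [("problem_1", 7), ("video_1", 1), ("problem_2", 4)]
```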
@kdmccormick: Some thoughts/reactions:
That's definitely a powerful data model. We could mix and match anything, since the container's children are not restricted to any particular type. The fact that the container
is a foreign key means that if we wanted to, we could make a Unit that's also a Sequence. Though I think it's fair to say that we could/should do that sort of checking in the app layer... I've tried to keep Learning Core models stricter at the database layer and not rely as much on app logic for correctness guarantees, though we definitely do rely on the app layer in other places too.
The compelling things about this direction for me:
- It could be implemented with only publishing as a dependency.
- I could also see us using abstract models to parameterize the concrete models (if we want to split things up), or modeling the parent-child relationships in one concrete model, but using proxy models to narrow it down by the parent-type.
My biggest worries about this approach:
I wonder if it would make sense to try to separate the simpler case of parent/child relationship mapping from the more complex Slot mechanism, so that someone could model a static thing more simply... though I guess that's kind of moot if we're using the same model to hold all those relations.
I'll shift my unit prototype to use some variant of this approach with a separate containers app.
It unnecessarily complicates the code and confuses people, especially if it turns out we only use it for Units.
That would be an unfortunate outcome 😛 If we have a model just for Units, I'd much prefer one of the more-concrete models you proposed above.
I had this random late-night thought that we do have a potential use case for a Container of heterogeneous types, and that's a CourseRun's content. If we're modeling multiple CourseRuns within the same LearningPackage, then it's reasonable to have some container at the root of each CourseRun. If ContainerMember.order_num
is nullable, we could have a mix of one ordered set of things in a container (the Sections), in combination with any unsorted set of content (e.g. static tabs, about text, and all that other stuff that's detached from the root CourseBlock but still part of the course run).
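A sketch of what that mixed container could look like, using plain dicts as stand-ins for ContainerMember rows (the entity names are made up for illustration):

```python
members = [
    {"entity": "section_1", "order_num": 0},
    {"entity": "about_page", "order_num": None},
    {"entity": "section_2", "order_num": 1},
    {"entity": "static_tab", "order_num": None},
]

# The ordered part of the container (e.g. the course run's Sections)...
ordered = sorted(
    (m for m in members if m["order_num"] is not None),
    key=lambda m: m["order_num"],
)
# ...and the unordered, "detached" content hanging off the same container.
unordered = [m for m in members if m["order_num"] is None]

assert [m["entity"] for m in ordered] == ["section_1", "section_2"]
assert {m["entity"] for m in unordered} == {"about_page", "static_tab"}
```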
@kdmccormick: I'm starting to play with this variant of your more centralized container strawman. I tried to simplify/collapse the models a bit, and ended up with this:
# publishing app...

class Container(models.Model):
    """
    Containers are a common structure to hold parent-child relations.
    Containers are not PublishableEntities in and of themselves. That's because
    sometimes we'll want the same kind of data structure for things that we
    dynamically generate for individual students (e.g. SlotVariants). Containers
    are anonymous in a sense–they're pointed to by specific kinds of
    PublishableEntityVersions rather than being looked up by their own
    identifiers.
    """

class ContainerMember(models.Model):
    """
    Each ContainerMember points to a PublishableEntity, optionally at a specific
    version.
    """
    container = models.ForeignKey(Container, on_delete=models.RESTRICT)
    order_num = models.PositiveIntegerField(null=True)
    # Simple case would use these fields with our convention that null versions
    # means "get the latest draft or published as appropriate". These entities
    # could be Slots, in which case we'd need to do more work to find the right
    # variant.
    entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT, null=True)
    entity_version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT, null=True)

# slots app...

class Slot(PublishableEntityMixin):
    """
    A Slot represents a placeholder for 0-N PublishableEntities.
    A Slot is a PublishableEntity.
    A Slot has versions.
    """

class SlotVersion(PublishableEntityVersionMixin):
    """
    A SlotVersion doesn't have to define any particular metadata.
    Something like a SplitTestSlotVersion might decide to model its children as
    SlotVariants, but that's up to individual models. The only thing that this
    must have is a foreign key to Slot, and SlotVariants that point to it.
    """
    slot = models.ForeignKey(Slot, on_delete=models.RESTRICT)

class SlotVariant(models.Model):
    """
    A SlotVersion should have one or more SlotVariants that could apply to it.
    SlotVariants could be created and stored as part of content (e.g. two
    different A/B test options), or a SlotVariant could be created on a per-user
    basis–e.g. a randomly ordered grouping of ten problems from a set of 100.
    We are going to assume that a single user is only mapped to one SlotVariant
    per Slot, and that mapping will happen via a model in the ``learning``
    package.
    """
    container = models.OneToOneField(Container, on_delete=models.RESTRICT, primary_key=True)
    slot_version = models.ForeignKey(SlotVersion, on_delete=models.RESTRICT)

# units app...

class Unit(PublishableEntityMixin):
    """
    A Unit is a PublishableEntity.
    """

class UnitVersion(PublishableEntityVersionMixin):
    """
    A UnitVersion has a Container.
    """
    container = models.OneToOneField(Container, on_delete=models.RESTRICT)
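The "null version means latest draft or published as appropriate" convention in the ContainerMember comments could be sketched like this. These are plain-Python stand-ins, and resolve_member is a hypothetical helper, not a real API:

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Stand-in for a PublishableEntity with a draft and a published version."""
    draft_version: int
    published_version: int


def resolve_member(entity_key, pinned, entities, use_draft):
    """A null pinned version means "latest draft or published as appropriate"."""
    if pinned is not None:
        return pinned
    e = entities[entity_key]
    return e.draft_version if use_draft else e.published_version


entities = {"html_1": Entity(draft_version=5, published_version=4)}

# Studio preview follows drafts; the LMS follows published versions.
assert resolve_member("html_1", None, entities, use_draft=True) == 5
assert resolve_member("html_1", None, entities, use_draft=False) == 4
# A pinned reference ignores both.
assert resolve_member("html_1", 2, entities, use_draft=True) == 2
```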
This model is still incomplete though, because there are certain versioning issues that we want the publishing app to know how to do (e.g. force a new version to be created if we're deleting a child element).
Actually, thinking on that for a bit, I think it means I want to put container
as a nullable OneToOneField
on PublishableEntityVersion
. And then have things that extend PublishableEntity
/PublishableEntityVersion
declare whether they are or aren't containers... which feels like I'm walking down a slippery slope, but I feel like that's worth it for centralized handling of some weird edge cases.
Okay, I kept sketching this out more, and a few thoughts:
- Moving this into its own containers app: I didn't want to do this originally because it means that to really work well, we'd have to define some kind of draft/publish callback pipeline so that containers can know to update themselves when their children are deleted.
- A ContainerEntityVersion (i.e. the publishable thing) might have two or three primitive containers associated with it: the initial_container that has pinned versions for everything at the time it was created, a defined_container that captures what the author actually specified (either pinned or unpinned), and a frozen_container which holds the locked versions when a new version is created (in reaction to items getting deleted, for instance).
Still some holes in this, but it feels like this direction is feasible...
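As a rough shape for the two-or-three-containers idea, here is a plain-Python stand-in. The field names come from the comment above; the integer types are just placeholders for what would be foreign keys to primitive Container rows:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerEntityVersion:
    """Sketch only: each int stands in for a FK to a primitive Container."""
    # Pinned versions of everything at the time this version was created.
    initial_container: int
    # What the author actually specified (pinned or unpinned references).
    defined_container: int
    # Locked versions, filled in only when a new version has to be created
    # in reaction to something like a child being deleted.
    frozen_container: Optional[int] = None


v = ContainerEntityVersion(initial_container=1, defined_container=2)
assert v.frozen_container is None  # nothing has forced a freeze yet
```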
The latest version of this is being captured in #240 and I'm closing this Issue in favor of that one.
A Unit can have Components that are both fixed to a particular version (e.g. borrowed content from a Library), as well as references that should always point to the latest version of a Component (e.g. a Component in the same course). This pattern repeats itself at different scales (e.g. CCX courses), where sometimes we want to only update our version of borrowed content explicitly vs. always grabbing the latest published version.
We could model this sort of relationship explicitly in a Unit by making foreign key references to both the versioned and unversioned model and having a null value for the versioned field mean that we always grab the latest one via a join on PublishedComponent.
Or maybe we do always explicitly create a new version of a Unit whenever one of its child Components updates, and we keep a flag as to whether to auto-update or to lock to a specific version on a per-Component basis? We'd then hook into the publish workflow to publish the new version of the Unit along with the Component?
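The second option could be sketched as a publish hook. Everything below is hypothetical: plain dicts stand in for Unit/UnitVersion rows, and on_component_published is an illustrative name, not a real API.

```python
def on_component_published(component_key, new_version, units, pinned):
    """When a Component is published, bump every Unit that references it
    with the auto-update flag (i.e. not locked to a specific version)."""
    republished = []
    for unit in units:
        children = unit["children"]  # component key -> referenced version
        if component_key in children and not pinned[(unit["key"], component_key)]:
            children[component_key] = new_version
            unit["version_num"] += 1  # stands in for "create a new UnitVersion"
            republished.append(unit["key"])
    return republished


units = [
    {"key": "unit_a", "version_num": 1, "children": {"problem_1": 3}},
    {"key": "unit_b", "version_num": 1, "children": {"problem_1": 3}},
]
# Per-Component, per-Unit flag: lock to a specific version or auto-update.
pinned = {("unit_a", "problem_1"): False, ("unit_b", "problem_1"): True}

assert on_component_published("problem_1", 4, units, pinned) == ["unit_a"]
assert units[0]["version_num"] == 2              # unit_a got a new version
assert units[1]["children"]["problem_1"] == 3    # unit_b stayed pinned
```

The cost this illustrates is the noise the first option avoids: every child publish fans out into new versions of all auto-updating parent Units.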