openedx / openedx-learning


How do we handle XBlock field defaults? #158

Open ormsbee opened 5 months ago

ormsbee commented 5 months ago

This isn't needed for Redwood. I'm just jotting these ideas down while they're still fresh in my head.

Blockstore didn't store the default values for fields in the OLX, and relied on the XBlock runtime to fill in that data when loading the block. This makes things much cleaner for authoring, because most of those defaults aren't needed at any given time (many are hopelessly outdated), and would just clutter things up. It also makes it easy to change a default value later, if necessary.

The drawback is that relying on the XBlock runtime's notion of default values makes it much harder to run queries on block information without the involvement of the XBlock runtime. For instance, say I had a richer data model to capture ProblemBlock data, and I wanted to do an aggregate count by randomization type. Non-default values like "always" or "per_student" would show, but there would be no indication outside of the XBlock runtime that the most common, default value was "never".
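To make the gap concrete, here's a minimal sketch of that aggregate count done outside the runtime. The block data and the `rerandomize` key are made up for illustration; the point is only that blocks relying on the default have nothing to count.

```python
from collections import Counter

# Hypothetical explicitly-set field data, one dict per problem block.
# Blocks that rely on the default simply have no "rerandomize" key.
explicit_fields = [
    {"rerandomize": "always"},
    {},
    {"rerandomize": "per_student"},
    {},
    {},
]


def count_by_randomization(blocks, default=None):
    """Count blocks by randomization value; unset blocks fall under `default`."""
    return Counter(block.get("rerandomize", default) for block in blocks)


# Without the runtime's knowledge of defaults, unset blocks land under None.
without_runtime = count_by_randomization(explicit_fields)

# Only once something supplies the default do those blocks show up as "never".
with_default = count_by_randomization(explicit_fields, default="never")
```

In the first count the three unset blocks are lumped under `None`, so the most common value never appears by name.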

Now we could make it so that the OLX preserves only what was explicitly overridden, while the richer supplementary data models write out all the data in full. This makes querying easier, but it has a couple of drawbacks:

  1. It wastes a lot of space on redundant data.
  2. Though it is rare, field defaults can change.

We could represent the defaults as their own XBlockDefaultFields model that is 1:1 with ComponentType. It would have a simple JSONField for the default values, and the app with the Learning Core XBlock runtime could do startup init checks to see if any of the registered XBlocks have new default values.
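A rough sketch of what that startup check might look like, using plain dicts in place of the real thing. In practice `stored_defaults` would come from XBlockDefaultFields rows and `registered_defaults` from introspecting the registered XBlock classes; the function name and the sample data here are hypothetical.

```python
def find_changed_defaults(registered_defaults, stored_defaults):
    """Return {block_type: defaults} for types that are new or whose defaults changed.

    Both arguments map a block type name to a dict of field defaults.
    """
    changed = {}
    for block_type, defaults in registered_defaults.items():
        if stored_defaults.get(block_type) != defaults:
            changed[block_type] = defaults
    return changed


# Hypothetical data: what the running code declares vs. what was last saved.
registered = {
    "problem": {"rerandomize": "never", "display_name": "Problem"},
    "html": {"display_name": "Text"},
    "video": {"display_name": "Video"},
}
stored = {
    "problem": {"rerandomize": "always", "display_name": "Problem"},  # default changed
    "html": {"display_name": "Text"},  # unchanged
    # "video" has never been recorded
}

to_update = find_changed_defaults(registered, stored)
```

The result would drive an update-or-create on the stored rows, so the database copy of the defaults tracks whatever the installed XBlocks currently declare.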

One weird caveat is that any richer data model that stores specific XBlock fields in a relational way would probably need to keep a separate set of boolean fields to indicate whether or not it was using the default. We can't just use a nullable field and have null be equivalent to "default", because a field could also just be explicitly set to null.
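Here's a small illustration of why the nullable column alone isn't enough, and what the extra boolean buys. The row class, the `max_attempts` field, and the default of 3 are all assumptions picked for the example, not the real ProblemBlock definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemRow:
    """Hypothetical relational row for one problem block version."""
    max_attempts: Optional[int]    # None is a legitimate explicit value
    max_attempts_is_default: bool  # True means "nothing was set; use the default"


def effective_max_attempts(row, default):
    """Resolve the value a reader should see for this row."""
    return default if row.max_attempts_is_default else row.max_attempts


ASSUMED_DEFAULT = 3  # made-up default for illustration

unset_row = ProblemRow(max_attempts=None, max_attempts_is_default=True)
explicit_null_row = ProblemRow(max_attempts=None, max_attempts_is_default=False)

# Both rows store NULL in the value column, but they mean different things.
unset_value = effective_max_attempts(unset_row, ASSUMED_DEFAULT)
explicit_value = effective_max_attempts(explicit_null_row, ASSUMED_DEFAULT)
```

The two rows are identical in the value column, and only the flag tells "never set" apart from "explicitly set to null".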

If we stored parsed XBlock field data as a JSON field, we could represent this by having missing keys. So then there would be one XBlockExplicitFields model that mapped 1:1 with ComponentVersion, and that would hold the explicitly set fields. Combining that with XBlockDefaultFields would give a relatively straightforward data model. Repeating all those field names over and over again is a little inefficient, but probably tolerable if we do a decent job of pruning unused versions. But the thing that would kill us is that Django's JSONField mapping into MySQL doesn't support proper indexing (and MySQL's just not that great at JSON indexing in general). So people who wanted to do richer queries on content data wouldn't have a great way to do it at the database layer anyway.
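For reference, the missing-keys representation would combine with the defaults roughly like this. The dicts stand in for the JSON stored on XBlockDefaultFields and XBlockExplicitFields, and the field names and values are invented for the example.

```python
def effective_fields(defaults, explicit):
    """Overlay explicitly set fields on top of the type's defaults.

    A key missing from `explicit` means "use the default"; a key present
    with value None means the field was explicitly set to null.
    """
    return {**defaults, **explicit}


def is_explicit(explicit, field_name):
    """True if the field was explicitly set, even if it was set to None."""
    return field_name in explicit


# Hypothetical JSON for one ComponentType and one ComponentVersion.
defaults = {"rerandomize": "never", "display_name": "Problem", "max_attempts": 3}
explicit = {"rerandomize": "always", "max_attempts": None}

merged = effective_fields(defaults, explicit)
```

This sidesteps the separate boolean flags, since key presence carries that information, though it does nothing for the indexing problem.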