openedx / platform-roadmap

Tracking the maintenance, enhancement, and advancement of the Open edX project.

Learning Core Platform Arch Discovery #67

Open ormsbee opened 2 years ago

ormsbee commented 2 years ago

Extract core learning concepts and data models into a new openedx-learning repository, with the goal of creating a new, scalable core platform for learning innovation.

Goals

  1. Refactor the Open edX LMS to enable more dynamic behavior in a scalable way. New features like V2 Content Libraries and Effort Estimation need a more scalable implementation than what exists in courseware today.
  2. Provide an easier, more reliable path for extension developers. Extension authors should be able to build and test against a smaller repo than edx-platform.
  3. Accelerate monolith breakup. By delivering incremental improvements to extensibility and performance, we have more reason to work on monolith breakup than we did under our previous "extract all the non-core things" approach, where the benefits were more back-loaded.
  4. Improve separation of edX business concepts from the learning platform. No edX-specific logic would come over to the new repo.
  5. Advance the decoupling of Studio and the LMS. Part of this work includes building a new core content data model for the LMS, which would replace the current shared model with Studio.
  6. Create a lightweight foundation for entirely new learning experiences. This would allow groups like LabXchange to make use of helpful infrastructure and concepts, without having to simultaneously inherit all the technical debt and weight of edx-platform.

Major Components

The high level components would include:

Composition

What permutation of a single unit does a user see? This would handle things like A/B tests, randomized problem selection, disabling content by enrollment type, adding staff-only debug markup, etc. There should be multiple backends for what can render a Unit, with the XBlock runtime being one of those.

Navigation

Sequence and Unit metadata, outlines, etc. How do you get to a particular piece of content you need to learn from? This would pull in parts of the Learning Sequences API from edx-platform.

Partitioning

Low level utility that helps determine what users are in what groups for the purposes of A/B testing, enrollment tracks, etc. Used by both Composition and Navigation.
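As a rough illustration of what such a utility might look like, here is a minimal sketch in which every partition is a function from a user id to a group name. All names here (`Scheme`, `hashed_scheme`, `enrollment_track_scheme`, `groups_for_user`, the partition names) are hypothetical, and the hash-based bucketing is only a stand-in for whatever assignment strategy the real component would use.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical type: a scheme maps a user id to a group name.
Scheme = Callable[[int], str]


def hashed_scheme(partition: str, groups: List[str]) -> Scheme:
    """Build a scheme that buckets users by a stable hash (e.g. for A/B tests)."""
    if not groups:
        raise ValueError("a partition needs at least one group")

    def assign(user_id: int) -> str:
        # Salt with the partition name so separate experiments bucket independently.
        digest = hashlib.sha256(f"{partition}:{user_id}".encode("utf-8")).digest()
        return groups[int.from_bytes(digest[:8], "big") % len(groups)]

    return assign


def enrollment_track_scheme(tracks: Dict[int, str], default: str = "audit") -> Scheme:
    """Build a scheme that groups users by a simple enrollment track lookup."""
    return lambda user_id: tracks.get(user_id, default)


def groups_for_user(user_id: int, partitions: Dict[str, Scheme]) -> Dict[str, str]:
    """Resolve the user's group in every known partition."""
    return {name: scheme(user_id) for name, scheme in partitions.items()}


partitions = {
    "ab_test_video_player": hashed_scheme("ab_test_video_player", ["control", "variant"]),
    "enrollment_track": enrollment_track_scheme({42: "verified"}),
}
print(groups_for_user(42, partitions))
```

Both Composition and Navigation could call something like `groups_for_user` and then decide on their own what to show or hide for each group.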

Policy

Content-related settings as they apply to the site as a whole, organizations, and individual courses. This covers a lot of what Course Overviews and course override waffle flags do today.

Publishing

Centralized list of Learning Contexts (e.g. Courses, Libraries), their published versions, and the various content-related errors and warnings associated with them. We need this to help us tackle the major issue we have today around publishing: it's an asynchronous process with many different components that sometimes take minutes to complete and may fail independently, leading to a mixed-published state. This is the most foundational component that others will be built on top of.
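To make the mixed-published problem concrete, here is a small sketch of how a central record could track each downstream component's progress for one published version. The class, field, and component names are assumptions made up for illustration; the actual data model is still to be worked out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class PublishRecord:
    """One published version of a learning context (hypothetical shape)."""

    context_key: str
    version: int
    # Downstream component name -> status of its processing for this version.
    components: Dict[str, Status] = field(default_factory=dict)
    errors: List[str] = field(default_factory=list)

    def report(self, component: str, status: Status, error: Optional[str] = None) -> None:
        """Record the latest status (and any error) reported by a component."""
        self.components[component] = status
        if error:
            self.errors.append(f"{component}: {error}")

    @property
    def state(self) -> str:
        """Summarize the version: in_progress, published, failed, or mixed."""
        statuses = set(self.components.values())
        if Status.FAILED in statuses:
            # Some components finished while others failed: the mixed state.
            return "mixed" if Status.SUCCEEDED in statuses else "failed"
        if not statuses or Status.PENDING in statuses:
            return "in_progress"
        return "published"


record = PublishRecord("course-v1:DemoX+101+2024", version=3)
record.report("outline", Status.SUCCEEDED)
record.report("block_structure", Status.PENDING)
print(record.state)
record.report("block_structure", Status.FAILED, "timeout")
print(record.state, record.errors)
```

With one place that knows every component's status, a mixed state becomes something that can be detected and reported rather than silently served.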

Scheduling

Lower level library for content scheduling information, likely pulling in most of what is edx-when today.

Discovery/Design Phases

Full Implementation Phases

Implementation Strategies

The following are some high level approaches/considerations with this new project.

Focus on content first.

Content data that lives in Studio is easy to re-build and backfill into new apps. User data is up to five orders of magnitude larger, and involves much larger challenges in terms of data migration.

Build Extensible Primitives

LearningContext is a generic term that applies to Courses, Content Libraries, Learning Pathways, and any number of other collections of content that we want to discretely version and publish. We can centrally define logic around these, while leaving it up to higher layers to model specific types of LearningContexts in a pluggable way.

For instance, it makes sense for there to be a Courses table that holds course-specific metadata and has a foreign key to the LearningContexts table. That table would have course-specific fields, and its learning context may even be null in the beginning (before any content is created).
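A minimal sketch of that table relationship, using the standard library's sqlite3 module rather than Django models so it runs on its own. The table and column names (`learning_context`, `course`, `context_key`, `display_name`, `self_paced`) are illustrative assumptions, not a proposed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE learning_context (
        id INTEGER PRIMARY KEY,
        context_key TEXT NOT NULL UNIQUE
    );
    CREATE TABLE course (
        id INTEGER PRIMARY KEY,
        display_name TEXT NOT NULL,
        self_paced INTEGER NOT NULL DEFAULT 0,
        -- Nullable: a course may exist before any content is created.
        learning_context_id INTEGER UNIQUE REFERENCES learning_context(id)
    );
    """
)


def course_rows():
    """Return (display_name, context_key) pairs; context_key is None if unset."""
    return conn.execute(
        "SELECT c.display_name, lc.context_key FROM course c "
        "LEFT JOIN learning_context lc ON lc.id = c.learning_context_id"
    ).fetchall()


# Create the course first, with no learning context yet.
conn.execute("INSERT INTO course (display_name) VALUES (?)", ("Intro to Demos",))
before = course_rows()

# Later, once content exists, create the context and attach it to the course.
cur = conn.execute(
    "INSERT INTO learning_context (context_key) VALUES (?)",
    ("course-v1:DemoX+101+2024",),
)
conn.execute(
    "UPDATE course SET learning_context_id = ? WHERE display_name = ?",
    (cur.lastrowid, "Intro to Demos"),
)
after = course_rows()
print(before, after)
```

The generic logic only ever needs the `learning_context` table; anything course-specific stays in the table owned by the higher layer.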

This kind of arrangement would lead to a three-layered system:

  1. Foundational primitives and logic, implemented only in openedx-learning.
  2. Specific implementations of those components to define different learning experiences, e.g. "Two-level hierarchical navigation", "randomized user partitioning scheme". These could be implemented in openedx-learning or outside. The most popular and useful ones could get folded into the repo over time.
  3. Business-level plugins that tie into systems defined in (2), such as EnrollmentTrackOutlineProcessor. These would be implemented outside the openedx-learning repo (many would live in edx-platform).

Implement Plugins in edx-platform

We once attempted to lift the ModuleStores out of edx-platform and ran into a rat's nest of dependencies that made the task extremely difficult. My thought with things like this is to have plugin interfaces that go the other way from what we usually do: the core framework logic lives in apps in this new repo, and the little plugin objects are created in edx-platform (and optionally elsewhere as well).

So to use a concrete example, say we migrate the learning_sequences app in edx-platform today to become part of the navigation app in this new repo. The navigation app will then have the concept of OutlineProcessors, an object interface for the different concerns that have to modify the set of things you can see or access in a course outline. The navigation app would have the logic for running OutlineProcessors, reading those values from a list defined in Django settings. The EnrollmentOutlineProcessor would be defined in edx-platform (and likely have imports to a bunch of things also in edx-platform), and then be specified in the settings file.

By doing this, we can keep some of the crazy logic and dependencies in edx-platform. It would allow us to move things more incrementally, without taking huge risks.
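The inverted plugin direction described above might look roughly like the following sketch. The `OutlineProcessor` interface, the `hidden_keys` method, the `OUTLINE_PROCESSORS` setting name, and the toy enrollment rule are all assumptions for illustration; in a real Django project the dotted-path lookup would more likely use `django.utils.module_loading.import_string`.

```python
import importlib
from typing import List, Set

# --- Would live in the new repo's navigation app (hypothetical interface) ---


class OutlineProcessor:
    """Each processor names the outline items a given user may not see."""

    def hidden_keys(self, user: dict, outline: List[str]) -> Set[str]:
        raise NotImplementedError


def resolve(entry):
    """Accept a class, or a dotted path such as 'package.module.ClassName'."""
    if not isinstance(entry, str):
        return entry
    module_path, _, class_name = entry.rpartition(".")
    return getattr(importlib.import_module(module_path), class_name)


def visible_outline(user: dict, outline: List[str], processor_entries) -> List[str]:
    """Run every configured processor and drop whatever any of them hides."""
    hidden: Set[str] = set()
    for entry in processor_entries:
        hidden |= resolve(entry)().hidden_keys(user, outline)
    return [key for key in outline if key not in hidden]


# --- Would live in edx-platform, free to import anything from there ---


class EnrollmentOutlineProcessor(OutlineProcessor):
    """Toy rule: users who are not enrolled see nothing."""

    def hidden_keys(self, user: dict, outline: List[str]) -> Set[str]:
        return set() if user.get("enrolled") else set(outline)


# --- Would live in the Django settings file, normally as dotted-path strings ---
OUTLINE_PROCESSORS = [EnrollmentOutlineProcessor]

outline = ["sequence_1", "sequence_2"]
print(visible_outline({"enrolled": True}, outline, OUTLINE_PROCESSORS))
print(visible_outline({"enrolled": False}, outline, OUTLINE_PROCESSORS))
```

The navigation app never imports edx-platform here; it only knows the interface and the list it was handed.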

See Also

Original post:

jmakowski1123 commented 2 years ago

Explore relation to: https://github.com/openedx/platform-roadmap/issues/2

ormsbee commented 2 years ago

I created a scratch repo to test some of these ideas in (there is some hashing out of the data model for the core publishing app). A number of the structural ideas were adopted from a Django 2021 talk on Scaling Django to 500 Apps.

Some things that came up during the prototyping and discussion:

Phasing

Doing the Publishing portion of this is mandatory, because everything else builds off of it. After that, doing partitioning and composition is probably the best bet, since it's relevant for v2 content libraries and the effort estimation API.

App layout and conventions

Other interesting ideas to experiment with from "Scaling Django to 500 Apps"

Dependencies

Data modeling details

Lifecycle/Upgrade Considerations

ormsbee commented 2 years ago

Possible areas of exploration on top of this smaller core: