openedx / openedx-events

Open edX events from the Hooks Extensions Framework
Apache License 2.0
11 stars 18 forks source link

[Data] Create new CMS_COURSE_PUBLISHED signal to represent courses being updated in Studio #72

Closed rgraber closed 1 year ago

rgraber commented 2 years ago

Fields: blocks_url effort end enrollment_start enrollment_end id media: { banner_image: {uri, uri_absolute}, course_image: {uri}, course_video: {uri}, image: {raw, small, large} } name number org short_description start start_display start_type pacing mobile_available hidden invitation_only course_id

Note: CMS_COURSE_PUBLISHED is just a suggestion for a name and can be improved on. We should make sure we are consistent with "CMS" vs "Studio" nomenclature.

Open Questions: Where in the openedx-events module structure should this new event live? @ormsbee does tCRIL have a strong opinion on this?

ormsbee commented 2 years ago

@rgraber: Is this a new event that should fire specifically when course drafts are updated? Or does this mean something else? Imports?

ormsbee commented 2 years ago

Also:

  1. When is the signal sent?
  2. Who are the consumers of the signal?
  3. What do those consumers do with the signal data?
  4. Do consumers have to make separate requests to get more information?
  5. Do different consumers take different subsets of this data? Does it make sense to split it into multiple separate signals?
ormsbee commented 2 years ago

For instance, if this is intended to broadcast when Studio users change information that specifically affects catalog listings, it could be renamed to something that more specifically focuses on that scope.

Where in the openedx-events module structure should this new event live? @ormsbee does tCRIL have a strong opinion on this?

If it's triggered by a Studio action, it probably belongs in a separate authoring package, instead of being lumped together with learning events that are more LMS-centric.

robrap commented 2 years ago

Thanks @ormsbee. It looks like you also already discovered https://github.com/openedx/edx-platform/issues/30682, which includes a bit more context.

I think we can have a new signal that is more specific in scope, but I'll wait on Becca to return for those details.

In terms of naming, I'm wondering what language (or scoping) we use to handle cases when different bounded contexts use a different language. For example, according to https://openedx.atlassian.net/wiki/spaces/AC/pages/663224968/edX+DDD+Bounded+Contexts, "authoring" includes both Catalog Content Authoring and Course Run Authoring. If the event type doesn't differentiate, how do we know what language to use? Does course_id mean different things in these different contexts?

ormsbee commented 2 years ago

I feel like that page is a bit self-contradictory in that in tables it lumps catalog authoring along with course content authoring, but in the top level breakdown images, it puts Discovery/Catalog in an entirely different subdomain.

openedxddd-diagram2

My personal view is that the catalog is its own subdomain, and authoring should be scoped to the edits that happen in Studio (+import/export) to the learning content. I think that in the long term, we'd want a data flow that looks something like:

graph LR;
    A["Authoring<br>(Studio content)"]-->C["Catalog<br>(alter scheduling and logos)"]-->L["Learning<br>(what the student sees in LMS)"];

Though really, it'd be more like:

So scheduling information changes might involve three different views of the world with potentially three different signals as they correspond to three different sets of users, lifecycles, and receivers.

  1. A signal for when that course team authored content is edited in Studio. This would let the catalog know to update its information, as it will use whatever the Studio author set if it doesn't have overrides from marketing.
  2. A signal for when the catalog view of this changes, either because of Studio or because marketing changed something. This would let an external entity know when there's a change in course scheduling (they don't care where that change came from–catalog is the source of truth for that as far as they're concerned).
  3. A signal for when the LMS view of this changes, since this is what you care about when you want to email 10K enrolled students that their course start date has changed, and the catalog portion of this is still optional.

I'm not sure what the state of things is now. I do remember in the past that publisher used to push values back into Studio for this kind of thing, which kind of turned into a mess, and is one of the reasons I beat on the drum of unidirectional data flow so often.

I realize I hand-waved a whole lot of stuff just now, and that it's probably not worth derailing this ticket with this broader conversation. If we have concrete answers to some of those questions I asked about when this event has to happen and what the listeners are, I'm guessing we can craft a reasonable signal without having to definitively settle some of these larger framing issues.

FYI to @feanil, @kdmccormick, @Carlos-Muniz: in case you were interested in this issue, and its sibling https://github.com/openedx/edx-platform/issues/30682. Also @felipemontoya, in case you hadn't seen this one.

rgraber commented 2 years ago

This signal is meant to replace the course_published signal that studio currently fires whenever anything about a course is changed. I was not actually clear on why it was originally named that since studio's concept of "published" seems to vary based on the modulestore implementation. Eventually it might be worth it to break this signal up to more granularly signal different kinds of changes but for this project I'd strongly prefer to do a wholesale replace.

rgraber commented 2 years ago

To explain further: I am not familiar enough with the flow in studio to confidently say we can write an event that only fires when a new course is created in studio, since I don't know all the ways this can happen. I am confident enough to know that whatever courses we give to course_discovery, whether or not they have changed in a way that is meaningful in that context or not, course_discovery will be able to take the parts it cares about and discard the rest.

rgraber commented 2 years ago

To continue the stream of consciousness: Ideally, the event would be "something happened that course_discovery cares about." However, I don't feel confident that I can create this event and fire it in the correct places, so I'd like to start with everything and let course_discovery decide what it cares about.

felipemontoya commented 2 years ago

Would it make sense in that case to initially define more than one event so that from the perspective of course_discovery (or any other consumer) there are options to choose from depending on what it is interested in? We could try to make it so that every time the broad signal is sent, always a second more granular signal is also sent.

I think that part of the reason why everything depends on the current COURSE_PUBLISHED signal, is that there where never other options to choose from and every listener defaulted to the same strategy. Listen for the signal that fires all the time and ignore every message that was not interesting.

rgraber commented 2 years ago

I'm not sure how we could determine which granular signal to send without significant involvement from the TNL team, who don't have the capacity to take this on right now.

robrap commented 2 years ago

@rgraber: Are we unable to define more clearly what discovery is interested in at this time? Or, do you fear that even if we defined that we wouldn’t know how to publish it? Or other? I think it would be helpful to define it if we can and see where we get blocked.

ormsbee commented 2 years ago

I don't think it's worth the effort to do a wholesale 1:1 replacement of our existing signal with an exact openedx-events counterpart because it would continue the anti-pattern we have of encouraging plugins to query edx-platform in unpredictable ways after receiving the signal.

If you're confident about what fields course-discovery wants (or some subset of them–like the ones you list in the description here), then we can start with just those fields as a more narrowly scoped signal, and add other data fields to it as necessary. In the worst case, we can emit that signal every place we already emit a course_published signal–it would make it unnecessary frequent, but we'd be confident that it is at least covering every possible change. Most of these fields should be on the root CourseBlock object.

I expect that we'll eventually want at least half a dozen different "something in the course has changed" signals that have overlapping fields for different use cases, but we definitely don't have to get into them now.

rgraber commented 2 years ago

My plan was indeed to fire the new signal with just the fields listed in the issue description. Those fields are taken directly from the request discovery makes to the lms, so I'm confident they represent the scope of what course_discovery needs to know about. In an ideal world, we would only fire the signal when either a new course run is created in Studio or one of the fields listed above changes. However, I'm not convinced I know all the places where all these fields might change. I am confident that all those places are already hooked up to course_published, which is why I would want to publish the new signal alongside course_published and have some signals that are effectively no-ops for discovery if the relevant fields haven't changed.

rgraber commented 2 years ago

To be more specific, discovery cares: When a new course run is added When a new course is added When enrollment/start/end dates change When the descriptors like short/long description, effort, pacing change When titles (of runs or of the whole course) change When the banner image/card image/video changes When a course or course run is deleted - this almost never happens though so I'm inclined to ignore it

ormsbee commented 2 years ago

@rgraber: That sounds great! In that case, does the following work?

robrap commented 2 years ago

FYI: whoever picks up this ticket could also review this draft new event guide to see if we are missing anything, or the guide is missing anything.

rgraber commented 2 years ago

Update: start_display and start_type are very strange fields (display can be either a timestamp or a string, and type tells you which) that are unused by discovery, so I'm going to leave them out.

timmc-edx commented 2 years ago

Should the architecture subdomain be authoring or course_authoring? The former is suggested in this thread, but the latter is used in the OEP.

kdmccormick commented 2 years ago

OEP-41 recommends content_authoring. Personally I think content_authoring or authoring are both A-OK.

course_authoring is just too specific, given that CMS is used to author libraries, in the future will likely support even more structure-agnostic content authoring.

rgraber commented 2 years ago

At the moment I'm going with content_authoring

robrap commented 2 years ago

On the subject of naming, I thought the term "catalog" has been mainly used with discovery. So, should COURSE_CATALOG_INFO_CHANGED actually be named COURSE_RUN_INFO_CHANGED?

rgraber commented 2 years ago

Hm. I do think there is value in distinguishing that the info that has changed is the type of thing the catalog cares about rather than a course block or section or something. Granted currently that's aspirational rather than accurate.

rgraber commented 2 years ago

BTW should you wish to follow along in the fun, the PR is being created here: https://github.com/openedx/openedx-events/pull/81 . it is not yet ready for review but can give you an idea of the current shape.

ormsbee commented 1 year ago

Update: start_display and start_type are very strange fields (display can be either a timestamp or a string, and type tells you which) that are unused by discovery, so I'm going to leave them out.

Yeah, so this was because we needed an actual date for the code to work (i.e. "don't let people see the course until XYZ"), but some of the course teams didn't want to commit to a specific calendar date in advance (and then miss that mark), so they wanted to be able to hand-wave it with something like "Spring 2013" or "Feb 2014", etc. I agree that it should be left out.

At the moment I'm going with content_authoring

👍

On the subject of naming, I thought the term "catalog" has been mainly used with discovery. So, should COURSE_CATALOG_INFO_CHANGED actually be named COURSE_RUN_INFO_CHANGED?

Hm. I do think there is value in distinguishing that the info that has changed is the type of thing the catalog cares about rather than a course block or section or something. Granted currently that's aspirational rather than accurate.

I agree with @rgraber. I think the combination of content_authoring for subdomain and COURSE_CATALOG_INFO_CHANGED for the event conveys the message that "this is a content change that specifically affects course data relevant for the catalog listing", which is great.

rgraber commented 1 year ago

FYI the PR is ready for review

timmc-edx commented 1 year ago

(Sorry, I meant to write content_authoring, not course!)

robrap commented 1 year ago

I'm fine with COURSE_CATALOG_INFO_CHANGED if you all think that is clear.

I simply wish I could align this decision and name choice with the names (or descriptions) in the bounded contexts list for Content Authoring, even if it meant updating the page. Can anyone answer what Bounded Context this is a part of?

Additionally, do we agree that "Authoring" is short for "Content Authoring", as far as subdomains are concerned? I updated the page to clarify that, so hopefully the answer is "yes".

timmc-edx commented 1 year ago

Can we remove the org and number fields from the new object, since they're just derived from the course key?

https://github.com/openedx/openedx-events/pull/81/files#diff-ce3d23f91ef18fd93f59fd1cf0a2c5f88b98dfde3f3790cb22281e74edfad042R44-R45

ormsbee commented 1 year ago

Can we remove the org and number fields from the new object, since they're just derived from the course key?

I'm fine with having fewer things to start and adding more later. The one use I could think of for having separate fields is if you had a non-Python client consuming this solely via the message bus, in which case they wouldn't have the opaque-keys library.

That being said, I'm a 🤷 on this either way.

rgraber commented 1 year ago

ADR for the actual event design as it ended up is here https://github.com/openedx/openedx-events/blob/main/docs/decisions/0009-course-catalog-info-changed-design.rst