openedx / openedx-events

Open edX events from the Hooks Extensions Framework
Apache License 2.0
11 stars 21 forks source link

[CloudEvents] Field ce_time for event bus events #159

Closed robrap closed 1 year ago

robrap commented 1 year ago

This is a continuation of https://github.com/openedx/openedx-events/issues/77, where we decided to defer the CloudEvent field ce_time, as defined in OEP-41.

robrap commented 1 year ago

There are a variety of times related to event production:

For the purposes of the rest of this discussion, we will ignore any times related to consuming the event. Any consumer can call datetime.now() when required, and at this time, we have no need to provide this as an attribute on the event.

According to the OEP, we must implement a "time" field that could be either db_created_at_time or signal_produced_time. Note: I will ignore the other current time possibilities. As a consumer, possibly the only way to know which of these you are getting for a certain event would be to read the docs for the specific event.

As the OEP states, using the created_at time is ideal for the consumer for a variety of reasons. However, it is more challenging to implement. Note: maybe this is no big deal for legacy event and I just need to better understand?

My main concern is holding up the ability to pass along any of these times, being blocked on how long it would take to implement the db_created_at_time solution, and knowing that once a ce_time header is added, it should not be changed without causing a major version change. Additionally, this is not a one time cost for our single event. This will come up for any new events (and especially for legacy event?). I'd like to consider sending a signal_produced_time header that will always mean the same thing, is simple to implement, and may or may not differ from ce_time. This could enable us to defer implementing ce_time, if we continue to be stuck on this.

Separately, confluent-kafka offers a tuple for its timestamp that provides both the time and a type of the time. We could consider enhancing the ce_time header with that information in a separate custom header. See https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.Message.timestamp. This is not a priority. I just thought it was interesting.

robrap commented 1 year ago

As an example of implementing the production timestamp, here is the code where we get the course for the event bus event data: https://github.com/openedx/edx-platform/blob/0b68c84286fe53c32b6b8c1e01ce30845bb8496c/cms/djangoapps/contentstore/signals/handlers.py#L85. I don't see how to get the db dates for that course. Maybe others know? Or, do we think making that data available (here and in all cases) should be a blocker to sending any new event?

I'm not certain of the answer, which is why I am open to punting on this and instead using a specifically named and easily implemented field that comes from datetime.now(). It could be documented as not necessarily matching the DB time of the related data.

robrap commented 1 year ago

UPDATE: @ormsbee and I discussed and he provided me with a solution for the DB time of this event: course.subtree_edited_on. So, we will update to pass through this event time. Related, OpenEdxPublicSignal.send_event() and new send_event_to_bus will need an optional time argument to use as the metadata.