@Ian2012 this is as far as I've gotten today on this work, still need to write the other 2 tickets
I have a few comments regarding Create a single “tracking log” OpenEdxPublicSignal. From what I can see in the tracking log events -- please let me know if I'm wrong -- I imagine our data class would look something like this:

from typing import Optional

import attr


@attr.s(frozen=True)
class ContextTrackingLogBase:
    course_id = attr.ib(type=str)
    org_id = attr.ib(type=str)
    path = attr.ib(type=str)
    user_id = attr.ib(type=str)


@attr.s(frozen=True)
class ContextTrackingLog(ContextTrackingLogBase):
    course_user_tags = attr.ib(type=Optional[dict])
    module = attr.ib(type=Optional[dict])


@attr.s(frozen=True)
class TrackingLogData:
    accept_language = attr.ib(type=str)
    agent = attr.ib(type=str)
    context = attr.ib(type=ContextTrackingLog)
    event = attr.ib(type=dict)
    event_source = attr.ib(type=str)
    event_type = attr.ib(type=str)
    host = attr.ib(type=str)
    ip = attr.ib(type=str)
    name = attr.ib(type=str)
    page = attr.ib(type=str)
    referer = attr.ib(type=str)
    session = attr.ib(type=str)
    time = attr.ib(type=str)
    username = attr.ib(type=str)
Now, the event field varies according to the event we're sending. So we could only validate the tracking log common data using the data class we just created (which happens automatically when sending the event), but we couldn't do much with the info inside the event field -- at least as it is right now.
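For illustration, here is a minimal sketch of how a single signal could be declared with this data class; the event_type string and the tracking_log payload key below are assumptions, not final names:

from openedx_events.tooling import OpenEdxPublicSignal

# Sketch only: the event_type is illustrative and would need to follow the naming ADR.
TRACKING_LOG_EVENT_EMITTED = OpenEdxPublicSignal(
    event_type="org.openedx.analytics.tracking.log.emitted.v1",
    data={
        "tracking_log": TrackingLogData,
    },
)

Producers would then call TRACKING_LOG_EVENT_EMITTED.send_event(tracking_log=...) with an instance of the data class above.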
Open edX events include the subdomain, e.g. learning, as part of their event_type: org.openedx.learning.course.enrollment.created.v1. Each event_type is created according to this ADR. Now, how would this look with a single tracking log event? 🤔 I see in the documentation of the events that some events are categorized as student or course. I don't know if this could be used to create more than one event so we can continue following the naming convention.
Also, tracking logs have their own event_type, which differs from the Open edX Events event_type mentioned before; I hope the tracking logs event_type does not confuse the user. After reviewing the architectural subdomains again, specifically after seeing this image (found here), I see that there's a bounded context called Analytics, which cuts across all subdomains. Now I wonder if we should create a tracking log Open edX event for each subdomain. 🤔
We might need to ask an expert 😉. What do you think @ormsbee?
I think this is all reasonable. Sadly, a lot of the time various fields on the tracking log entries will be empty strings, and if you haven't already, I think it would make sense to confirm these are the only keys we're going to see. There is some possibility that people will want to replay very old tracking logs, which may have different or broken formats. You all are probably in a better position than I am to check that, though. :)
For the arch domain I personally think a single cross-cutting Analytics subdomain makes the most sense as shown in the image above and this one:
But it's a loosely held opinion and I could be convinced to go a different direction.
PRs have been created for this requirement:
In the proposed approach, the data format was updated to match the schema defined here. Because data and context are dynamic variables, I think it is not important to define a rigid schema for the tracking log data:
event = {
    'name': name or UNKNOWN_EVENT_TYPE,
    'timestamp': datetime.now(UTC),
    'data': data or {},
    'context': self.resolve_context(),
}
We had a reminder from @robrap that the volume of these events may require us to look at making some of the event bus audit logging configurable to avoid performance issues related to log spam. As part of testing these PRs we should make sure that the associated logging is of an acceptable volume.
The current state of this work is the following:
This can be tested by defining the following settings in your environment:
EVENT_TRACKING_BACKENDS = {
    'event_bus': {
        'ENGINE': 'eventtracking.backends.event_bus.EventBusRoutingBackend',
    },
}
EVENT_BUS_PRODUCER = 'edx_event_bus_redis.create_producer'
EVENT_BUS_REDIS_CONNECTION_URL = 'redis://@redis:6379/'
EVENT_BUS_TOPIC_PREFIX = 'dev'
EVENT_BUS_CONSUMER = 'edx_event_bus_redis.RedisEventConsumer'
@robrap can you take a look at the changes here: https://github.com/openedx/event-tracking/pull/246
The workaround is basically to avoid an infinite loop there when re-emitting the signals. ERB can't be converted into an IDA, as ERB depends on accessing the same database as the LMS, so it's better to consume the events in the same service. External services can still access the event bus data by using TRACKING_EVENT_EMITTED, but TRACKING_EVENT_EMITTED_TO_BUS is used here internally.
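For example, an external service or plugin could keep reacting to the public signal with a plain Django receiver. A sketch, assuming the signal lives in openedx_events.analytics.signals and that the payload key is tracking_log:

from django.dispatch import receiver

from openedx_events.analytics.signals import TRACKING_EVENT_EMITTED


@receiver(TRACKING_EVENT_EMITTED)
def handle_tracking_log(sender, **kwargs):
    # The "tracking_log" key is an assumption; adjust it to the real signal definition.
    tracking_log = kwargs.get("tracking_log")
    if tracking_log is not None:
        print(tracking_log.name)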
Also, this can be easily run as:
tutor dev exec lms ./manage.py lms consume_events -t analytics -g event_routing_backends --extra '{"consumer_name": "aspects"}'
cc @felipemontoya @mariajgrimaldi @bmtcril
I have not reviewed any code, but am simply providing https://github.com/openedx/openedx-events/issues/79, which has some past discussion around avoiding infinite loops.
The infinite loop problem has been addressed here: https://github.com/openedx/openedx-events/pull/312 While there is still an open PR for batching events, I think the work completed is sufficient to close this out.
To extend our event handling capabilities, we would like to be able to put tracking logs onto the event bus. This epic is to track that work, and the first consumer of those logs: event-routing-backends, optionally using it instead of the Celery-based async backend.
Existing task that should be completed as a prerequisite to this work: https://github.com/openedx/openedx-events/issues/210
High level tasks to be groomed: