openedx / openedx-aspects

Aspects - Analytics for Open edX
Apache License 2.0

De-dupe work on xAPI transforms #12

Closed jmakowski1123 closed 1 year ago

jmakowski1123 commented 1 year ago

We currently have 2 diverging implementations of xAPI transforms of tracking log events.

Can we work together to make one set of transforms that can be shared? Can we have the best of both worlds: anonymous actor ids from event-routing-backends and tracking log replay from Ralph?

Once we have a better understanding of what the source of truth for these transforms is, we'll be in a better place to tackle this:

bmtcril commented 1 year ago

I'm hoping to get buy-in from Data WG team members on this tomorrow. 🤞

bmtcril commented 1 year ago

Invested parties are getting together on 2023-01-17 to discuss the first round of questions and decide whether this should be an ongoing sub-group or if some lighter way of working together is better.

pomegranited commented 1 year ago

Awesome, thank you for raising this issue and scheduling this meeting, @bmtcril! Are there meeting notes or a recording anywhere, or a group page on Confluence I can follow?

bmtcril commented 1 year ago

The notes and action items are here, though we didn't come to a lot of decisions yet. Zia and I are getting more familiar with the Pydantic implementation and will round up soon to chat about our findings.

bmtcril commented 1 year ago

I believe we've gone as far as we can at this point. Currently there is no way to do transforms outside the LMS environment that will result in a consistent actor id. This is because tracking log events are inconsistent in which user identifier they carry (id, username, etc.) and never(?) include the anonymized identifier we wish to include by default. To do transforms outside of the LMS, the external system would need to make API calls to the LMS to turn the ids it gets from the tracking logs into something consistent. In the case of anonymized ids, letting other systems convert back and forth between anonymized and de-anonymized ids at will would defeat the purpose of anonymization.

Similar issues exist with things like course names and video names, which are not present in the tracking log but which we would like to have in our xAPI statements.
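To make the problem concrete, here is a rough sketch of the lookups involved. Everything in it is made up for illustration: `FakeLmsData`, the field layout of the sample events, and the `anon-3f9c` id are assumptions, not the real tracking log schema or any actual LMS API. The point is only that both the stable actor id and the course name come from data the LMS holds, not from the log line itself.

```python
from typing import Optional


class FakeLmsData:
    """Hypothetical stand-in for data that only the LMS holds."""

    def __init__(self) -> None:
        self.user_ids_by_username = {"alice": 42}
        self.anonymous_ids_by_user_id = {42: "anon-3f9c"}
        self.course_names = {"course-v1:Org+Demo+2023": "Demo Course"}


def resolve_user_id(event: dict, lms: FakeLmsData) -> Optional[int]:
    """Find the internal user id, whichever identifier the event carries."""
    user_id = event.get("context", {}).get("user_id")
    if user_id:
        return user_id
    # Fall back to a username lookup, which needs LMS data.
    return lms.user_ids_by_username.get(event.get("username"))


def enrich(event: dict, lms: FakeLmsData) -> dict:
    """Return the fields the log lacks: anonymized actor id and course name."""
    user_id = resolve_user_id(event, lms)
    course_id = event.get("context", {}).get("course_id")
    return {
        "actor_id": lms.anonymous_ids_by_user_id.get(user_id),
        "course_name": lms.course_names.get(course_id),
    }


lms = FakeLmsData()
# Two illustrative events for the same user, identified in different ways.
by_username = {"username": "alice", "context": {"course_id": "course-v1:Org+Demo+2023"}}
by_user_id = {"username": "", "context": {"user_id": 42, "course_id": "course-v1:Org+Demo+2023"}}

first = enrich(by_username, lms)
second = enrich(by_user_id, lms)
```

Both events resolve to the same actor id only because `enrich` can read the LMS-side mappings. An external system would have to fetch those mappings over an API, which is exactly the de-anonymization path we want to avoid exposing.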

So at least for now we will need to do both the live and log replay transformations in the LMS. This makes breaking out the transformations less valuable and likely something we will want to push off to v2. There is still a lot of value in making transformations pluggable and able to be overridden, however, and there will be follow-on tasks in event-routing-backends to take on architecting that work.
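As a rough idea of what "pluggable and able to be overridden" could look like, here is a minimal registry sketch. The names (`TransformerRegistry`, `register`, `transform`) and the statement shapes are hypothetical and are not the event-routing-backends API; the actual design is what the follow-on tasks would need to work out.

```python
from typing import Callable, Dict, Optional

Transformer = Callable[[dict], dict]


class TransformerRegistry:
    """Maps an event type to the function that transforms it."""

    def __init__(self) -> None:
        self._transformers: Dict[str, Transformer] = {}

    def register(self, event_type: str) -> Callable[[Transformer], Transformer]:
        def decorator(func: Transformer) -> Transformer:
            # A later registration for the same event type replaces the earlier one.
            self._transformers[event_type] = func
            return func

        return decorator

    def transform(self, event: dict) -> Optional[dict]:
        transformer = self._transformers.get(event["event_type"])
        if transformer is None:
            return None  # No transformer registered for this event type.
        return transformer(event)


registry = TransformerRegistry()


@registry.register("play_video")
def default_play_video(event: dict) -> dict:
    # Illustrative statement shape only, not a complete xAPI statement.
    return {"verb": "played", "object": event["video_id"]}


@registry.register("play_video")
def custom_play_video(event: dict) -> dict:
    # A plugin overrides the default but can still build on it.
    statement = default_play_video(event)
    statement["context"] = {"site": "example"}
    return statement


result = registry.transform({"event_type": "play_video", "video_id": "v1"})
unknown = registry.transform({"event_type": "not_registered"})
```

Here the second registration wins, so `result` carries the extra `context` key, while an unregistered event type yields `None` and could simply be skipped.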