mitodl / ol-data-platform

Pipeline definitions for managing data flows to power analytics at MIT Open Learning
BSD 3-Clause "New" or "Revised" License

Staging and intermediate models for edx courses and enrollments #476

Closed rachellougee closed 1 year ago

rachellougee commented 1 year ago

User Story

Look into the edX courses and enrollments data and create the corresponding staging and intermediate models, as we have done for MITxonline, xPro, and bootcamp.

Description/Context

edx course metadata

Source table

edx enrollments

Source table

rachellougee commented 1 year ago

A large number of enrollment records (874k as of today) in person_courses are missing user_id, and most of their other fields are blank. It looks like the same parsing issue has propagated into this table. That will need to be addressed at the source. We might need to filter out these blank rows until it's fixed. @dseaton FYI
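A minimal sketch of the interim filter described above, assuming each enrollment row is available as a Python dict keyed by column name. The `user_id` field comes from this thread; the helper names and the dict representation are hypothetical, and the actual filter would more likely live in the staging model's SQL.

```python
def has_user_id(row):
    """Return True when the row carries a non-blank user_id."""
    user_id = row.get("user_id")
    # Treat a missing key, None, and whitespace-only strings as blank.
    return user_id is not None and str(user_id).strip() != ""


def filter_blank_enrollments(rows):
    """Keep only enrollment rows that have a usable user_id."""
    return [row for row in rows if has_user_id(row)]
```

Rows dropped this way are only hidden, not repaired, so the filter is meant as a stopgap until the parsing issue is fixed at the source.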

dseaton commented 1 year ago

Thanks for letting me know. I'll reflect this in the doc.

rachellougee commented 1 year ago

mitx_course and person_course are now staged. I also added enrollments and certificates to the intermediate layer.

Note a couple of data issues here:

rachellougee commented 1 year ago

@dseaton I mentioned to you that some courses in person_courses don't exist in mitx_courses. There are actually only 6 of them, so not many. Just FYI

MITx/CTL.CFx/1T18U5
MITx/CTL.CFx/1T18U8
MITx/Launch.x_3/1T2017
MITx/18.03Fx/2T2020
MITx/20.305x/1T2019
MITx/CTL.CFx/1T18U1
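A check like this can be expressed as an anti-join. The sketch below uses an in-memory SQLite database purely for illustration. The table names mirror the ones discussed here, but the `course_id` column name and the function name are assumptions, not the project's actual schema or code.

```python
import sqlite3


def find_orphan_course_ids(person_course_ids, mitx_course_ids):
    """Return course IDs present in person_course but absent from mitx_courses."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical single-column tables; the real tables have many more fields.
        conn.execute("CREATE TABLE person_course (course_id TEXT)")
        conn.execute("CREATE TABLE mitx_courses (course_id TEXT)")
        conn.executemany(
            "INSERT INTO person_course VALUES (?)",
            [(c,) for c in person_course_ids],
        )
        conn.executemany(
            "INSERT INTO mitx_courses VALUES (?)",
            [(c,) for c in mitx_course_ids],
        )
        # Anti-join: keep enrollment course IDs with no matching course row.
        rows = conn.execute(
            """
            SELECT DISTINCT p.course_id
            FROM person_course AS p
            LEFT JOIN mitx_courses AS m ON m.course_id = p.course_id
            WHERE m.course_id IS NULL
            ORDER BY p.course_id
            """
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

The same `LEFT JOIN ... WHERE ... IS NULL` pattern should carry over to the warehouse SQL with the real column names substituted.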

dseaton commented 1 year ago

Talked to Peter. The "*.CFx" courses are final exam courses, i.e., courses where a learner takes a final exam but has no other learner activity (example). We think this was an artifact from before online proctoring. IMHO, I am OK with these courses not being in mitx_courses. Whether we drop them from person course is something we should discuss.

I don't know why 18.03Fx (part of an x-series) and 20.305x are not in "mitx_courses". We may need to ask.

Launch.x_3 might be related to bootcamps in some way, but again, need to check why it isn't in the list.

rachellougee commented 1 year ago

To clarify these missing course runs: mitx_course contains the runs MITx/18.03Fx/1T2020 and MITx/20.305x/2T2019, but not MITx/18.03Fx/2T2020 or MITx/20.305x/1T2019.
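To separate "the whole course is missing" from "only this run is missing", the IDs can be split into their parts. This is a rough sketch that assumes every ID follows the `org/course/run` shape seen in this thread; the function names are hypothetical.

```python
from collections import defaultdict


def split_course_run(course_id):
    """Split an 'org/course/run' ID into its three parts."""
    org, course, run = course_id.split("/")
    return org, course, run


def sibling_runs(missing_ids, catalog_ids):
    """Map each missing course run to the runs of the same course in the catalog."""
    runs_by_course = defaultdict(list)
    for course_id in catalog_ids:
        org, course, run = split_course_run(course_id)
        runs_by_course[(org, course)].append(run)

    result = {}
    for course_id in missing_ids:
        org, course, _ = split_course_run(course_id)
        # An empty list means no run of this course appears in the catalog.
        result[course_id] = sorted(runs_by_course.get((org, course), []))
    return result
```

With only the two catalog runs named above as input, `MITx/18.03Fx/2T2020` maps to `["1T2020"]` and `MITx/20.305x/1T2019` maps to `["2T2019"]`, which matches the observation that the courses exist but these particular runs do not.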