mitodl / ol-data-platform

Pipeline definitions for managing data flows to power analytics at MIT Open Learning
BSD 3-Clause "New" or "Revised" License

int__micromasters__course_enrollments contains duplicates for DEDP prior to 2021 #670

Closed by rachellougee 1 year ago

rachellougee commented 1 year ago

int__micromasters__course_enrollments contains duplicate enrollments for DEDP prior to 2021. Because we migrated these DEDP enrollments from MicroMasters (where courseware_backend = 'edxorg') to MITx Online last year, they now overlap with the DEDP enrollments that come from edxorg.

Around 78 runs were double counted. For example, 'course-v1:MITx+JPAL102x+3T2020' is the same run as 'MITx/JPAL102x/3T2020'; this course ran on edxorg.
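To make the run-ID mismatch concrete, here is a minimal Python sketch of how the two formats from the example map onto each other. The helper name `to_legacy_run_id` is hypothetical, and the rule (strip the `course-v1:` prefix, swap `+` for `/`) is an assumption based only on the example above, not the logic used in the dbt models.

```python
def to_legacy_run_id(run_id: str) -> str:
    """Convert a 'course-v1:Org+Course+Run' ID to the 'Org/Course/Run' form.

    Hypothetical helper: the mapping is inferred from the single example
    in this issue and may not hold for every run.
    """
    prefix = "course-v1:"
    if run_id.startswith(prefix):
        # Drop the prefix and use '/' as the separator instead of '+'.
        return run_id[len(prefix):].replace("+", "/")
    # IDs that are already in the slash-separated form are returned unchanged.
    return run_id


same_run = (
    to_legacy_run_id("course-v1:MITx+JPAL102x+3T2020")
    == to_legacy_run_id("MITx/JPAL102x/3T2020")
)
print(same_run)
```

Comparing enrollments on a normalized ID like this is one way to surface the runs that appear under both formats.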

To fix this, we should add a filter to __micromasters_course_enrollments_from_mitxonline where platform = 'mitxonline', so that these DEDP enrollments are not counted twice for the same user (email). Since this model is currently used in the MM summary report, we should also check for any side effects.
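The effect of the proposed filter can be illustrated with a small Python sketch on made-up rows. The column names (`user_email`, `run_key`, `platform`), the sample rows, and both helper functions are assumptions for illustration only; the actual change is a filter in the dbt model, and `run_key` stands in for a run ID that has already been normalized to one format.

```python
from collections import Counter

# Hypothetical rows; the column names are illustrative, not the real schema.
from_mitxonline = [
    # Enrollment migrated from MicroMasters, originally run on edxorg.
    {"user_email": "a@example.com", "run_key": "MITx/JPAL102x/3T2020", "platform": "edxorg"},
    # Made-up native MITx Online enrollment.
    {"user_email": "a@example.com", "run_key": "Example/Course101/1T2022", "platform": "mitxonline"},
]
from_edxorg = [
    {"user_email": "a@example.com", "run_key": "MITx/JPAL102x/3T2020", "platform": "edxorg"},
]


def keep_mitxonline(rows):
    """Keep only rows where platform = 'mitxonline' (the proposed filter)."""
    return [row for row in rows if row["platform"] == "mitxonline"]


def duplicated_enrollments(rows):
    """Return (email, run) pairs that appear more than once."""
    counts = Counter((row["user_email"], row["run_key"]) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Without the filter, the migrated enrollment is counted twice.
before = duplicated_enrollments(from_mitxonline + from_edxorg)
# With the filter, only the edxorg copy of that enrollment remains.
after = duplicated_enrollments(keep_mitxonline(from_mitxonline) + from_edxorg)
print(before, after)
```

A similar duplicate count on the real combined model, run before and after the change, could also serve as a check for side effects in the MM summary report.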

rachellougee commented 1 year ago

This is fixed as part of https://github.com/mitodl/ol-data-platform/pull/673