mitodl / ol-data-platform

Pipeline definitions for managing data flows to power analytics at MIT Open Learning
BSD 3-Clause "New" or "Revised" License

Update metadata fields parsed to match int__edxorg__mitx_courseruns #1187

Closed quazi-h closed 2 months ago

quazi-h commented 2 months ago

What are the relevant tickets?

https://github.com/mitodl/hq/issues/4067

Description (What does it do?)

After setting up the new metadata stream _rawedxorgs3course_structure__coursemetadata in the edx.org Production Course Structure Airbyte connection, we saw Trino errors when querying the table in Starburst: `Glue table 'ol_warehouse_production_raw.rawedxorgs3course_structurecourse_metadata' column 'allow_anonymous' has invalid data type: null`. Rachel mentioned that most of these elements/attributes are not necessary for us right now, so we should just drop them if they're causing issues. I've removed all of the extraneous fields and whittled the selection down to the courserun fields that are processed in the [int__edxorg__mitx_courseruns dbt model](https://github.com/mitodl/ol-data-platform/blob/main/src/ol_dbt/models/intermediate/edxorg/intedxorg__mitx_courseruns.sql#L24-L35), as recommended by Rachel (with the exception of the courserun_url).
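For reviewers, here is a minimal sketch of the idea behind the change, not the code in this PR. It assumes the stream schema can be viewed as a JSON-schema-style mapping of field name to type spec. The helper names (`null_typed_columns`, `select_fields`) and the fields `course_id` and `title` are hypothetical; only `allow_anonymous` comes from the error above. A field whose declared type is only `null` is the kind of column the error rejects, and restricting the schema to an allow-list of needed fields removes it.

```python
from typing import Any


def null_typed_columns(properties: dict[str, dict[str, Any]]) -> list[str]:
    """Return the names of fields whose declared type is only "null"."""
    flagged = []
    for name, spec in properties.items():
        declared = spec.get("type")
        # Normalize a single type string or a list of types to a list.
        types = [declared] if isinstance(declared, str) else list(declared or [])
        if types and all(t == "null" for t in types):
            flagged.append(name)
    return flagged


def select_fields(
    properties: dict[str, dict[str, Any]], keep: set[str]
) -> dict[str, dict[str, Any]]:
    """Keep only the fields named in `keep`, preserving their order."""
    return {name: spec for name, spec in properties.items() if name in keep}


# Hypothetical schema fragment; only `allow_anonymous` is taken from the error.
schema = {
    "course_id": {"type": ["null", "string"]},
    "title": {"type": "string"},
    "allow_anonymous": {"type": "null"},
}

trimmed = select_fields(schema, keep={"course_id", "title"})
```

With this sample input, `null_typed_columns(schema)` flags `allow_anonymous`, while the trimmed schema has no null-typed fields left.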

How can this be tested?

Once deployed, allow the metadata files to be reprocessed, then query the tables in Starburst again and confirm that the errors no longer appear.