mitodl / ol-data-platform

Pipeline definitions for managing data flows to power analytics at MIT Open Learning
BSD 3-Clause "New" or "Revised" License
36 stars 6 forks source link

Refactoring metadata parsing #1182

Closed quazi-h closed 2 months ago

quazi-h commented 2 months ago

What are the relevant tickets?

https://github.com/mitodl/hq/issues/4067

Description (What does it do?)

After setting up the new metadata stream _rawedxorgs3course_structure__coursemetadata in the edx.org Production Course Structure Airbyte connection, we were seeing Trino errors when trying to query the table in Starburst> `Glue table 'ol_warehouse_production_raw.rawedxorgs3course_structure__course_metadata' column 'slug' has invalid data type: null`. Rachel mentioned that most of these elements/attributes are not necessary for us right now, so we should just drop it if it's causing issues.

How can this be tested?

Once deployed, allow the metadata files to be reprocesses, and then try querying the tables in Starburst again. Confirm that there are no longer any issues/errors that pop up.