mitodl / ol-data-platform

Pipeline definitions for managing data flows to power analytics at MIT Open Learning
BSD 3-Clause "New" or "Revised" License

Update OCW external resources report #1200

Closed pt2302 closed 1 month ago

pt2302 commented 1 month ago

What are the relevant tickets?

Related to https://github.com/mitodl/hq/issues/4023.

Description (What does it do?)

This PR lets the intermediate OCW resources table (int__ocw__resources) pull in external resources correctly. External resources are identified by websitecontent_type = external-resource.

How can this be tested?

Run the following commands; all tests should pass:

dbt build --select staging.ocw --vars 'schema_suffix: <your name>' --target dev_production
dbt build --select intermediate.ocw --vars 'schema_suffix: <your name>' --target dev_production

Run the following query in https://mitol.galaxy.starburst.io/query-editor to inspect the resulting intermediate table:

SELECT * FROM ol_data_lake_production.ol_warehouse_production_<your name>_intermediate.int__ocw__resources

Then verify that external resources are pulled in correctly, using a query like the following, which lists the ones flagged as broken:

select course_name, resource_uuid, studio_url, external_resource_url from ol_data_lake_production.ol_warehouse_production_<your name>_intermediate.int__ocw__resources
where external_resource_is_broken = true

Also verify that all external resources can be listed, using a query like:

select * from ol_data_lake_production.ol_warehouse_production_<your name>_intermediate.int__ocw__resources
where content_type = 'external-resource'
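
A possible extra sanity check, sketched here only as a suggestion, is to count external resources by their broken-link flag. It uses only the table and columns that appear in the queries above. It assumes that external_resource_is_broken is populated for external-resource rows; if it is not, a null group will appear in the output.

```sql
-- Sketch: count external resources grouped by the broken-link flag.
-- Replace <your name> with the schema suffix used in the dbt build commands.
select
    external_resource_is_broken,
    count(*) as resource_count
from ol_data_lake_production.ol_warehouse_production_<your name>_intermediate.int__ocw__resources
where content_type = 'external-resource'
group by external_resource_is_broken
order by external_resource_is_broken
```

If the counts sum to the number of rows returned by the previous query, the two checks agree.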