openedx-unsupported / edx-analytics-pipeline

GNU Affero General Public License v3.0
91 stars 116 forks source link

Support DDL changes in MySQL->Snowflake loading #765

Closed doctoryes closed 4 years ago

doctoryes commented 4 years ago

Analytics Pipeline Pull Request

https://openedx.atlassian.net/browse/DE-1712

Change the loading of MySQL tables into Snowflake via Sqoop by loading the table data first temporarily into a "scratch" Snowflake schema and then swapped/moved over into its target schema. This change supports DDL changes in existing tables over time.

I've tested this change via an import of the credentials DB using the Jenkins debug-pipeline-job job. I'm currently performing more testing, importing each of the four MySQL DBs we store in Snowflake.

The scratch schemas were created by Terraform in this PR: https://github.com/edx/terraform/pull/1871

Make sure that the following steps are done before merging:

codecov[bot] commented 4 years ago

Codecov Report

Merging #765 into master will decrease coverage by 0.04%. The diff coverage is 30.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #765      +/-   ##
==========================================
- Coverage   74.24%   74.19%   -0.05%     
==========================================
  Files         209      209              
  Lines       23636    23660      +24     
==========================================
+ Hits        17548    17555       +7     
- Misses       6088     6105      +17
Impacted Files Coverage Δ
...asks/warehouse/load_vertica_schema_to_snowflake.py 46.15% <ø> (ø) :arrow_up:
...lytics/tasks/warehouse/load_warehouse_snowflake.py 47.71% <ø> (ø) :arrow_up:
...asks/warehouse/load_internal_reporting_database.py 55.59% <ø> (ø) :arrow_up:
edx/analytics/tasks/common/snowflake_load.py 31.63% <30.3%> (-0.35%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 140fb53...86f66fc. Read the comment docs.

doctoryes commented 4 years ago

Successful imports so far: http://jenkins.analytics.edx.org/job/debug-pipeline-job/312/ (credentials) http://jenkins.analytics.edx.org/job/debug-pipeline-job/314/ (discovery)

doctoryes commented 4 years ago

Currently, no tests exist for Snowflake loading - hence the coverage failure. I'll work on adding a few in a separate commit.

doctoryes commented 4 years ago

Successful ecommerce import here: http://jenkins.analytics.edx.org/job/debug-pipeline-job/315/