tl-its-umich-edu / unizin-validation

Unizin Validation Scripts

Adding support for validating UDP Context Store (#15) #16

Closed ssciolla closed 4 years ago

ssciolla commented 4 years ago

This PR adds a query, a test, and other small modifications that enable connecting to the Unizin Data Platform's Context Store and validating record counts for the tables we initially identified as relevant to T&L development. Additional details are provided below. The PR aims to resolve issue #15.

Specifically, I modified the existing env.json structure to have a parent DATA_SOURCES variable, which contains a dictionary of all data sources, and then made a small tweak to establish_db_connection to look for the parameters in the right place. I then added a new query that @zqian and I worked on together, following the existing udw_table_counts query and the crosswalk provided at Unizin's Canvas to UCDM page. I also added a test to ensure the YELLOW flag is being raised properly. Some small modifications were made under "Main Program" to include the new query and make the job name more global ("Unizin Daily Status Report").
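As a rough illustration of the env.json change described above, the sketch below nests connection settings under a parent DATA_SOURCES key and looks them up by data source name. Only the DATA_SOURCES key comes from this PR description; the data source names, the field names (host, port, dbname, user, password), and the helper get_connection_params are assumptions made for the example, not the repository's actual code.

```python
import json

# Hypothetical env.json contents. Only the parent "DATA_SOURCES" key is taken
# from the PR description; every other name and value here is an assumption.
ENV_JSON = """
{
  "DATA_SOURCES": {
    "UDW": {
      "host": "udw.example.edu", "port": 5432, "dbname": "udw",
      "user": "reader", "password": "changeme"
    },
    "UDP_Context_Store": {
      "host": "udp.example.edu", "port": 5432, "dbname": "context_store",
      "user": "reader", "password": "changeme"
    }
  }
}
"""


def get_connection_params(env, data_source_name):
    """Return the connection parameters nested under DATA_SOURCES.

    Hypothetical helper: it stands in for the lookup that a function such as
    establish_db_connection would perform before opening a connection.
    """
    data_sources = env["DATA_SOURCES"]
    if data_source_name not in data_sources:
        raise KeyError(
            f"Data source {data_source_name!r} is not defined in DATA_SOURCES"
        )
    return data_sources[data_source_name]


env = json.loads(ENV_JSON)
params = get_connection_params(env, "UDP_Context_Store")
print(params["dbname"])
```

Keeping every data source under one parent key means a new source can be added to the configuration without changing the lookup logic.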

jonespm commented 4 years ago

Since Student Explorer is not using UDP, validating against UDP should be optional for now.

I think the queries to run (query_keys) should be determined either by the JSON configuration or by the data sources defined there. If only UDW is defined, only a certain set of queries will run; if multiple data sources are defined, more queries will run.

We could also define which queries to run in the configuration; this may be the easiest option.
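One way to picture both suggestions together is the sketch below: queries run only when their data source is defined, and an optional explicit list in the configuration narrows the set further. The mapping QUERY_DATA_SOURCES, the query key udp_context_store_counts, the data source name UDP_Context_Store, and the QUERY_KEYS configuration entry are all hypothetical names chosen for illustration.

```python
# Hypothetical mapping from query key to the data source it requires.
# "udw_table_counts" appears in the PR description; the other names are assumptions.
QUERY_DATA_SOURCES = {
    "udw_table_counts": "UDW",
    "udp_context_store_counts": "UDP_Context_Store",
}


def select_query_keys(env):
    """Choose which queries to run based on the configuration.

    If the (hypothetical) "QUERY_KEYS" entry is present, it lists the
    candidate queries explicitly; otherwise every known query is a candidate.
    A candidate is kept only if its data source is defined in DATA_SOURCES.
    """
    defined_sources = set(env.get("DATA_SOURCES", {}))
    configured = env.get("QUERY_KEYS")
    candidates = configured if configured is not None else list(QUERY_DATA_SOURCES)
    return [
        key for key in candidates
        if QUERY_DATA_SOURCES.get(key) in defined_sources
    ]


udw_only = {"DATA_SOURCES": {"UDW": {}}}
both = {"DATA_SOURCES": {"UDW": {}, "UDP_Context_Store": {}}}

print(select_query_keys(udw_only))
print(select_query_keys(both))
```

With this shape, a deployment such as Student Explorer that defines only UDW would skip the UDP query without any extra setting.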

ssciolla commented 4 years ago

@jonespm, I'll have a fix for the job configuration momentarily. Note, however, that the new check for the UDP query is "YELLOW" (i.e., it wouldn't throw a logging error).

ssciolla commented 4 years ago

Okay, @jonespm, see the latest commit. I know this isn't urgent, but I was already thinking about UDP, so I thought I'd get this in. @zqian also raised the question of enabling different jobs. I added a command-line argument to validate.py that lets you specify which job configured in jobs.py you want to run. If you don't provide a command-line argument, it defaults to the "UDW" job, i.e., what was hardcoded before.
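The job selection described above could look roughly like the following sketch, which uses argparse with an optional positional argument that falls back to "UDW". The JOBS dictionary stands in for whatever jobs.py actually contains; its structure, the "Unizin" job key, the "UDW Daily Status Report" name, and the udp_context_store_counts query key are assumptions for the example.

```python
import argparse

# Hypothetical stand-in for the contents of jobs.py; the real structure may differ.
JOBS = {
    "UDW": {
        "full_name": "UDW Daily Status Report",
        "query_keys": ["udw_table_counts"],
    },
    "Unizin": {
        "full_name": "Unizin Daily Status Report",
        "query_keys": ["udw_table_counts", "udp_context_store_counts"],
    },
}


def parse_job(argv=None):
    """Return (job_name, job_config) for the job named on the command line.

    The job name is optional; when it is omitted, the "UDW" job is used.
    Passing argv=None makes argparse read from sys.argv, as usual.
    """
    parser = argparse.ArgumentParser(description="Run a configured validation job.")
    parser.add_argument(
        "job_name",
        nargs="?",            # the argument may be left out
        default="UDW",        # fall back to the previously hardcoded job
        choices=sorted(JOBS),
        help="name of a job configured in JOBS (default: UDW)",
    )
    args = parser.parse_args(argv)
    return args.job_name, JOBS[args.job_name]


if __name__ == "__main__":
    name, job = parse_job()
    print(f"Running {job['full_name']} with queries: {job['query_keys']}")
```

Under these assumptions, running `python validate.py` would select the "UDW" job, while `python validate.py Unizin` would select the broader one.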