ministryofjustice / analytical-platform

Analytical Platform • This repository is defined and managed in Terraform
https://docs.analytical-platform.service.justice.gov.uk
MIT License

:chart_with_upwards_trend: Update Create-A-Derived-Table to newest DBT-Core/DBT-Athena Versions #3290

Open jhpyke opened 5 months ago

jhpyke commented 5 months ago

User Story

As a… user of Create-A-Derived-Table
I expect to… be able to use the latest features enabled by DBT.
So that… I can be empowered to do cutting-edge analysis within the organisation.

Value / Purpose

We are currently using

dbt-core==1.6.5
dbt-athena-community==1.6.2

as our locked package versions. At the time of writing this ticket, dbt-athena-community is available at v1.8.2 and dbt-core at v1.8.3. We should bring our packages up to date so that users can use the latest features of DBT and we do not accumulate tech debt.

Please note that although dbt-athena-community has enabled Python models (i.e. Athena with PySpark) in DBT, we will NOT be including this functionality in the scope of this work. It will require further investigation, including an understanding of the cost implications of supporting it.

Useful Contacts

@jhpyke

User Types

No response

Hypothesis

No response

Proposal

  1. Bump requirements.txt to latest versions
  2. Test deployments against the sandpit environment. Work with the #data-modelling team to identify some models to test deploy in sandpit.
  3. Create a branch for users to do acceptance testing with the updated requirements. This branch should NOT contain any file changes from main other than the changes to requirements.
  4. Put comms in the #ask-data-modelling and #data-and-analytics-engineering channels to encourage users to test their in-progress and existing prod models using the branch. See previous comms for examples of structure.
  5. If users identify issues with the updated packages, work with them proactively to understand the problem (and, where appropriate, intervene to fix it) so that the upgrade can occur smoothly.
  6. After an acceptance period (no more than 7 days unless issues are identified), merge the acceptance testing branch into main.
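
Step 1 of the proposal can be sketched as a small script. This is only an illustration, not the tooling the repository actually uses: the helper name `bump_pins` is hypothetical, the target versions are simply the ones quoted in this ticket, and it assumes requirements.txt contains plain `name==version` pins.

```python
import re

# Matches simple "name==version" pins; any other requirement form is left untouched.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([^\s#;]+)\s*$")


def bump_pins(requirements_text, targets):
    """Hypothetical helper: return (new_text, changes).

    `targets` maps lower-case package names to the version to pin.
    `changes` maps each bumped package name to (old_version, new_version).
    """
    changes = {}
    out_lines = []
    for line in requirements_text.splitlines():
        match = PIN_RE.match(line)
        if match and match.group(1).lower() in targets:
            name, old = match.group(1), match.group(2)
            new = targets[name.lower()]
            if old != new:
                changes[name] = (old, new)
            out_lines.append(f"{name}=={new}")
        else:
            out_lines.append(line)
    return "\n".join(out_lines) + "\n", changes


# Current pins and target versions as quoted in this ticket (illustrative input).
current = "dbt-core==1.6.5\ndbt-athena-community==1.6.2\n"
targets = {"dbt-core": "1.8.3", "dbt-athena-community": "1.8.2"}

new_text, changes = bump_pins(current, targets)
print(new_text, end="")
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

Because only the matched pins are rewritten, the resulting diff stays limited to the requirements changes, which is what step 3 asks of the acceptance testing branch.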

Additional Information

No response

Definition of Done

murad-ali-MoJ commented 5 months ago

Progression:

murad-ali-MoJ commented 5 months ago

Update: Posted about the new version release to the data-modelling channel and asked the modellers to provide some example databases for testing.

murad-ali-MoJ commented 5 months ago

Updated the user guidance for testing: https://github.com/moj-analytical-services/user-guidance/actions/runs/7985351178

murad-ali-MoJ commented 4 months ago

Some of the tests fail here, and I don't know why. I have posted the problem in the analytical-platform channel in case anyone can help with it.

AntFMoJ commented 3 months ago

Results from Tamsin following collaboration:

Outcome:
  - Successfully deployed the general domain and all seeds to sandpit.
  - Deployed and tested derived_delius_stg and derived_delius_dim.

Next steps:
  - Build the probation snapshots.
  - Build derived_delius itself.

Error with snapshots:
  - More information/input is needed from the derived_delius team (Ben W and Chris S).

Suggested to team:
  - Liaise closely with the derived_delius team and get them to test their work on the dbt_athena_update branch.

Longer term:
  - Liaise with other key stakeholders / model owners (OPG etc.).
  - Open up testing to users so they can test their own models on the dbt_athena_update branch, using the new versions of dbt_core and dbt_athena_community.

Things to consider:
  - Life cycle rules on sandpit: make these longer during the testing phase.

AntFMoJ commented 3 months ago

Arranged a call with probation modelling contact to discuss issues with creating snapshots for derived-delius.

Working through building daily deployment models in sandpit, currently on OPG.

AntFMoJ commented 3 months ago

All daily deployment models have been built in sandpit. Put another message out to the data-modelling channel to ask for user feedback on the new dbt versions.

dbt-core v1.7.10 needs to be used, as v1.7.7 is incompatible with the latest version of protobuf.
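
A check along the following lines could guard against pinning an affected release. It is a minimal sketch that assumes v1.7.10 acts as a minimum version (the comment above only says that v1.7.10 needs to be used) and that versions are purely numeric; the function names are hypothetical. For real use, `packaging.version.Version` handles pre-releases and other forms more robustly.

```python
def parse_version(version):
    """Turn a purely numeric dotted version such as '1.7.10' into (1, 7, 10)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum`, compared numerically."""
    return parse_version(version) >= parse_version(minimum)


# Assumption: 1.7.10 is treated as the lowest acceptable dbt-core version.
MIN_DBT_CORE = "1.7.10"

for candidate in ("1.7.7", "1.7.10", "1.8.3"):
    status = "ok" if meets_minimum(candidate, MIN_DBT_CORE) else "too old"
    print(f"dbt-core {candidate}: {status}")
```

The versions are compared as integer tuples rather than strings because a plain string comparison would rank "1.7.10" below "1.7.7".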

AntFMoJ commented 3 months ago

A follow-up call for feedback on the new versions has been put out to the data-modelling channel. The new versions have not yet been put into production.

Any issues raised through feedback will be raised as support tickets. This ticket will be taken out of sprint for now, although new versions have not yet been put in production.

jhpyke commented 10 hours ago

Tested via a recent sandpit deploy. Awaiting final user acceptance from the #data-modelling team to sign off the prod deployment.