medic / cht-sync

Data synchronization between CouchDB and PostgreSQL for the purpose of analytics.
GNU General Public License v3.0

feat: run dbt in batches #158

Open njuguna-n opened 1 month ago

njuguna-n commented 1 month ago

Description

Add the ability to run dbt in batches to avoid scenarios where large table updates result in very large temporary tables that crash Postgres.
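As a rough illustration of the batching idea (not the code in this PR), the sketch below calls dbt repeatedly with a batch-size variable until no rows are left to process. The `--vars` flag is a standard dbt CLI option. The variable name `batch_size`, the helpers `dbt_batch_command` and `run_in_batches`, and the `run_batch` callback that reports how many rows a run processed are all hypothetical, and the sketch assumes the dbt models themselves limit each incremental run to `batch_size` rows.

```python
import json
from typing import Callable, List


def dbt_batch_command(batch_size: int) -> List[str]:
    """Build a `dbt run` invocation that passes a batch size to the models.

    `batch_size` is a hypothetical variable name; the models would have to
    read it with var('batch_size') for it to have any effect.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return ["dbt", "run", "--vars", json.dumps({"batch_size": batch_size})]


def run_in_batches(
    pending: int,
    batch_size: int,
    run_batch: Callable[[List[str]], int],
) -> int:
    """Invoke run_batch until nothing is pending; return the number of runs.

    run_batch receives the dbt command and returns how many rows it
    processed (for example by executing the command with subprocess.run
    and then querying Postgres). It is injected here so the loop can be
    exercised without a database.
    """
    command = dbt_batch_command(batch_size)
    runs = 0
    while pending > 0:
        processed = run_batch(command)
        if processed <= 0:
            # Stop rather than loop forever if a run makes no progress.
            break
        pending -= processed
        runs += 1
    return runs
```

Because each run touches at most one batch of rows, the temporary tables Postgres builds for an incremental update stay bounded by the batch size rather than by the size of the full backlog.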

This PR depends on this corresponding PR in the CHT Pipeline repository.

156

Code review checklist

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

njuguna-n commented 1 month ago

@witash what do you think of this approach as a way of handling large initial syncs? Running small incremental batches seems to be working well from my local testing so far. I will clean up the PR and add some tests if you agree this is a good approach.

witash commented 1 month ago

OK, yeah, we can try it. It will be interesting to see how well it does with large databases.

njuguna-n commented 1 month ago

@dianabarsan yes, that was my concern. I have not tested that yet, but I will, and I will add a specific test case for it.

njuguna-n commented 1 month ago

@dianabarsan @witash please review. I have added a comment here summarizing the tests I ran on this approach.

njuguna-n commented 1 month ago

@dianabarsan I addressed your comments. Please have another look.