osdldbt / dbt5

Database Test 5: Fair Use TPC Benchmark(TM) E
Artistic License 2.0

custom postgresql loader is not generating the correct number of rows #12

Closed: markwkm closed this issue 1 month ago

markwkm commented 4 months ago

As of at least the update to egen v1.14.0, generating flat files and then loading them produces the expected number of rows. This suggests that something is wrong with the custom code path in EGenLoader.

The pgsql-check-db script suggests that only the growing tables (except trade_request) have issues:

GROWING TABLES
==============

cash_transaction 79488000 ~ 15897761
holding 4406400 ~ 888468
holding_history 115776000 ~ 23156714
holding_summary 248900 ~ 49786
settlement 86400000 ~ 17280000
trade 8640000 ~ 17280000
trade_history 207360000 ~ 41472674
trade_request 0 = 0
markwkm commented 3 months ago

I figured this out; a patch is baking.

We thought we could be clever and use a TRUNCATE/FREEZE trick when loading data directly. It turns out that EGenLoader loads the data in batches, so each new batch TRUNCATEs the rows loaded by the previous batch. The current solution is to not use the TRUNCATE/FREEZE trick at all.

This appears to only happen with the "growing tables", at least at the default load parameters.
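To illustrate why the TRUNCATE/FREEZE trick loses rows when the loader works in batches, here is a minimal Python sketch. It assumes each batch is wrapped as `BEGIN; TRUNCATE; COPY ... WITH (FREEZE); COMMIT;`. PostgreSQL only allows `COPY ... FREEZE` when the table was created or truncated in the current (sub)transaction, which is why the trick pairs the two. The functions `build_batch_sql` and `simulate_load` and the file paths are hypothetical and are not taken from the dbt5 or EGenLoader source; the "fixed" path simply issues a plain `COPY` per batch.

```python
from typing import List


def build_batch_sql(table: str, batch_file: str, use_freeze_trick: bool) -> List[str]:
    """Hypothetical statements for loading one batch (not the real dbt5 code)."""
    if use_freeze_trick:
        # COPY ... FREEZE requires the table to be created or truncated in the
        # current (sub)transaction, so the trick TRUNCATEs before every COPY.
        return [
            "BEGIN;",
            f"TRUNCATE {table};",
            f"COPY {table} FROM '{batch_file}' WITH (FREEZE);",
            "COMMIT;",
        ]
    # Fixed behaviour: append each batch without truncating.
    return [f"COPY {table} FROM '{batch_file}';"]


def simulate_load(batch_sizes: List[int], use_freeze_trick: bool) -> int:
    """Replay the per-batch statements and return the final row count."""
    rows = 0
    for i, size in enumerate(batch_sizes):
        for stmt in build_batch_sql("trade", f"/tmp/trade_{i}.csv", use_freeze_trick):
            if stmt.startswith("TRUNCATE"):
                rows = 0  # discards everything loaded by earlier batches
            elif stmt.startswith("COPY"):
                rows += size  # this batch's rows are added
    return rows


if __name__ == "__main__":
    batches = [1000, 1000, 500]
    print(simulate_load(batches, use_freeze_trick=True))   # 500: only the last batch survives
    print(simulate_load(batches, use_freeze_trick=False))  # 2500: every batch is kept
```

With the trick enabled, the final count equals the size of the last batch alone, which matches the symptom of growing tables ending up with far fewer rows than expected.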

Will link to patch once it's pushed.

markwkm commented 1 month ago

Fixed in e0264458f544abd19ca74ae9c59c6443f21563d6.