idiom-bytes opened 1 month ago
To implement this ticket, we should start by simply updating predictions when truevals and payouts show up.
[How this ticket grows]
In the future...
subscription event -> new subscription record
slot event -> new slot record
prediction event -> new bronze prediction record -> update bronze slot record
trueval event -> new trueval record -> update N bronze prediction records -> update 1 bronze slot record
payout event -> new payout record -> update N bronze prediction records -> update 1 bronze slot record
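The event flows above can be sketched as plain SQL updates. A minimal sketch using sqlite3, where the table and column names (`bronze_predictions`, `slot`, `user`, `truevalue`, `payout`) are illustrative placeholders, not the actual lake schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE bronze_predictions (
    slot INTEGER, user TEXT, stake REAL,
    truevalue INTEGER, payout REAL)""")

# Seed two predictions in the same slot; trueval/payout unknown yet.
con.executemany(
    "INSERT INTO bronze_predictions VALUES (?, ?, ?, NULL, NULL)",
    [(100, "0xaaa", 1.0), (100, "0xbbb", 2.0)],
)

def on_trueval_event(slot, truevalue):
    """trueval event -> update N bronze prediction records in the slot."""
    con.execute(
        "UPDATE bronze_predictions SET truevalue = ? WHERE slot = ?",
        (truevalue, slot),
    )

def on_payout_event(slot, user, payout):
    """payout event -> update the one matching bronze prediction record."""
    con.execute(
        "UPDATE bronze_predictions SET payout = ? WHERE slot = ? AND user = ?",
        (payout, slot, user),
    )

on_trueval_event(100, 1)
on_payout_event(100, "0xaaa", 1.5)
rows = con.execute(
    "SELECT user, truevalue, payout FROM bronze_predictions ORDER BY user"
).fetchall()
print(rows)  # [('0xaaa', 1, 1.5), ('0xbbb', 1, None)]
```

Note that the trueval handler can key on `slot` alone, while the payout handler needs both `slot` and `user` — this asymmetry is exactly why trueval cannot update a single prediction directly.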
Here is one way to approach it...
Although this takes a couple of extra steps, the overall number of rows scanned, computed, and joined is far lower, which improves the overall performance of the workflow.
Most of this work should look like SQL queries, plus a swap-logic update at the end of the ETL update step.
Note that in the end we should expect fewer payouts than predictions made, and many bronze_predictions rows with null payouts. But 100% of all payouts should be registered in the bronze_predictions table.
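That expectation is checkable as a data-quality query: count the predictions that (legitimately) carry a null payout, and count payouts that fail to match any prediction, which should always be zero. A sketch with sqlite3 and illustrative table names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE predictions (slot INTEGER, user TEXT)")
con.execute("CREATE TABLE payouts (slot INTEGER, user TEXT, payout REAL)")

# Three predictions, but only one payout has arrived so far.
con.executemany("INSERT INTO predictions VALUES (?, ?)",
                [(1, "0xaaa"), (1, "0xbbb"), (2, "0xaaa")])
con.execute("INSERT INTO payouts VALUES (1, '0xaaa', 1.5)")

# LEFT JOIN keeps every prediction; unmatched ones carry a NULL payout.
null_payouts = con.execute("""
    SELECT COUNT(*) FROM predictions pr
    LEFT JOIN payouts pa ON pa.slot = pr.slot AND pa.user = pr.user
    WHERE pa.payout IS NULL""").fetchone()[0]

# Every payout must match some prediction: zero orphans expected.
orphans = con.execute("""
    SELECT COUNT(*) FROM payouts pa
    LEFT JOIN predictions pr ON pr.slot = pa.slot AND pr.user = pa.user
    WHERE pr.slot IS NULL""").fetchone()[0]

print(null_payouts, orphans)  # 2 0
```

A nonzero orphan count would indicate a completeness bug in the lake rather than normal pipeline lag.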
[Feedback for Mustafa] This is with reference to the code/design provided, as I explained to Mustafa after reviewing his proposal.
[Effective Processing of Events] I have instead done a pseudo-implementation of the SQL queries and logic required to get this working.
[Simplify Requirements even Further] I have also emphasized how much simpler this can be: to deliver the goal of a predictoor revenue dashboard, we do not need the trueval table at all.
Trueval does not contain the user id anywhere, so it cannot update the prediction table directly. It would first need to update a slot, and we are not handling slots at the moment.
Literally, all we need is to join payouts with predictions. The rest can come later.
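That payouts-with-predictions join is enough for a first revenue number. A sketch, again with sqlite3; the `stake`/`payout` columns and the net-revenue formula (payout received minus stake spent, with a pending payout treated as 0) are illustrative assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE predictions (slot INTEGER, user TEXT, stake REAL)")
con.execute("CREATE TABLE payouts (slot INTEGER, user TEXT, payout REAL)")
con.executemany("INSERT INTO predictions VALUES (?, ?, ?)",
                [(1, "0xaaa", 1.0), (2, "0xaaa", 1.0), (1, "0xbbb", 2.0)])
con.executemany("INSERT INTO payouts VALUES (?, ?, ?)",
                [(1, "0xaaa", 1.5), (2, "0xaaa", 0.0)])

# Per-user net revenue; a still-NULL payout counts as 0 for now.
revenue = con.execute("""
    SELECT pr.user,
           SUM(COALESCE(pa.payout, 0.0) - pr.stake) AS net
    FROM predictions pr
    LEFT JOIN payouts pa ON pa.slot = pr.slot AND pa.user = pr.user
    GROUP BY pr.user ORDER BY pr.user""").fetchall()
print(revenue)  # [('0xaaa', -0.5), ('0xbbb', -2.0)]
```

No trueval table appears anywhere in the query, which is the point: the dashboard can ship on this join alone.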
Motivation
We have now verified that the basic lake functionality is working as expected.
We now want to verify the data quality and completeness.
This means running additional SQL queries, so that more tables are processed and richer data is generated.
Update Step - Incrementally updating the Lake
When you run the "lake update" command, the later SQL queries are responsible for updating the tables with the most recent information.
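One common way to make such an update incremental is a watermark: store the last processed timestamp, and have each run touch only newer rows. A sketch under that assumption (the `checkpoint` table and `lake_update` helper are hypothetical, not the actual lake implementation):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE payouts (ts INTEGER, slot INTEGER, user TEXT, payout REAL)")
con.execute("CREATE TABLE checkpoint (last_ts INTEGER)")
con.execute("INSERT INTO checkpoint VALUES (0)")

def lake_update(new_rows):
    """Ingest new rows, then process only rows newer than the watermark."""
    con.executemany("INSERT INTO payouts VALUES (?, ?, ?, ?)", new_rows)
    last_ts = con.execute("SELECT last_ts FROM checkpoint").fetchone()[0]
    fresh = con.execute(
        "SELECT * FROM payouts WHERE ts > ?", (last_ts,)).fetchall()
    # ... the later SQL update queries would run over `fresh` only ...
    con.execute(
        "UPDATE checkpoint SET last_ts = (SELECT MAX(ts) FROM payouts)")
    return fresh

first = lake_update([(10, 1, "0xaaa", 1.5)])
second = lake_update([(20, 2, "0xaaa", 0.5)])
print(len(first), len(second))  # 1 1
```

The second run re-scans nothing from the first batch, which is what keeps repeated "lake update" runs cheap as the tables grow.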
Data Workflows
All data workflows should operate in the same way.
DoD:
Task: