richardogoma / bitcoin-rate-etl

An ETL pipeline that ingests near-real-time Bitcoin rates across major currencies (USD/GBP/EUR) from the CoinDesk Bitcoin Price Index API.
MIT License

feat: Resolved discontinuous data loading #18

Closed: richardogoma closed this 1 year ago

richardogoma commented 1 year ago

See the proof of concept (POC) below, and observe how the trigger time auto-corrects to the start of the minute after the first run and then persists. The remaining sub-second discontinuity is attributed to a delay in the extraction and transformation phase. Further effort will be made to deduct the time it takes to extract and load the data from the waiting time.

nohup: ignoring input
2023-06-17 04:51:01.022871: INFO: Inserted Bitcoin rates as at 2023-06-17T04:50:00+00:00 into the database ...
2023-06-17 04:52:00.312049: INFO: Inserted Bitcoin rates as at 2023-06-17T04:51:00+00:00 into the database ...
2023-06-17 04:53:00.145823: INFO: Inserted Bitcoin rates as at 2023-06-17T04:52:00+00:00 into the database ...
2023-06-17 04:54:00.340358: INFO: Inserted Bitcoin rates as at 2023-06-17T04:53:00+00:00 into the database ...
2023-06-17 04:55:00.143177: INFO: Inserted Bitcoin rates as at 2023-06-17T04:54:00+00:00 into the database ...
2023-06-17 04:56:00.246284: INFO: Inserted Bitcoin rates as at 2023-06-17T04:55:00+00:00 into the database ...
2023-06-17 04:57:00.144123: INFO: Inserted Bitcoin rates as at 2023-06-17T04:56:00+00:00 into the database ...
2023-06-17 04:58:00.145999: INFO: Inserted Bitcoin rates as at 2023-06-17T04:57:00+00:00 into the database ...
2023-06-17 04:59:00.146631: INFO: Inserted Bitcoin rates as at 2023-06-17T04:58:00+00:00 into the database ...
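To make the auto-correction easier to see, here is a minimal Python sketch that parses the logged insert times above and reports how far past the minute boundary each run landed. The helper name `offset_from_minute` is hypothetical and not part of the repository; only the timestamps are taken from the log.

```python
from datetime import datetime

# Insert times copied from the log output above.
LOGGED_AT = [
    "2023-06-17 04:51:01.022871",
    "2023-06-17 04:52:00.312049",
    "2023-06-17 04:53:00.145823",
    "2023-06-17 04:54:00.340358",
    "2023-06-17 04:55:00.143177",
    "2023-06-17 04:56:00.246284",
    "2023-06-17 04:57:00.144123",
    "2023-06-17 04:58:00.145999",
    "2023-06-17 04:59:00.146631",
]


def offset_from_minute(stamp: str) -> float:
    """Return the seconds elapsed since the start of the minute (hypothetical helper)."""
    logged = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
    return logged.second + logged.microsecond / 1_000_000


offsets = [offset_from_minute(stamp) for stamp in LOGGED_AT]

for stamp, offset in zip(LOGGED_AT, offsets):
    print(f"{stamp} is {offset:.6f} s past the minute")
```

The first run is logged about 1.02 seconds past the minute, while every later run lands within roughly 0.35 seconds of the boundary.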

Closes #17

richardogoma commented 1 year ago

RE: deducting the time it takes to extract and load the data from the waiting time. The mean ETL process duration is approximately 0.0526 seconds, and the median ETL process duration is approximately 0.0529 seconds.
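How these durations were measured is not shown in this thread. One possible way, sketched below in Python, is to time each run with `time.perf_counter` and summarize the samples with the standard `statistics` module. The names `timed_run` and `summarize`, and the `etl_step` callable, are assumptions for illustration only.

```python
import statistics
import time
from typing import Callable, Iterable, Tuple


def timed_run(etl_step: Callable[[], None]) -> float:
    """Run one ETL step and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    etl_step()  # hypothetical callable wrapping extract, transform and load
    return time.perf_counter() - start


def summarize(durations: Iterable[float]) -> Tuple[float, float]:
    """Return the (mean, median) of the recorded durations."""
    samples = list(durations)
    return statistics.mean(samples), statistics.median(samples)
```

Collecting one `timed_run` result per run and passing the list to `summarize` would give figures comparable to the mean and median quoted above.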

Further, the duration of one run does not determine the duration of the next, so the deduction would not be logical, especially considering that:

  1. The run duration depends on the performance of the CoinDesk and SQLite APIs,
  2. The typical process duration of 0.0529 seconds indicates a highly performant process; moreover, the run is short enough to fit comfortably within a minute,
  3. The waiting (sleeping) time is calculated right after the process completes, so the time the ETL process takes out of the one minute allotted to each run (we wait until the next minute) is already factored in.

Given the short duration of the process and the fact that the waiting time is calculated right after the process completes, it seems reasonable to conclude that the waiting time already accounts for the time taken by the ETL process. Therefore, there may be no need to deduct the extraction and loading time from the waiting time in this scenario.
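The reasoning above can be illustrated with a minimal Python sketch of a "sleep until the next minute" loop. This is not the repository's actual code: `seconds_until_next_minute`, `run_every_minute`, and their parameters are hypothetical names. Because the wait is computed from the clock after the ETL step returns, whatever time the step consumed is already excluded from the sleep.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def seconds_until_next_minute(now: datetime) -> float:
    """Return the seconds remaining from `now` until the next minute boundary."""
    next_minute = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    return (next_minute - now).total_seconds()


def run_every_minute(
    etl_step: Callable[[], None],
    runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> None:
    """Run `etl_step`, then sleep until the next minute; repeat `runs` times (forever if None)."""
    completed = 0
    while runs is None or completed < runs:
        etl_step()
        completed += 1
        # The clock is read after the step finishes, so the step's own
        # duration is already subtracted from the wait.
        sleep(seconds_until_next_minute(clock()))
```

For example, a run that finishes at 04:51:01.022871 sleeps for about 58.98 seconds, which puts the next trigger back on the 04:52:00 boundary regardless of how long the previous run took.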