We need to put Lake/ETL on a thread. This should run continuously, compile the data, and keep it ready to be served.
To address these requirements, we'll have to support updating GQLDF + ETL such that st_ts & end_ts can be "natural language" dates, such as "1d ago" and "now".
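As an illustration of the natural-language date handling described above, here is a minimal sketch of converting strings like "1d ago" and "now" into unix-ms timestamps. The project presumably does this via UnixTimeMs.from_timestr; the parser below (and its name timestr_to_ms) is hypothetical, not the real API.

```python
import re
from datetime import datetime, timedelta, timezone


def timestr_to_ms(timestr: str) -> int:
    """Convert "now" or "<N><unit> ago" (e.g. "1d ago") to unix ms (UTC).

    Illustrative only -- supports minutes/hours/days/weeks and their
    one-letter abbreviations; the real parser may accept more formats.
    """
    now = datetime.now(timezone.utc)
    if timestr.strip() == "now":
        return int(now.timestamp() * 1000)

    m = re.fullmatch(
        r"(\d+)\s*(minutes?|hours?|days?|weeks?|m|h|d|w)\s*ago", timestr.strip()
    )
    if not m:
        raise ValueError(f"cannot parse timestr: {timestr!r}")

    n = int(m.group(1))
    # Map the unit's first letter to a timedelta keyword
    unit = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}[m.group(2)[0]]
    return int((now - timedelta(**{unit: n})).timestamp() * 1000)
```

Because the strings are re-parsed on every loop iteration, "1d ago" always resolves relative to the current wall clock, which is what lets the ETL window slide forward over time.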
@enforce_types
def do_lake_etl_update(_, ppss):
    """
    @description
      Run all dependencies to build analytics.
      All raw, clean, and aggregate data will be generated:
        1. All subgraph data is fetched
        2. All analytic data is built
        3. The lake contains all required data
        4. Dashboards read from the lake

      Use nested_args to control lake_ss,
      i.e. st_timestr, fin_timestr, lake_dir
    """
    # Pseudocode -- run this loop on a background thread
    while True:
        # Resolve the natural-language timestamps on every iteration
        st_ts_ms = UnixTimeMs.from_timestr(ppss.lake_ss.st_timestr)  # "1 day ago"
        fin_ts_ms = UnixTimeMs.from_timestr(ppss.lake_ss.fin_timestr)  # "now"

        # Pass the fixed time window through the pipeline
        gql_data_factory = GQLDataFactory(ppss)
        etl = ETL(ppss, gql_data_factory)
        etl.do_etl(st_ts_ms, fin_ts_ms)
Todo:
- [ ] Lake/ETL sits on a looping thread, keeping the data updated in the background so it can be built and served continuously
- [ ] Lake app updates live by consuming the output/updated records from this update
Now, use the implementation from the discussion: https://github.com/oceanprotocol/pdr-backend/pull/1095#discussion_r1617991978
Here is what that fix looks like: