richfitz / remake

Make-like declarative workflows in R

Best practices for recursive builds #182

Open tcholewik opened 6 years ago

tcholewik commented 6 years ago

I'm working on a project where I have to compile a series of reports. I intend to have one report for each day, but I also need incremental intraday reports. To build the intraday reports, I can either run queries that collect data from midnight until now, or check when the last report ran and have my query pull just the additional data.

For now I can just query the whole day, but the second solution poses a puzzle, and I wonder whether it is possible to do in remake.

If I generated report_today.html and it used data_today.csv, then to query just the new data I would have a step that checks data_today.csv for the most recent record timestamp and uses that as an input for a query along the lines of SELECT * FROM SOMETABLE WHERE TIME > {{MOST_RECENT_RECORD}}. At this point I can query the database and append the results to data_today.csv.
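To make the incremental step concrete, here is a minimal sketch in Python rather than R, using an SQLite connection as a stand-in for the real database. The column names TIME and VALUE, the helper names, and the `fallback_since` parameter are all assumptions for illustration; timestamps are assumed to be ISO 8601 strings so that string comparison matches chronological order.

```python
import csv
import os
import sqlite3


def most_recent_timestamp(csv_path):
    """Return the largest TIME value in the CSV, or None if there are no rows yet."""
    if not os.path.exists(csv_path):
        return None
    with open(csv_path, newline="") as f:
        times = [row["TIME"] for row in csv.DictReader(f)]
    return max(times) if times else None


def append_new_rows(conn, csv_path, fallback_since):
    """Query only rows newer than those already in the CSV and append them.

    fallback_since is a hypothetical exclusive lower bound (for example a
    timestamp just before midnight) used when the CSV has no rows yet.
    Returns the number of rows appended.
    """
    since = most_recent_timestamp(csv_path) or fallback_since
    rows = conn.execute(
        "SELECT TIME, VALUE FROM SOMETABLE WHERE TIME > ? ORDER BY TIME",
        (since,),
    ).fetchall()
    # Write the header only when the file is new or empty.
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["TIME", "VALUE"])
        writer.writerows(rows)
    return len(rows)
```

Calling `append_new_rows` repeatedly during the day would pull only what arrived since the previous call, which is exactly why the file ends up being read and written by the same workflow.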

What worries me is that in the setup described above, data_today.csv is both a dependency for the first step and a target file for the last step, so before we finish running this workflow we have already invalidated the dependency of step 1.
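The circularity can be shown with a generic dependency graph. This is only a sketch using Python's standard `graphlib` module, not a description of how remake itself resolves dependencies; the step names `most_recent_record` and `new_rows` are hypothetical labels for the steps described above.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a target; its set holds what that target depends on.
dependencies = {
    "most_recent_record": {"data_today.csv"},  # step 1 reads the CSV
    "new_rows": {"most_recent_record"},        # step 2 queries newer rows
    "data_today.csv": {"new_rows"},            # step 3 appends to the CSV
}


def build_order(graph):
    """Return a valid build order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
```

With the graph above, `build_order(dependencies)` returns None, because no ordering can build data_today.csv before the step that reads it.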

So my questions are:

  1. Is remake prepared to handle this situation?
  2. Is there a way to decouple this target/dependency relationship?
  3. What are the best practices for handling this?

wlandau-lilly commented 6 years ago

What if you write to data_today.csv as a side effect instead of declaring it as a target?

tcholewik commented 6 years ago

I suppose that's one way to do it. I assume that way we'd always have to call the data-loading target manually, since remake does not support phony targets yet.