splor-mg / spreadmart

Data mart with budget data

Analysis of make as a tool to enable incremental processing and resuming execution after failures #3

Open fjuniorr opened 1 year ago

fjuniorr commented 1 year ago

Tasks with side-effects without stdout

Make works by comparing the modification time of each target file with the modification times of its prerequisites. If a prerequisite is newer than the target, or the target does not exist, make runs the rule's recipe to rebuild the target. This way, make only reprocesses data whose inputs have changed, which gives incremental processing.
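The timestamp rule can be sketched in a few lines of Python. This is only an illustration of the decision make takes for a single rule, not how make is implemented; the function name `needs_rebuild` is made up for this example.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite."""
    if not os.path.exists(target):
        # A missing target is always rebuilt.
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild when at least one input was modified after the target.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Only modification times are consulted, never file contents, which is why anything that resets timestamps also changes what gets rebuilt.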

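If a recipe fails, make by default stops and does not build the targets that depend on it. On the next run it skips the targets that are already up to date and resumes from the step that failed, as long as the prerequisites have not changed. One caveat: a recipe that fails after partially writing its target can leave behind a file that looks up to date. GNU make deletes a modified target when it is interrupted by a signal, but after an ordinary recipe failure it does so only if the special target .DELETE_ON_ERROR is declared in the makefile.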
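Resuming after a failure is only safe if a failed step never leaves behind an output file that looks finished. One common way to guarantee that, sketched here under the assumption that each step is a Python script, is to write to a temporary file and rename it over the target only on success. The helper name `write_atomically` is hypothetical.

```python
import os
import tempfile


def write_atomically(path, produce):
    """Call produce(file) on a temp file; rename it to `path` only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            produce(tmp)
        # The target appears (or is replaced) in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the partial output and leave the target untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern the target's timestamp changes only when the step has fully succeeded, so the next make run sees the failed step as still pending.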

When your output is a table being updated in an RDBMS like SQLite rather than a file, you can still use make to orchestrate the pipeline. In this case, you'll need to create intermediate files that act as "markers" to track the progress of the pipeline. Each marker is created or touched only after the corresponding database update succeeds, so its timestamp stands in for the table's and lets make determine whether a specific input has already been processed and loaded into the database.
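A minimal sketch of a load step that follows the marker-file idea is shown below. The table name `budget`, its two columns, and the function name `load_into_db` are assumptions made up for this example; they do not come from the spreadmart code.

```python
import csv
import sqlite3
from pathlib import Path


def load_into_db(csv_path, db_path, marker_path):
    """Reload a two-column CSV into SQLite, then touch a marker file on success."""
    with open(csv_path, newline="") as f:
        rows = [tuple(r) for r in csv.reader(f) if r]
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Hypothetical table layout, used only for illustration.
            conn.execute("CREATE TABLE IF NOT EXISTS budget (item TEXT, amount TEXT)")
            conn.execute("DELETE FROM budget")
            conn.executemany("INSERT INTO budget VALUES (?, ?)", rows)
    finally:
        conn.close()
    # Reached only if the load succeeded: the marker's timestamp now
    # represents the state of the table.
    Path(marker_path).touch()
```

In a Makefile, the marker would be the target of the rule and the CSV its prerequisite, so make reruns the load only when the CSV is newer than the marker.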

-- chatGPT

File modification date on git checkout

Git does not record file modification times, so files written by a clone or checkout get the time of the checkout rather than the time of the original change. This can be an issue when using make for incremental processing in a fresh environment like a Docker container. Since make relies on file timestamps to determine if a target is up-to-date, freshly checked-out inputs can look newer than any previously built outputs, which leads to unnecessary reprocessing of unchanged files.

One workaround for this issue is to use content-based hashes to determine if a file has changed instead of relying on timestamps.
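A rough sketch of the content-hash approach follows. It assumes that a small digest file is stored next to each input and survives between runs (for example in a cache); the function names and the digest-file convention are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def has_changed(path, digest_path):
    """Return True if the content differs from the digest recorded earlier."""
    stored = Path(digest_path)
    if not stored.exists():
        # No record yet: treat the input as changed.
        return True
    return stored.read_text().strip() != file_digest(path)


def record_digest(path, digest_path):
    """Store the current digest; call this only after processing succeeded."""
    Path(digest_path).write_text(file_digest(path) + "\n")
```

Because the comparison ignores modification times, a fresh checkout that only resets timestamps does not count as a change. Recording the digest after the step succeeds, rather than when the check is made, keeps a failed step from being skipped on the next run.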

-- chatGPT