splor-mg / dados-totais-armazem-siafi


Implement make-transform and make-build #4

Closed hslinhares closed 7 months ago

fjuniorr commented 8 months ago

@hslinhares please review the column names, and then we will investigate the failures in GitHub Actions.

hslinhares commented 7 months ago

closes #5 #3

fjuniorr commented 7 months ago

@hslinhares for resource and file names, we prefer underscores (_) rather than hyphens (-). You can see the explanation at https://github.com/transparencia-mg/issues-dadosmg-legado/issues/57.

If the file name has to contain a hyphen in Gmail because of https://github.com/splor-mg/armazem-siafi-totais-dados/issues/2, we need to rename the file when saving it to data-raw.
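A minimal sketch of that rename, assuming the extraction step has the original attachment name at hand. The helpers `to_underscore_name` and `raw_target` are hypothetical and written in Python only for illustration (the extract step in this repository is an R script); only the file name is rewritten, the data-raw directory name is left alone.

```python
from pathlib import PurePosixPath


def to_underscore_name(filename: str) -> str:
    """Return the bare file name with hyphens in the stem replaced by underscores."""
    path = PurePosixPath(filename)
    # Only the stem is rewritten; the final extension is kept unchanged.
    return path.stem.replace("-", "_") + path.suffix


def raw_target(filename: str, raw_dir: str = "data-raw") -> str:
    """Build the path under raw_dir where the renamed file would be saved."""
    return str(PurePosixPath(raw_dir) / to_underscore_name(filename))


if __name__ == "__main__":
    print(raw_target("totais-siafi-execucao-current.csv"))
```

If the rename is adopted, the resource path in datapackage.yaml would need to change to match.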

Besides that, I am hitting an error that I do not understand yet. The first time I run make all, I get:

make all
Rscript scripts/extract.R 
frictionless validate datapackage.yaml
────────────────────────────────────────────────────────────────────────── Dataset ───────────────────────────────────────────────────────────────────────────
                                              dataset                                              
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name                  ┃ type  ┃ path                                                   ┃ status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ totais-execucao-siafi │ table │ data-raw/totais-siafi-execucao-current.csv (multipart) │ VALID  │
└───────────────────────┴───────┴────────────────────────────────────────────────────────┴────────┘
make: *** No rule to make target `data-raw/*.csv', needed by `data/totais-execucao-siafi.csv'.  Stop.

The second time, make all works. If I run each step separately, I also get no error. Possible explanations according to ChatGPT:

Given that running each phony target (extract, validate, transform, build, check) separately doesn't produce any errors, but running make all does, it's likely an issue related to the order in which dependencies are evaluated or created in the combined execution path. Here's a more detailed analysis:

  1. Dependency Generation Timing: When you run the targets separately, each target's dependencies are generated and checked at the time of its execution. In the case of transform, this means the data-raw/*.csv files are already present (created by extract) by the time you run it. With make all, however, Make reads the whole Makefile and expands the wildcards in prerequisite lists before it executes any command. If data-raw/*.csv matches nothing at that point, the pattern is kept as a literal file name, and because no rule produces a file with that name, Make stops with the error above even though extract has created the CSV files in the meantime.

  2. Make's Parallel Execution: make builds independent targets in parallel only when it is invoked with the -j option. In that case, if make evaluates the dependencies for transform before or while running extract, it will not find the necessary data-raw/*.csv files, leading to the error. The command shown above is a plain make all, which makes this explanation less likely.

  3. Intermediate Files and Order-only Prerequisites: The data-raw/*.csv files seem to be intermediate files that are generated by one step (extract) and consumed by another (transform). In such a case, it's often helpful to use order-only prerequisites in Makefiles. An order-only prerequisite is brought up to date before the target is built, but a newer timestamp on it does not by itself force the target to be rebuilt.

    You can modify the transform target to have extract as an order-only prerequisite:

    transform: | extract $(OUTPUT_FILES)

    This tells make that extract must be run before transform. Everything listed after the | is order-only, so in this line $(OUTPUT_FILES) is order-only as well: both are brought up to date before transform, but neither one's timestamp forces transform to re-run.

  4. Reevaluation of Dynamic Variables: The variable $(OUTPUT_FILES) is dynamically generated. Variables used in target and prerequisite lists are expanded while the Makefile is read, so a value derived from the contents of data-raw is computed before extract runs and is not recomputed later in the same make all invocation. Check that $(OUTPUT_FILES) does not depend on files that only exist after extract.

By adjusting the Makefile to ensure proper sequencing and handling of intermediate files, you should be able to resolve the error when running make all.
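The first explanation can be illustrated without make itself. The sketch below is an emulation in Python, not make's implementation: the hypothetical helper `expand_prerequisite` mimics the documented GNU make behavior that a wildcard in a prerequisite list is expanded once, and is left as the literal pattern when nothing matches. The directory and file names are illustrative.

```python
import glob
import os
import tempfile
from typing import List, Tuple


def expand_prerequisite(pattern: str) -> List[str]:
    """Emulate wildcard expansion in a make prerequisite list.

    Matching files replace the pattern; with no match, the pattern
    itself is kept as if it were a file name.
    """
    matches = sorted(glob.glob(pattern))
    return matches if matches else [pattern]


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as workdir:
        raw_dir = os.path.join(workdir, "data-raw")
        os.mkdir(raw_dir)
        pattern = os.path.join(raw_dir, "*.csv")

        # First run: the Makefile is read while data-raw is still empty.
        first_run = expand_prerequisite(pattern)

        # The extract step then creates the raw file (empty placeholder here).
        open(os.path.join(raw_dir, "execucao.csv"), "w").close()

        # Second run: the Makefile is read again and the file now matches.
        second_run = expand_prerequisite(pattern)

        def relative(paths: List[str]) -> List[str]:
            return [os.path.relpath(p, workdir) for p in paths]

        return relative(first_run), relative(second_run)


if __name__ == "__main__":
    first_run, second_run = demo()
    print("first run: ", first_run)
    print("second run:", second_run)
```

In the first run the prerequisite stays as the literal data-raw/*.csv, which matches the name in the "No rule to make target" message. In the second run it resolves to a real file, which would be consistent with make all succeeding on the second attempt.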

fjuniorr commented 7 months ago

Apparently the same error occurred in a run of armazem-siafi-dados-2024:

Rscript scripts/extract.R 
frictionless validate datapackage.json
─────────────────────────────────── Dataset ────────────────────────────────────
                      dataset                       
┏━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name     ┃ type  ┃ path                 ┃ status ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ execucao │ table │ data/execucao.csv.gz │ VALID  │
│ credito  │ table │ data/credito.csv.gz  │ VALID  │
│ cota     │ table │ data/cota.csv.gz     │ VALID  │
│ receita  │ table │ data/receita.csv.gz  │ VALID  │
└──────────┴───────┴──────────────────────┴────────┘
frictionless validate datapackage.yaml
─────────────────────────────────── Dataset ────────────────────────────────────
                                 dataset                                  
┏━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name     ┃ type  ┃ path                                       ┃ status ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ execucao │ table │ data-raw/execucao-siafi-01.csv (multipart) │ VALID  │
│ credito  │ table │ data-raw/credito-inicial-autorizado.csv    │ VALID  │
│ cota     │ table │ data-raw/cota-item-data.csv                │ VALID  │
│ receita  │ table │ data-raw/receita-arrecadada.csv            │ VALID  │
└──────────┴───────┴────────────────────────────────────────────┴────────┘
make: *** No rule to make target 'data-raw/*.csv', needed by 'data/execucao.csv.gz'.  Stop.
Error: Process completed with exit code 2.

Curiously, this did not happen in an earlier job:

Rscript scripts/extract.R 
frictionless validate datapackage.json
─────────────────────────────────── Dataset ────────────────────────────────────
                      dataset                       
┏━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name     ┃ type  ┃ path                 ┃ status ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ execucao │ table │ data/execucao.csv.gz │ VALID  │
│ credito  │ table │ data/credito.csv.gz  │ VALID  │
│ cota     │ table │ data/cota.csv.gz     │ VALID  │
│ receita  │ table │ data/receita.csv.gz  │ VALID  │
└──────────┴───────┴──────────────────────┴────────┘
frictionless validate datapackage.yaml
─────────────────────────────────── Dataset ────────────────────────────────────
                                 dataset                                  
┏━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name     ┃ type  ┃ path                                       ┃ status ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ execucao │ table │ data-raw/execucao-siafi-01.csv (multipart) │ VALID  │
│ credito  │ table │ data-raw/credito-inicial-autorizado.csv    │ VALID  │
│ cota     │ table │ data-raw/cota-item-data.csv                │ VALID  │
│ receita  │ table │ data-raw/receita-arrecadada.csv            │ VALID  │
└──────────┴───────┴────────────────────────────────────────────┴────────┘
python main.py transform execucao
2023-12-07T15:32:30+0000 INFO  [scripts.transform] Transforming resource execucao
python main.py transform credito
2023-12-07T15:33:59+0000 INFO  [scripts.transform] Transforming resource credito
python main.py transform cota
2023-12-07T15:34:00+0000 INFO  [scripts.transform] Transforming resource cota
python main.py transform receita
2023-12-07T15:34:02+0000 INFO  [scripts.transform] Transforming resource receita
python main.py build

fjuniorr commented 7 months ago

@hslinhares I created https://github.com/splor-mg/armazem-siafi-dados/issues/4 to continue the investigation, but as a workaround I simply removed data-raw/*.csv from the dependencies.
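With data-raw/*.csv out of the dependencies, make no longer checks that the raw files exist before transform runs. One way to keep a fail-fast check is to do it at run time inside the transform step. The sketch below assumes a hypothetical helper `find_raw_inputs`; it is not code from this repository.

```python
import glob
import os
import tempfile
from typing import List


def find_raw_inputs(raw_dir: str = "data-raw") -> List[str]:
    """List the raw CSV files at run time, failing early if there are none."""
    files = sorted(glob.glob(os.path.join(raw_dir, "*.csv")))
    if not files:
        raise FileNotFoundError(
            f"no CSV files found in {raw_dir}; run the extract step first"
        )
    return files


def demo() -> bool:
    """Check both cases against a temporary directory."""
    with tempfile.TemporaryDirectory() as raw_dir:
        try:
            find_raw_inputs(raw_dir)
        except FileNotFoundError:
            failed_when_empty = True
        else:
            failed_when_empty = False

        # Simulate the extract step writing one raw file.
        open(os.path.join(raw_dir, "execucao.csv"), "w").close()
        return failed_when_empty and len(find_raw_inputs(raw_dir)) == 1


if __name__ == "__main__":
    print(demo())
```

The trade-off is that make will not rebuild the outputs automatically when a raw file changes, since it no longer tracks those files.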