hslinhares closed this 7 months ago
closes #5 #3
@hslinhares for resource and file names, our convention is to use underscores (`_`) rather than hyphens (`-`). You can find the reasoning in https://github.com/transparencia-mg/issues-dadosmg-legado/issues/57.
If the file name has to contain hyphens in Gmail because of https://github.com/splor-mg/armazem-siafi-totais-dados/issues/2, we need to rename the file when saving it to `data-raw`.
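A minimal Python sketch of that rename, assuming the downloaded file is copied into `data-raw/` as a separate step. `normalize_name` and `save_to_data_raw` are hypothetical helpers for illustration only (in this repository the extraction itself runs in `scripts/extract.R`). Only the file stem is rewritten, so the extension stays untouched.

```python
import shutil
from pathlib import Path


def normalize_name(filename: str) -> str:
    """Return `filename` with hyphens in the stem replaced by underscores."""
    path = Path(filename)
    # Only the stem changes; the last extension (e.g. ".csv") is kept as-is.
    return path.stem.replace("-", "_") + path.suffix


def save_to_data_raw(src: Path, data_raw: Path = Path("data-raw")) -> Path:
    """Hypothetical helper: copy `src` into `data_raw` under its normalized name."""
    data_raw.mkdir(parents=True, exist_ok=True)
    dest = data_raw / normalize_name(src.name)
    shutil.copyfile(src, dest)
    return dest


if __name__ == "__main__":
    print(normalize_name("totais-siafi-execucao-current.csv"))
```

For example, `normalize_name("totais-siafi-execucao-current.csv")` returns `"totais_siafi_execucao_current.csv"`, so the name used in Gmail can keep its hyphens while the copy in `data-raw` follows the underscore convention.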
I'm also running into an error, but I still don't understand what is happening. The first time I run `make all` I get:
```
make all
Rscript scripts/extract.R
frictionless validate datapackage.yaml
────────────────────────────────────────────────────────────────────────── Dataset ───────────────────────────────────────────────────────────────────────────
dataset
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ totais-execucao-siafi │ table │ data-raw/totais-siafi-execucao-current.csv (multipart) │ VALID │
└───────────────────────┴───────┴────────────────────────────────────────────────────────┴────────┘
make: *** No rule to make target `data-raw/*.csv', needed by `data/totais-execucao-siafi.csv'. Stop.
```
The second time, `make all` works. If I run each step separately I don't get the error either. Possible explanations according to ChatGPT:
Given that running each phony target (`extract`, `validate`, `transform`, `build`, `check`) separately doesn't produce any errors, but running `make all` does, it's likely an issue related to the order in which dependencies are evaluated or created in the combined execution path. Here's a more detailed analysis:

1. Dependency Generation Timing: When you run the targets separately, each target's dependencies are generated and checked at the time of its execution. In the case of `transform`, this means the `data-raw/*.csv` files are already present (created by `extract`) by the time you run it. However, in the `make all` command, Make might be trying to resolve all dependencies before executing any commands. This leads to the error because the `data-raw/*.csv` files have not been created yet when it tries to resolve dependencies for `transform`.
2. Make's Parallel Execution: `make` can sometimes evaluate dependencies in parallel, especially for unrelated targets. If `make` is trying to evaluate the dependencies for `transform` before or while running `extract`, it will not find the necessary `data-raw/*.csv` files, leading to the error.
3. Intermediate Files and Order-only Prerequisites: The `data-raw/*.csv` files seem to be intermediate files that are generated by one step (`extract`) and consumed by another (`transform`). In such a case, it's often helpful to use order-only prerequisites in Makefiles. An order-only prerequisite ensures that a target is up to date before its dependencies are considered. You can modify the `transform` target to have `extract` as an order-only prerequisite: `transform: | extract $(OUTPUT_FILES)`. This tells `make` that `extract` must be run before `transform`, but changes in `extract` do not cause `transform` to re-run unless its other prerequisites (`$(OUTPUT_FILES)`) are also out of date.
4. Reevaluation of Dynamic Variables: The variable `$(OUTPUT_FILES)` is dynamically generated. Make sure that this variable is correctly reevaluated when you run `make all`. Sometimes, the way Make evaluates dynamic variables can lead to timing issues in the resolution of dependencies.

By adjusting the Makefile to ensure proper sequencing and handling of intermediate files, you should be able to resolve the error when running `make all`.
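For reference, GNU make treats every prerequisite written to the right of `|` as order-only: it must be built before the target, but its timestamp never makes the target out of date. So `transform: | extract $(OUTPUT_FILES)` would make `$(OUTPUT_FILES)` order-only as well; the behavior described in item 3 corresponds to `transform: $(OUTPUT_FILES) | extract`. The Python sketch below is a simplified model of that rebuild decision (no phony targets, no recursive prerequisite updates), and the file names in it are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rule:
    target: str
    normal: List[str] = field(default_factory=list)      # prerequisites before "|"
    order_only: List[str] = field(default_factory=list)  # prerequisites after "|"


def needs_rebuild(rule: Rule, mtimes: Dict[str, Optional[float]]) -> bool:
    """Simplified model of make's out-of-date check.

    `mtimes` maps file names to modification times (a missing key means the
    file does not exist). Order-only prerequisites are never compared.
    """
    target_time = mtimes.get(rule.target)
    if target_time is None:
        return True  # a missing target is always rebuilt
    for prereq in rule.normal:
        prereq_time = mtimes.get(prereq)
        # Simplification: an absent prerequisite counts as "newer"; real make
        # would first try to build it and stop with "No rule to make target"
        # if no rule exists for it.
        if prereq_time is None or prereq_time > target_time:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical rule: data/execucao.csv.gz: data-raw/execucao.csv | extract
    rule = Rule("data/execucao.csv.gz",
                normal=["data-raw/execucao.csv"],
                order_only=["extract"])
    # "extract" being newer than the target does not trigger a rebuild.
    print(needs_rebuild(rule, {"data/execucao.csv.gz": 10.0,
                               "data-raw/execucao.csv": 5.0,
                               "extract": 99.0}))
```

The split into two lists mirrors the two sides of `|`: only the left-hand list ever feeds the timestamp comparison.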
Apparently the same error happened in the `armazem-siafi-dados-2024` run:
```
Rscript scripts/extract.R
frictionless validate datapackage.json
─────────────────────────────────── Dataset ────────────────────────────────────
dataset
┏━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ execucao │ table │ data/execucao.csv.gz │ VALID │
│ credito │ table │ data/credito.csv.gz │ VALID │
│ cota │ table │ data/cota.csv.gz │ VALID │
│ receita │ table │ data/receita.csv.gz │ VALID │
└──────────┴───────┴──────────────────────┴────────┘
frictionless validate datapackage.yaml
─────────────────────────────────── Dataset ────────────────────────────────────
dataset
┏━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ execucao │ table │ data-raw/execucao-siafi-01.csv (multipart) │ VALID │
│ credito │ table │ data-raw/credito-inicial-autorizado.csv │ VALID │
│ cota │ table │ data-raw/cota-item-data.csv │ VALID │
│ receita │ table │ data-raw/receita-arrecadada.csv │ VALID │
└──────────┴───────┴────────────────────────────────────────────┴────────┘
make: *** No rule to make target 'data-raw/*.csv', needed by 'data/execucao.csv.gz'. Stop.
Error: Process completed with exit code 2.
```
Oddly enough, this did not happen in a previous job:
```
Rscript scripts/extract.R
frictionless validate datapackage.json
─────────────────────────────────── Dataset ────────────────────────────────────
dataset
┏━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ execucao │ table │ data/execucao.csv.gz │ VALID │
│ credito │ table │ data/credito.csv.gz │ VALID │
│ cota │ table │ data/cota.csv.gz │ VALID │
│ receita │ table │ data/receita.csv.gz │ VALID │
└──────────┴───────┴──────────────────────┴────────┘
frictionless validate datapackage.yaml
─────────────────────────────────── Dataset ────────────────────────────────────
dataset
┏━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name ┃ type ┃ path ┃ status ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ execucao │ table │ data-raw/execucao-siafi-01.csv (multipart) │ VALID │
│ credito │ table │ data-raw/credito-inicial-autorizado.csv │ VALID │
│ cota │ table │ data-raw/cota-item-data.csv │ VALID │
│ receita │ table │ data-raw/receita-arrecadada.csv │ VALID │
└──────────┴───────┴────────────────────────────────────────────┴────────┘
python main.py transform execucao
2023-12-07T15:32:30+0000 INFO [scripts.transform] Transforming resource execucao
python main.py transform credito
2023-12-07T15:33:59+0000 INFO [scripts.transform] Transforming resource credito
python main.py transform cota
2023-12-07T15:34:00+0000 INFO [scripts.transform] Transforming resource cota
python main.py transform receita
2023-12-07T15:34:02+0000 INFO [scripts.transform] Transforming resource receita
python main.py build
```
@hslinhares I opened https://github.com/splor-mg/armazem-siafi-dados/issues/4 to continue the investigation, but as a workaround I simply removed `data-raw/*.csv` from the dependencies.
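One explanation consistent with the symptoms (the first `make all` fails, the second succeeds, and each target works when run on its own) is how GNU make handles a wildcard written directly in a rule's prerequisites. The GNU make manual notes that an unmatched wildcard in a rule is used verbatim, whereas the `$(wildcard ...)` function drops it. Presumably the glob is expanded when make reads the Makefile, before `extract` has created any CSV, so on a clean checkout make ends up looking for a file literally named `data-raw/*.csv`. The Python sketch below only mimics those two documented behaviors; the function names are hypothetical and this is not how make is implemented:

```python
import glob
from typing import List


def expand_rule_prerequisite(pattern: str) -> List[str]:
    """Mimic a wildcard written directly in a rule: unmatched patterns are kept verbatim."""
    matches = sorted(glob.glob(pattern))
    # With no matches, the literal pattern (e.g. "data-raw/*.csv") becomes the
    # prerequisite; no rule builds a file with that name, hence
    # "No rule to make target 'data-raw/*.csv'".
    return matches if matches else [pattern]


def expand_wildcard_function(pattern: str) -> List[str]:
    """Mimic $(wildcard pattern): unmatched patterns expand to nothing."""
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Run in a directory without data-raw/*.csv (like a fresh CI checkout)
    # to see the literal pattern survive in the first case only.
    print(expand_rule_prerequisite("data-raw/*.csv"))
    print(expand_wildcard_function("data-raw/*.csv"))
```

Under this reading, removing `data-raw/*.csv` from the dependencies sidesteps the problem because make no longer has to resolve that pattern at all.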
@hslinhares please review the column names, and then we'll look into the GitHub Actions failures.