semio / ddf_utils

Utilities for working with DDF datasets
https://open-numbers.github.io/
MIT License

improvements on readability of recipes #85

Open semio opened 6 years ago

semio commented 6 years ago

Over the past few days I spent a few hours implementing the syntax I suggested in #57, and along the way I found some things we can improve in the YAML recipe.

When writing a long recipe, I found it easy to lose the overview. Even when I fold all levels in the recipe, I still can't get an overview of it:

[Screenshot 2017-11-25 10 46 55: the recipe with all levels folded]

All I can see is the procedure names. To resolve this, we could rearrange the keyword order in the recipe so that the result id is listed first:

[Screenshot 2017-11-25 10 48 53: the recipe with the result id listed first]
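To make the reordering idea concrete, here is a minimal Python sketch. It assumes each recipe step is a mapping with `procedure`, `ingredients` and `result` keys; the ingredient ids and the helpers `result_first` and `overview` are hypothetical, made up for illustration, and are not part of ddf_utils.

```python
# Hypothetical recipe steps; the ids are made up for illustration.
steps = [
    {"procedure": "filter", "ingredients": ["population_by_age"], "result": "pop_0_14_rows"},
    {"procedure": "groupby", "ingredients": ["pop_0_14_rows"], "result": "pop_0_14_sum"},
    {"procedure": "translate_header", "ingredients": ["pop_0_14_sum"], "result": "population_0_14"},
]


def result_first(step):
    """Return a copy of the step with the `result` key moved to the front."""
    rest = {k: v for k, v in step.items() if k != "result"}
    return {"result": step["result"], **rest}


def overview(steps):
    """One line per step, led by the result id, roughly what a folded view would show."""
    return [f"{s['result']}: {s['procedure']}" for s in steps]


reordered = [result_first(s) for s in steps]
for line in overview(reordered):
    print(line)
```

With the result id first, a folded view starts each entry with what the step produces rather than with the procedure name.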

But there is still a problem. In a long recipe, such as the procedures that generate population-age_group from population by age and gender, each group needs at least 3 procedures: filter the target ages, groupby and aggregate, and translate the indicator name. With 10+ age groups we end up with a long list of ingredient results, and it is not easy to see which ones we are interested in. To improve this, I suggest adding one more level that lists the ingredients we are interested in:

[Screenshot 2017-11-25 10 52 29: the recipe with one more level listing the ingredients of interest]

This way we can also remove the result from each procedure. The procedures in each block execute as a chain, each one working on the output of the previous one.

Then I noticed that this suggestion is very similar to https://github.com/semio/ddf_utils/issues/59#issue-228510129, though the motivation is different. It might be interesting to solve both issues together.
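A rough Python sketch of how such a block could be executed as a chain. This is only an illustration under assumptions: the block layout, the block name `population_0_14`, and the plain-Python helpers `filter_ages`, `groupby_sum` and `rename_column` are all hypothetical stand-ins, not ddf_utils procedures.

```python
from collections import defaultdict


def filter_ages(rows, low, high):
    """Keep rows whose age lies in [low, high]."""
    return [r for r in rows if low <= r["age"] <= high]


def groupby_sum(rows, by, value):
    """Sum `value` over rows that share the same `by` keys."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[k] for k in by)] += r[value]
    return [dict(zip(by, key), **{value: total}) for key, total in sorted(totals.items())]


def rename_column(rows, old, new):
    """Rename one column, standing in for translating the indicator name."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


# Hypothetical layout: the block name is the id of the final result,
# and the steps inside carry no `result` of their own.
blocks = {
    "population_0_14": [
        (filter_ages, {"low": 0, "high": 14}),
        (groupby_sum, {"by": ["geo", "year"], "value": "population"}),
        (rename_column, {"old": "population", "new": "population_0_14"}),
    ],
}


def run_block(steps, data):
    """Run the steps in order, feeding each output into the next step."""
    for func, options in steps:
        data = func(data, **options)
    return data


population = [
    {"geo": "swe", "year": 2000, "age": 5, "population": 100},
    {"geo": "swe", "year": 2000, "age": 10, "population": 120},
    {"geo": "swe", "year": 2000, "age": 30, "population": 150},
]
result = run_block(blocks["population_0_14"], population)
print(result)
```

Only the block name shows up as a named ingredient; the intermediate outputs stay anonymous inside the chain.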

jheeffer commented 6 years ago

So in this proposal the block names are sort of chain names, not concept names. They're identifiers for the result of the chain, right? Like in #59, these results can then be used to branch the chains or logically break the process into steps?

semio commented 6 years ago

Yes, the block names are the ingredient ids for the output of the last procedure in each block. We can then use them in another block.
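A small Python sketch of how one block's name could serve as the ingredient of other blocks, so that chains can branch. Everything here is an assumption for illustration: the `ingredient`/`procedures` layout, the block names, and the `cook` helper are hypothetical and do not describe how ddf_utils resolves ingredients.

```python
def keep_under_15(rows):
    """Keep (age, population) pairs with age below 15."""
    return [(age, pop) for age, pop in rows if age < 15]


def total(rows):
    """Sum the population over all rows."""
    return sum(pop for _, pop in rows)


def count(rows):
    """Count the rows."""
    return len(rows)


# Source ingredients, keyed by id (made-up data).
sources = {"population_by_age": [(5, 100), (10, 120), (30, 150)]}

# Hypothetical blocks: each names its input ingredient and its chain.
blocks = {
    "children": {"ingredient": "population_by_age", "procedures": [keep_under_15]},
    "children_total": {"ingredient": "children", "procedures": [total]},
    "children_rows": {"ingredient": "children", "procedures": [count]},
}


def cook(block_id, blocks, sources, cache):
    """Resolve an ingredient id, running the blocks it depends on at most once."""
    if block_id in sources:
        return sources[block_id]
    if block_id not in cache:
        block = blocks[block_id]
        data = cook(block["ingredient"], blocks, sources, cache)
        for func in block["procedures"]:
            data = func(data)
        cache[block_id] = data
    return cache[block_id]


cache = {}
print(cook("children_total", blocks, sources, cache))
print(cook("children_rows", blocks, sources, cache))
```

Both `children_total` and `children_rows` branch off the `children` block, which is computed once and reused from the cache.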