slevin75 / ds-pipelines-targets-3

https://lab.github.com/USGS-R/many-task-pipelines-using-targets

Appliers #8

Closed: github-learning-lab[bot] closed this issue 3 years ago

github-learning-lab[bot] commented 3 years ago

Your pipeline is looking pretty good! Now it's time to add complexity. I've just added these two files to the repository:

In this issue you'll add these functions to the branching code in the form of two new steps.

Background

The goal of this issue is to expose you to multi-step branching. Multi-step branches aren't hugely different from the single-step branching we've already implemented, but there are a few details you haven't seen yet. The changes you make in this issue will also set us up to touch on some miscellaneous pipeline topics. Briefly, we'll cover:

Reminder about dynamic vs static branching

Remember that both dynamic and static branching can have multiple steps (appliers), but the steps are defined differently in each. We will focus on static branching for now, but you can always consult the relevant sections of the targets user guide (dynamic branching documentation; static branching documentation).
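For comparison, here is a minimal sketch of the same two steps written with dynamic branching. It assumes the oldest_active_sites and parameter targets and the get_site_data() function from the existing pipeline, plus a hypothetical two-state `states` vector. Instead of tar_map(), each step is a tar_target() with a `pattern = map(...)` argument, and the branches are created when the pipeline runs rather than when _targets.R is read.

```r
library(targets)

# Hypothetical stand-in for the `states` vector defined in _targets.R
states <- c("WI", "IL")

dynamic_targets <- list(
  # An upstream target holding the task identifiers that branches iterate over
  tar_target(state_abb, states),
  # Step 1: pattern = map(state_abb) creates one branch per state at run time
  tar_target(
    nwis_inventory,
    filter(oldest_active_sites, state_cd == state_abb),
    pattern = map(state_abb)
  ),
  # Step 2: map() over two targets pairs their branches element by element
  tar_target(
    nwis_data,
    get_site_data(nwis_inventory, state_abb, parameter),
    pattern = map(nwis_inventory, state_abb)
  )
)
```

One visible difference: dynamic branch names end in hash-like suffixes rather than readable ones such as `_WI` or `_IL`.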

Adding another step in static branching

As you already know, static branching is set up with the tar_map() function: the task targets are defined by the values argument (a list or data.frame), and each step is defined by a call to tar_target() passed as an additional argument. Up until now, your static branching code in _targets.R has looked something like this:

tar_map(
  values = tibble(state_abb = states),
  tar_target(nwis_inventory, filter(oldest_active_sites, state_cd == state_abb)),
  tar_target(nwis_data, get_site_data(nwis_inventory, state_abb, parameter))
  # Insert step for tallying data here
  # Insert step for plotting data here
)

We actually already have more than one step in our branching setup: nwis_inventory and nwis_data. This shows that you can add more appliers to your branches simply by including additional calls to tar_target(). If you want to use a previous step's output, just use the target name from that step, and targets will pass only the output relevant to each task target between the steps within tar_map(). We are going to add a few more steps to our static branching, and there is already a hint for where they go ... ahem # Insert step for tallying data here and # Insert step for plotting data here ahem.
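A minimal sketch of the static branch with the tallying step filled in. It assumes a hypothetical tallying function named tally_site_obs() (the function actually added to the repository may be named differently) and a two-state stand-in for `states`. The new step simply names the previous step's target, nwis_data, as its input.

```r
library(targets)
library(tarchetypes)
library(tibble)

# Hypothetical two-state stand-in for the `states` vector in _targets.R
states <- c("WI", "IL")

mapped <- tar_map(
  values = tibble(state_abb = states),
  tar_target(nwis_inventory, filter(oldest_active_sites, state_cd == state_abb)),
  tar_target(nwis_data, get_site_data(nwis_inventory, state_abb, parameter)),
  # New step: refer to the previous step by its target name. tar_map() wires
  # nwis_data_WI into tally_WI, nwis_data_IL into tally_IL, and so on.
  # tally_site_obs() is a hypothetical function name.
  tar_target(tally, tally_site_obs(nwis_data))
)
```

In _targets.R, this tar_map() call would sit inside the list of targets returned at the end of the file; the commands are only captured here, not run.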

Steps that require additional info per task

So far, we have used functions in static branching that only needed our state abbreviation, e.g. "WI" or "IL". What happens when we want to use other information per task? For example, we need to save one file per task, and we want each task's file name passed into our step function. Easy! We can just add that information to the values we pass in. Currently, we are using a single-column tibble, but it can easily have multiple columns, and each column can be used as an argument to the tar_target() commands within tar_map(). We will try this out next!
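As a base-R sketch of such a multi-column values table (assuming a hypothetical three-state `states` vector and the 3_visualize/out/ file naming used in this course), each row is one task, and each column is a symbol that tar_map() can substitute into the step commands:

```r
# Hypothetical stand-in for the `states` vector defined in _targets.R
states <- c("WI", "IL", "IN")

# One row per task; every column becomes a symbol tar_map() can substitute
values <- data.frame(state_abb = states, stringsAsFactors = FALSE)

# sprintf() is vectorized, so this builds one output path per state in one call
values$state_plot_files <- sprintf("3_visualize/out/timeseries_%s.png", values$state_abb)

values
```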

github-learning-lab[bot] commented 3 years ago

:keyboard: Activity: Switch to a new branch

Before you edit any code, create a local branch called "appliers" and push that branch up to the remote location "origin" (the GitHub host of your repository).

git checkout main
git pull origin main
git checkout -b appliers
git push -u origin appliers

The first two lines aren't strictly necessary when you don't have any new branches, but it's a good habit to head back to main and sync with "origin" whenever you're transitioning between branches and/or PRs.


Comment on this issue once you've created and pushed the "appliers" branch.

slevin75 commented 3 years ago

ok

github-learning-lab[bot] commented 3 years ago

:keyboard: Activity: Add two new appliers

Code

In _targets.R:

# values for tar_map(): one row per state, plus that state's plot file path
tibble(state_abb = states) %>%
  mutate(state_plot_files = sprintf("3_visualize/out/timeseries_%s.png", state_abb))
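Here is one way the full tar_map() could look with both new appliers, as a sketch only. tally_site_obs() and plot_site_data() are hypothetical function names, and `states` is a two-state stand-in. The target names tally and timeseries_png match the tar_make() log later in this issue. Two details matter: by default tar_map() builds target-name suffixes from every column of values, so `names = state_abb` keeps the file path out of the names; and `format = "file"` tells targets the command returns a file path whose contents it should track.

```r
library(targets)
library(tarchetypes)
library(tibble)
library(dplyr)

# Hypothetical two-state stand-in for the `states` vector in _targets.R
states <- c("WI", "IL")

mapped <- tar_map(
  # Two columns: each becomes a symbol that tar_map() substitutes per task
  values = tibble(state_abb = states) %>%
    mutate(state_plot_files = sprintf("3_visualize/out/timeseries_%s.png", state_abb)),
  # Build name suffixes from state_abb only, so the file path stays out of target names
  names = state_abb,
  tar_target(nwis_inventory, filter(oldest_active_sites, state_cd == state_abb)),
  tar_target(nwis_data, get_site_data(nwis_inventory, state_abb, parameter)),
  # Hypothetical tallying function
  tar_target(tally, tally_site_obs(nwis_data)),
  # Hypothetical plotting function; with format = "file" it must return the
  # path it wrote, and targets then tracks that file for changes
  tar_target(
    timeseries_png,
    plot_site_data(state_plot_files, nwis_data, parameter),
    format = "file"
  )
)
```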

Test

When you're feeling confident, add a comment to this issue with:


I'll respond when I see your comment.

slevin75 commented 3 years ago

[three attached images, not preserved]

github-learning-lab[bot] commented 3 years ago

Check your progress

To help you assess your pipeline, here's what I would have put in that comment:

* an image from one of the new plots in 3_visualize/out/, and

[image: timeseries_WI]

* a printout of the first 10 lines of tally_IL, and

> head(tally_IL, 10)
# A tibble: 10 x 4
# Groups:   Site, State [1]
   Site     State  Year NumObs
   <chr>    <chr> <dbl>  <int>
 1 05572000 IL     1908    332
 2 05572000 IL     1909    365
 3 05572000 IL     1910    365
 4 05572000 IL     1911    365
 5 05572000 IL     1912    337
 6 05572000 IL     1914    192
 7 05572000 IL     1915    365
 8 05572000 IL     1916    366
 9 05572000 IL     1917    365
10 05572000 IL     1918    365

* a copy of the image shown by tar_visnetwork().

[image: visnetwork_image]

:keyboard: Activity: Spot the split-apply-combine (again)

Give your answer to the activity in a comment on this issue.


I'll respond when I see your comment.

slevin75 commented 3 years ago

Not quite sure what it means by 'isolate the split, apply, and combine operations'. The group_by splits the data into groups by site, state, and year, and then summarize() summarizes each of those groups.

github-learning-lab[bot] commented 3 years ago

Check your progress

Here's where I think the split-apply-combine paradigm is manifested in tidyverse:

The split is decided here:

group_by(Site, State, Year) %>%

The apply is the expression

length(which(!is.na(Value)))

And both apply and combine are orchestrated by

summarize()

It's amazing how concise these actions can be in tidyverse, don't you think? The targets version would require more code to do the exact same operation, but it brings the special benefit of only (re)building those elements that aren't already up to date.
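To make the three phases explicit, here is a base-R sketch of the same tally on a small toy table (the values are made up for illustration, not real NWIS data; the column names follow the tally output above). Each phase is a separate line of code instead of being folded into group_by() and summarize():

```r
# Toy input with made-up values (not real NWIS data): one row per observation
site_data <- data.frame(
  Site  = "05572000",
  State = "IL",
  Year  = c(1908, 1908, 1909, 1909, 1909),
  Value = c(1.2, NA, 3.4, 5.6, NA),
  stringsAsFactors = FALSE
)

# Split: one data frame per Site/State/Year combination (the group_by() step)
pieces <- split(site_data, site_data[c("Site", "State", "Year")], drop = TRUE)

# Apply: count the non-missing values in each piece (the summarize() expression)
per_group <- lapply(pieces, function(piece) {
  data.frame(
    Site   = piece$Site[1],
    State  = piece$State[1],
    Year   = piece$Year[1],
    NumObs = length(which(!is.na(piece$Value))),
    stringsAsFactors = FALSE
  )
})

# Combine: stack the per-group results into a single table
tally <- do.call(rbind, per_group)
rownames(tally) <- NULL
tally
```

For this toy input, 1908 has one non-missing value and 1909 has two, so NumObs is 1 and 2.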

:keyboard: Activity: Revise and rebuild a step

The timeseries plots aren't meant to be publication quality, but it would be nice to touch them up just a bit.


I'll respond when I see your comment.

slevin75 commented 3 years ago

[image not preserved]

github-learning-lab[bot] commented 3 years ago

Check your progress

Copy your console output from the tar_make() you just ran

Do you get something like this, where only six targets were rebuilt?

v skip target oldest_active_sites
v skip target nwis_inventory_WI
v skip target nwis_inventory_IL
v skip target nwis_inventory_IN
v skip target nwis_inventory_MI
v skip target site_map_png
v skip target nwis_inventory_MN
v skip target nwis_inventory_IA
v skip target nwis_data_WI
v skip target nwis_data_IL
v skip target nwis_data_IN
v skip target nwis_data_MI
v skip target nwis_data_MN
v skip target nwis_data_IA
v skip target tally_WI
* start target timeseries_png_WI
  Plotting data for WI-04073500
* built target timeseries_png_WI
v skip target tally_IL
* start target timeseries_png_IL
  Plotting data for IL-05572000
* built target timeseries_png_IL
v skip target tally_IN
* start target timeseries_png_IN
  Plotting data for IN-03373500
* built target timeseries_png_IN
* start target timeseries_png_MI
  Plotting data for MI-04063522
* built target timeseries_png_MI
v skip target tally_MI
* start target timeseries_png_MN
  Plotting data for MN-05211000
* built target timeseries_png_MN
v skip target tally_MN
* start target timeseries_png_IA
  Plotting data for IA-05420500
* built target timeseries_png_IA
v skip target tally_IA
* end pipeline

Copy ... one of the updated plots as a comment on this issue

I edited ggtitle(site_data$Site[1]) to be ggtitle(sprintf("%s-%s", site_data$State[1], site_data$Site[1])), so my updated plot looks like this:

[image: updated_wi_plot]
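The title change can be checked on its own, without rebuilding anything. This sketch uses a one-row stand-in for the site data, with the state and site number taken from the tar_make() log ("Plotting data for WI-04073500"):

```r
# One-row stand-in for the site data; values come from the tar_make() log
site_data <- data.frame(State = "WI", Site = "04073500", stringsAsFactors = FALSE)

# Old title: site number only
old_title <- site_data$Site[1]

# New title: "<state>-<site>", the string passed to ggtitle()
new_title <- sprintf("%s-%s", site_data$State[1], site_data$Site[1])
new_title
```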

:keyboard: Activity: Merge your new appliers

Now that we've added these new appliers and thoroughly tested them, your code is ready for a pull request. Go for it!


I'll respond when I see your PR.

slevin75 commented 3 years ago

[image not preserved]