Appliers

github-learning-lab[bot] commented 3 years ago

Your pipeline is looking pretty good! Now it's time to add complexity. I've just added these two files to the repository:

2_process/src/tally_site_obs.R
3_visualize/src/plot_site_data.R

In this issue you'll add these functions to the branching code in the form of two new steps.

Background

The goal of this issue is to expose you to multi-step branching. Multi-step branches aren't hugely different from the single-step branching we've already implemented, but there are a few details that you haven't seen yet. The changes you'll make for this issue will also set us up to touch on some miscellaneous pipeline topics. Briefly, we'll cover:

The syntax for adding multiple steps
How to declare dependencies among steps within branching
A quick look at / review of split-apply-combine with the lightweight dplyr syntax

Reminder about dynamic vs static branching

Remember that both dynamic and static branching can have multiple steps or appliers, but they are defined differently. We will focus on static branching for now, but remember that you can always reference the information for dynamic and static branching in their respective locations in the targets user guide (dynamic branching documentation; static branching documentation).

Adding another step in static branching

As you already know, static branching is set up using the tar_map() function, where task targets are defined by the values argument (either a list or a data.frame) and steps are defined by calls to tar_target() passed as additional arguments. Up until now, your static branching code in _targets.R has looked something like this:

tar_map(
  values = tibble(state_abb = states),
  tar_target(nwis_inventory, filter(oldest_active_sites, state_cd == state_abb)),
  tar_target(nwis_data, get_site_data(nwis_inventory, state_abb, parameter))
  # Insert step for tallying data here
  # Insert step for plotting data here
)

We actually already have more than one step in our branching setup: nwis_inventory and nwis_data. This shows that you can include additional calls to tar_target() to add more appliers to your branches. If you want to use a previous step's output, just use the target name from that step, and targets will pass only the output relevant to each task target between the steps within tar_map(). We are going to add a few more steps to our static branching, and there is already a hint for where we will add them (ahem, "# Insert step for tallying data here" and "# Insert step for plotting data here", ahem).

Steps that require additional info per task

So far, we have used functions in static branching that only needed our state abbreviation, e.g. "WI" or "IL". What happens when we want to have other information used per task? For example, we need to save files per task and we want those to be passed into our step function. Easy! We can just edit the information we pass in for values. Currently, we are using a single-column tibble but that can easily have multiple columns and those columns can be used as arguments to tar_target() commands within tar_map(). We will try this out next!

github-learning-lab[bot] commented 3 years ago

:keyboard: Activity: Switch to a new branch

Before you edit any code, create a local branch called "appliers" and push that branch up to the remote location "origin" (which is the GitHub host of your repository).

git checkout main
git pull origin main
git checkout -b appliers
git push -u origin appliers

The first two lines aren't strictly necessary when you don't have any new branches, but it's a good habit to head back to main and sync with "origin" whenever you're transitioning between branches and/or PRs.

Comment on this issue once you've created and pushed the "appliers" branch.

slevin75 commented 3 years ago

ok

github-learning-lab[bot] commented 3 years ago

:keyboard: Activity: Add two new appliers

Code

In _targets.R:

[x] Add a new step right after nwis_data. It should create R object targets called tally_WI, tally_MI, etc., should call the tally_site_obs() function (also already defined for you in 2_process/src/tally_site_obs.R), and should make use of the targets created in nwis_data.
[x] We are about to add a fourth step to our static branching, where we create plot image files. First, we want to add our new file names to our branches defined in values. Add another column to the tibble called state_plot_files that includes 3_visualize/out/timeseries_WI.png, 3_visualize/out/timeseries_MN.png, etc. by editing tibble(state_abb = states) to be:

tibble(state_abb = states) %>% 
  mutate(state_plot_files = sprintf("3_visualize/out/timeseries_%s.png", state_abb))

[x] Add a fourth step to plot the data. This step should have a target name of timeseries_png, should call the plot_site_data() function (defined in 3_visualize/src/plot_site_data.R), should use the image filename for each task stored in the state_plot_files column, and should make use of the targets created in nwis_data (no need to link to the tally targets).
[ ] Make the two new function files (where plot_site_data() and tally_site_obs() are defined) available to the pipeline by adding source() calls to _targets.R.
[x] The tally_site_obs() function uses a function from the package lubridate. Add this package to the packages argument in your tar_option_set() call.
[x] Speaking of packages, we added %>% and mutate to _targets.R in order to add a new column to our task tibble. These are made available by the dplyr package, which is included in tidyverse. While tidyverse is loaded in tar_option_set(), it is not loaded when the top-level makefile is run, so we need to add library(dplyr) to the top of _targets.R. One common practice is to wrap any library(dplyr) and library(tidyverse) calls used at the top of target makefiles with suppressPackageStartupMessages(). This hides the startup messages about masked functions and which packages are being attached. If we don't hide them, they will appear every time we run tar_make(), which isn't necessary and can sometimes be distracting.
[ ] Now that we have added an additional column in values, we have less certainty about what tar_map() will use as the suffix when naming branch targets. To control what is used as the suffix, you can specify what part of values to use by passing in the column name to the names argument within tar_map(). This guarantees that _WI, _MN, etc will be used and not the long image filenames (that could get messy!). Go ahead and add names = state_abb as the final argument to tar_map().
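
For reference, here is one way the branching code could look once the checklist above is complete. Treat it as a sketch rather than the official solution: the single-argument call to tally_site_obs(), the argument order of plot_site_data(), and the use of format = "file" for the step that writes an image are all assumptions (check the function definitions in the two new files), and the states vector is inferred from the abbreviations that appear in this issue. The source() calls, tar_option_set(), and the rest of _targets.R are omitted.

```r
suppressPackageStartupMessages({
  library(targets)
  library(tarchetypes)
  library(dplyr)
})

# Assumed list of states, inferred from the abbreviations used in this issue
states <- c("WI", "MN", "MI", "IL", "IN", "IA")

# One row per task; each column can be used by name inside tar_target() commands
task_values <- tibble(state_abb = states) %>%
  mutate(state_plot_files = sprintf("3_visualize/out/timeseries_%s.png", state_abb))

mapped_targets <- tar_map(
  values = task_values,
  tar_target(nwis_inventory, filter(oldest_active_sites, state_cd == state_abb)),
  tar_target(nwis_data, get_site_data(nwis_inventory, state_abb, parameter)),
  # Assumed signature: tally_site_obs(site_data)
  tar_target(tally, tally_site_obs(nwis_data)),
  # Assumed signature: plot_site_data(out_file, site_data, parameter)
  tar_target(
    timeseries_png,
    plot_site_data(state_plot_files, nwis_data, parameter),
    format = "file"
  ),
  # Use only the state abbreviation as the branch suffix (tally_WI, not the filename)
  names = state_abb
)
```

Because both new steps name nwis_data in their commands, targets infers that they depend on that step for the same state; no extra dependency declaration is needed.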

Test

[x] Run tar_make(). Is it building a timeseries plot and a tally object for each state? If not, keep fiddling with your code until you get it to work.
[x] Check the contents of the 3_visualize/out directory and inspect at least one of the plots. How do they look?
[ ] Load the value of tally_IL to a variable of the same name in your global environment (hint: ?tar_load())
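
As a reminder of how tar_load() behaves, the sketch below builds a throwaway pipeline in a temporary directory and loads its single target into the current environment under the target's own name. The target name demo_tally and its values are invented for illustration. In your project you would run tar_load(tally_IL) from the repository root after tar_make().

```r
library(targets)

# Work in a temporary directory so the demo does not touch your real pipeline
demo_dir <- tempfile("tar_load_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Write a one-target _targets.R; demo_tally and its values are made up
tar_script(
  list(
    tar_target(demo_tally, data.frame(Year = 2000:2002, NumObs = c(365L, 360L, 365L)))
  ),
  ask = FALSE
)

tar_make()

# tar_load() creates a variable named after the target in the calling environment
tar_load(demo_tally)
head(demo_tally, 10)

setwd(old_wd)
```

If you would rather assign the value to a name of your choosing, tar_read() returns the object instead of creating a variable for you.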

When you're feeling confident, add a comment to this issue with:

an image from one of the new plots in 3_visualize/out,
a printout of the first 10 lines of tally_IL, and
a copy of the image shown by tar_visnetwork().

I'll respond when I see your comment.

slevin75 commented 3 years ago

github-learning-lab[bot] commented 3 years ago

Check your progress

To help you assess your pipeline, here's what I would have put in that comment:

an image from one of the new plots in 3_visualize/out,

[Image: timeseries_WI]

a printout of the first 10 lines of tally_IL, and

> head(tally_IL, 10)
# A tibble: 10 x 4
# Groups:   Site, State [1]
   Site     State  Year NumObs
   <chr>    <chr> <dbl>  <int>
 1 05572000 IL     1908    332
 2 05572000 IL     1909    365
 3 05572000 IL     1910    365
 4 05572000 IL     1911    365
 5 05572000 IL     1912    337
 6 05572000 IL     1914    192
 7 05572000 IL     1915    365
 8 05572000 IL     1916    366
 9 05572000 IL     1917    365
10 05572000 IL     1918    365

a copy of the image shown by tar_visnetwork().

[Image: visnetwork_image]

:keyboard: Activity: Spot the split-apply-combine (again)

[ ] Check out the code for tally_site_obs(). To strengthen your familiarity with the split-apply-combine paradigm, can you isolate the split, apply, and combine operations within this tidyverse expression?

site_data %>%
  mutate(Year = lubridate::year(Date)) %>%
  # group by Site and State just to retain those columns, since we're already only looking at just one site worth of data
  group_by(Site, State, Year) %>%
  summarize(NumObs = length(which(!is.na(Value))))

Give your answer to the activity in a comment on this issue.

I'll respond when I see your comment.

slevin75 commented 3 years ago

Not quite sure what it means by 'isolate the split, apply, and combine operations'. The group_by splits it into different groups by site, state, and year, and then it summarizes it by each of those groups.

github-learning-lab[bot] commented 3 years ago

Check your progress

Here's where I think the split-apply-combine paradigm is manifested in tidyverse:

The split is decided here:

group_by(Site, State, Year) %>%

The apply is the expression

length(which(!is.na(Value)))

And both apply and combine are orchestrated by

summarize()

It's amazing how concise these actions can be in tidyverse, don't you think? The targets version would require more code to do the exact same operation, but it brings the special benefit of only (re)building those elements that aren't already up to date.
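
To make the three stages easier to see, here is a small sketch that performs the same tally twice on made-up data: once with explicit split, apply, and combine steps, and once with the group_by() and summarize() form. The toy site_data values are invented (they are not real NWIS observations), format(Date, "%Y") stands in for lubridate::year() to keep dependencies small, and sum(!is.na(Value)) gives the same count as length(which(!is.na(Value))).

```r
suppressPackageStartupMessages(library(dplyr))

# Made-up toy data for illustration only
site_data <- tibble(
  Site  = "00000001",
  State = "XX",
  Date  = as.Date(c("2000-01-01", "2000-01-02", "2001-01-01", "2001-01-02")),
  Value = c(1.5, NA, 2.0, 2.5)
)

with_year <- mutate(site_data, Year = as.numeric(format(Date, "%Y")))

# Split: one data frame per year
pieces <- split(with_year, with_year$Year)

# Apply: count the non-missing values within each piece
counts <- lapply(pieces, function(df) {
  tibble(
    Site = df$Site[1],
    State = df$State[1],
    Year = df$Year[1],
    NumObs = sum(!is.na(df$Value))
  )
})

# Combine: stack the per-year results into one table
tally_manual <- bind_rows(counts)

# The same operation in the concise tidyverse form
tally_dplyr <- with_year %>%
  group_by(Site, State, Year) %>%
  summarize(NumObs = sum(!is.na(Value)), .groups = "drop")

print(tally_manual)
print(tally_dplyr)
```

Both tables have one row per year with the same NumObs values. The only deliberate difference from tally_site_obs() is .groups = "drop", which returns an ungrouped result.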

:keyboard: Activity: Revise and rebuild a step

The timeseries plots aren't meant to be publication quality, but it would be nice to touch them up just a bit.

[ ] Revise the title to include the State value from the first row of the site_data object.
[ ] Run tar_make() to build the plots again. Only the targets timeseries_png_WI, timeseries_png_MN, etc should have built. Everything else should have been skipped.
[ ] Copy your console output from the tar_make() you just ran and one of the updated plots as a comment on this issue.

I'll respond when I see your comment.

slevin75 commented 3 years ago

github-learning-lab[bot] commented 3 years ago

Check your progress

Copy your console output from the tar_make() you just ran

Do you get something like this, where only six targets were rebuilt?

v skip target oldest_active_sites
v skip target nwis_inventory_WI
v skip target nwis_inventory_IL
v skip target nwis_inventory_IN
v skip target nwis_inventory_MI
v skip target site_map_png
v skip target nwis_inventory_MN
v skip target nwis_inventory_IA
v skip target nwis_data_WI
v skip target nwis_data_IL
v skip target nwis_data_IN
v skip target nwis_data_MI
v skip target nwis_data_MN
v skip target nwis_data_IA
v skip target tally_WI
* start target timeseries_png_WI
  Plotting data for WI-04073500
* built target timeseries_png_WI
v skip target tally_IL
* start target timeseries_png_IL
  Plotting data for IL-05572000
* built target timeseries_png_IL
v skip target tally_IN
* start target timeseries_png_IN
  Plotting data for IN-03373500
* built target timeseries_png_IN
* start target timeseries_png_MI
  Plotting data for MI-04063522
* built target timeseries_png_MI
v skip target tally_MI
* start target timeseries_png_MN
  Plotting data for MN-05211000
* built target timeseries_png_MN
v skip target tally_MN
* start target timeseries_png_IA
  Plotting data for IA-05420500
* built target timeseries_png_IA
v skip target tally_IA
* end pipeline

Copy ... one of the updated plots as a comment on this issue

I edited ggtitle(site_data$Site[1]) to be ggtitle(sprintf("%s-%s", site_data$State[1], site_data$Site[1])), so my updated plot looks like

[Image: updated_wi_plot]
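
The title string in that edit is built with sprintf(), which fills each %s placeholder in order. A tiny sketch, using a one-row stand-in for site_data whose values are copied from the WI line of the console output above:

```r
# Stand-in for site_data; only the two columns used in the title are included
site_data <- data.frame(State = "WI", Site = "04073500", stringsAsFactors = FALSE)

# Same expression that is passed to ggtitle() in the revised plotting function
plot_title <- sprintf("%s-%s", site_data$State[1], site_data$Site[1])
print(plot_title)
```

Because this change lives only in the plotting function, the timeseries_png targets are the only ones that become outdated.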

:keyboard: Activity: Merge your new appliers

Now that we've added these new appliers and thoroughly tested them, your code is ready for a pull request. Go for it!

I'll respond when I see your PR.

slevin75 commented 3 years ago

slevin75 / ds-pipelines-targets-3

Appliers #8
