Closed github-learning-lab[bot] closed 3 years ago
Before you edit any code, create a local branch called "appliers" and push that branch up to the remote location "origin" (which is the GitHub host of your repository).
git checkout main
git pull origin main
git checkout -b appliers
git push -u origin appliers
The first two lines aren't strictly necessary when you don't have any new branches, but it's a good habit to head back to main and sync with "origin" whenever you're transitioning between branches and/or PRs.
ok
In _targets.R:
[x] Add a new step right after nwis_data. It should create R object targets called tally_WI, tally_MI, etc., should call the tally_site_obs() function (also already defined for you in 2_process/src/tally_site_obs.R), and should make use of the targets created in nwis_data.
[x] We are about to add a fourth step to our static branching, where we create plot image files. First, we want to add our new file names to our branches defined in values. Add another column to the tibble called state_plot_files that includes 3_visualize/out/timeseries_WI.png, 3_visualize/out/timeseries_MN.png, etc. by editing tibble(state_abb = states) to be
tibble(state_abb = states) %>%
  mutate(state_plot_files = sprintf("3_visualize/out/timeseries_%s.png", state_abb))
[x] Add a fourth step to plot the data. This step should have a target name of timeseries_png, should call the plot_site_data() function (defined in 3_visualize/src/plot_site_data.R), should use the image filename for each task stored in the state_plot_files column, and should make use of the targets created in nwis_data (no need to link to the tally targets).
[ ] Make the two new function files (where plot_site_data() and tally_site_obs() are defined) available to the pipeline by adding source() calls to _targets.R.
[x] The tally_site_obs() function uses a function from the package lubridate. Add this package to the packages argument in your tar_option_set() call.
[x] Speaking of packages, we added %>% and mutate to _targets.R in order to add a new column to our task tibble. These are made available by the dplyr package, which is included in tidyverse, and while tidyverse is loaded in tar_option_set(), it is not loaded when the top-level makefile is run. So, we need to add library(dplyr) to the top of _targets.R. One common practice is to wrap any library(dplyr) and library(tidyverse) calls used at the top of target makefiles with suppressPackageStartupMessages(). This hides the startup messages about masked functions and which packages are being attached. If we don't hide them, they will appear every time we run tar_make(), which isn't necessary and can sometimes be distracting.
[ ] Now that we have added an additional column in values, we have less certainty about what tar_map() will use as the suffix when naming branch targets. To control what is used as the suffix, you can specify what part of values to use by passing in the column name to the names argument within tar_map(). This guarantees that _WI, _MN, etc. will be used and not the long image filenames (that could get messy!). Go ahead and add names = state_abb as the final argument to tar_map().
[x] Run tar_make(). Is it building a timeseries plot and a tally object for each state? If not, keep fiddling with your code until you get it to work.
[x] Check the contents of the 3_visualize/out directory and inspect at least one of the plots. How do they look?
[ ] Load the value of tally_IL to a variable of the same name in your global environment (hint: ?tar_load())
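If the tar_load() hint is new to you, here is a minimal sketch of how it behaves. It runs in a throwaway directory against a hypothetical one-target pipeline, so the tally_IL values below are invented for illustration and are not the course data. In your own project you would only need the tar_load(tally_IL) line, run after tar_make().

```r
library(targets)

# Work in a throwaway directory so the demo cannot touch a real pipeline.
demo_dir <- file.path(tempdir(), "tar_load_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)

# Hypothetical one-target pipeline standing in for the course pipeline.
# The numbers are made up for illustration only.
tar_script(
  list(
    tar_target(tally_IL, data.frame(Year = c(2001, 2002), NumObs = c(10L, 12L)))
  ),
  ask = FALSE
)
tar_make()

# tar_load() creates an object named tally_IL in the calling environment.
tar_load(tally_IL)
print(head(tally_IL, 10))

# tar_read() returns the value instead, so you pick the variable name yourself.
tally_copy <- tar_read(tally_IL)

setwd(old_wd)
```

tar_load() is convenient when you want the object to carry the same name as the target, which is what this checklist item asks for; tar_read() is handy when you would rather assign the value to a name of your choosing.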
When you're feeling confident, add a comment to this issue with:
* an image from one of the new plots in 3_visualize/out,
* a printout of the first 10 lines of tally_IL, and
* a copy of the image shown by tar_visnetwork().
To help you assess your pipeline, here's what I would have put in that comment:
* an image from one of the new plots in 3_visualize/out, and
* a printout of the first 10 lines of tally_IL
> head(tally_IL, 10)
# A tibble: 10 x 4
# Groups:   Site, State [1]
   Site     State  Year NumObs
   <chr>    <chr> <dbl>  <int>
 1 05572000 IL     1908    332
 2 05572000 IL     1909    365
 3 05572000 IL     1910    365
 4 05572000 IL     1911    365
 5 05572000 IL     1912    337
 6 05572000 IL     1914    192
 7 05572000 IL     1915    365
 8 05572000 IL     1916    366
 9 05572000 IL     1917    365
10 05572000 IL     1918    365
* a copy of the image shown by tar_visnetwork().
Consider tally_site_obs(). To strengthen your familiarity with the split-apply-combine paradigm, can you isolate the split, apply, and combine operations within this tidyverse expression?
site_data %>%
  mutate(Year = lubridate::year(Date)) %>%
  # group by Site and State just to retain those columns, since we're already only looking at just one site worth of data
  group_by(Site, State, Year) %>%
  summarize(NumObs = length(which(!is.na(Value))))
Give your answer to the activity in a comment on this issue.
Not quite sure what it means by 'isolate the split, apply, and combine operations'. The group_by splits it into different groups by site, state, and year, and then it summarizes it by each of those groups.
Here's where I think the split-apply-combine paradigm is manifested in tidyverse:
The split is decided here:
group_by(Site, State, Year) %>%
The apply is the expression
length(which(!is.na(Value)))
And both apply and combine are orchestrated by
summarize()
It's amazing how concise these actions can be in tidyverse, don't you think? The targets version would require more code to do the exact same operation, but it brings the special benefit of only (re)building those elements that aren't already up to date.
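For comparison, here is a rough base R sketch that spells out the same three stages one at a time. The toy site_data below is invented for illustration, and because it holds a single site the split is done on Year alone (the tidyverse version also groups by Site and State only to keep those columns).

```r
# Hypothetical toy data with the same columns the tidyverse expression expects.
site_data <- data.frame(
  Site  = "05572000",
  State = "IL",
  Date  = as.Date(c("1908-01-01", "1908-01-02", "1909-01-01", "1909-01-02")),
  Value = c(1.2, NA, 3.4, 5.6)
)
site_data$Year <- as.integer(format(site_data$Date, "%Y"))

# Split: one data frame per Year.
pieces <- split(site_data, site_data$Year)

# Apply: count the non-missing values within each piece.
counts <- lapply(pieces, function(piece) {
  data.frame(
    Site   = piece$Site[1],
    State  = piece$State[1],
    Year   = piece$Year[1],
    NumObs = sum(!is.na(piece$Value))
  )
})

# Combine: stack the per-year results back into a single table.
tally <- do.call(rbind, counts)
rownames(tally) <- NULL
print(tally)
```

Seen this way, group_by() plays the role of split(), the counting expression is the function handed to lapply(), and summarize() takes care of both calling it per group and binding the results together.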
The timeseries plots aren't meant to be publication quality, but it would be nice to touch them up just a bit.
[ ] Revise the title to include the State value from the first row of the site_data object.
[ ] Run tar_make() to build the plots again. Only the targets timeseries_png_WI, timeseries_png_MN, etc. should have built. Everything else should have been skipped.
[ ] Copy your console output from the tar_make() you just ran and one of the updated plots as a comment on this issue.
_Copy your console output from the tar_make() you just ran_
Do you get something like this, where only six targets were rebuilt?
v skip target oldest_active_sites
v skip target nwis_inventory_WI
v skip target nwis_inventory_IL
v skip target nwis_inventory_IN
v skip target nwis_inventory_MI
v skip target site_map_png
v skip target nwis_inventory_MN
v skip target nwis_inventory_IA
v skip target nwis_data_WI
v skip target nwis_data_IL
v skip target nwis_data_IN
v skip target nwis_data_MI
v skip target nwis_data_MN
v skip target nwis_data_IA
v skip target tally_WI
* start target timeseries_png_WI
Plotting data for WI-04073500
* built target timeseries_png_WI
v skip target tally_IL
* start target timeseries_png_IL
Plotting data for IL-05572000
* built target timeseries_png_IL
v skip target tally_IN
* start target timeseries_png_IN
Plotting data for IN-03373500
* built target timeseries_png_IN
* start target timeseries_png_MI
Plotting data for MI-04063522
* built target timeseries_png_MI
v skip target tally_MI
* start target timeseries_png_MN
Plotting data for MN-05211000
* built target timeseries_png_MN
v skip target tally_MN
* start target timeseries_png_IA
Plotting data for IA-05420500
* built target timeseries_png_IA
v skip target tally_IA
* end pipeline
Copy ... one of the updated plots as a comment on this issue
I edited ggtitle(site_data$Site[1]) to be ggtitle(sprintf("%s-%s", site_data$State[1], site_data$Site[1])), so my updated plot looks like
Now that we've added these new appliers and thoroughly tested them, your code is ready for a pull request. Go for it!
Your pipeline is looking pretty good! Now it's time to add complexity. I've just added these two files to the repository: 2_process/src/tally_site_obs.R and 3_visualize/src/plot_site_data.R.
In this issue you'll add these functions to the branching code in the form of two new steps.
Background
The goal of this issue is to expose you to multi-step branching. It's not hugely different from the single-step branching we've already implemented, but there are a few details that you haven't seen yet. The changes you'll make for this issue will also set us up to touch on some miscellaneous pipeline topics. Briefly, we'll cover:
Reminder about dynamic vs static branching
Remember that both dynamic and static branching can have multiple steps or appliers, but they are defined differently. We will focus on static branching for now, but remember that you can always reference the information for dynamic and static branching in their respective locations in the targets user guide (dynamic branching documentation; static branching documentation).
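As a rough side-by-side, the sketch below defines the same hypothetical step both ways. The target name state_label and the paste() command are invented for illustration and are not part of the course pipeline; tar_map() comes from the tarchetypes package, and pattern = map() is the tar_target() argument used for dynamic branching.

```r
library(targets)
library(tarchetypes)

# Static branching: the branch targets (state_label_WI, state_label_MN) are
# fixed as soon as _targets.R is read, before anything is built.
static_targets <- tar_map(
  values = list(state_abb = c("WI", "MN")),
  tar_target(state_label, paste("State:", state_abb)),
  unlist = TRUE
)

# Dynamic branching: a single state_label target whose branches are created
# while the pipeline runs, one per element of the upstream state_abb target.
dynamic_targets <- list(
  tar_target(state_abb, c("WI", "MN")),
  tar_target(state_label, paste("State:", state_abb), pattern = map(state_abb))
)
```

In a real _targets.R you would return one of these lists, not both, since the two versions reuse the same target names.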
Adding another step in static branching
As you already know, static branching is set up using the tar_map() function, where task targets are defined by the argument values as either a list or data.frame and steps are defined by a call to tar_target() as an additional argument. Up until now, your static branching code in _targets.R looks something like

We actually already have more than one step in our branching setup: nwis_inventory and nwis_data. This shows that you can include additional calls to tar_target() to add more appliers to your branches. If you want to use a previous step's output, just use the target name from that step and targets will appropriately pass only the output relevant to each task target between the steps within tar_map(). We are going to add a few more steps to our static branching and there is already a hint for where we will add these ... ahem #Insert step for tallying data here and #Insert step for plotting data here ahem.

Steps that require additional info per task
So far, we have used functions in static branching that only needed our state abbreviation, e.g. "WI" or "IL". What happens when we want to have other information used per task? For example, we need to save files per task and we want those to be passed into our step function. Easy! We can just edit the information we pass in for values. Currently, we are using a single-column tibble but that can easily have multiple columns and those columns can be used as arguments to tar_target() commands within tar_map(). We will try this out next!
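To make that concrete, here is a minimal sketch of a two-column values tibble feeding two steps. The helpers fake_data() and fake_plot() are hypothetical stand-ins (the real step functions and their argument lists are not shown here), and format = "file" assumes the plotting step returns the path it wrote. Nothing is built by this snippet; it only defines the targets so you can inspect their names.

```r
library(targets)
library(tarchetypes)
suppressPackageStartupMessages(library(dplyr))

states <- c("WI", "MN")

# One row per task, one column per piece of per-task information.
task_values <- tibble(state_abb = states) %>%
  mutate(state_plot_files = sprintf("3_visualize/out/timeseries_%s.png", state_abb))

# Hypothetical stand-ins for the real step functions.
fake_data <- function(state) data.frame(State = state, Value = 1:3)
fake_plot <- function(out_file, site_data) out_file

mapped <- tar_map(
  values = task_values,
  # Any column of values can be used by name inside a command.
  tar_target(nwis_data, fake_data(state_abb)),
  # Referring to nwis_data passes along only the matching branch's output.
  tar_target(
    timeseries_png,
    fake_plot(state_plot_files, nwis_data),
    format = "file" # assumption: the step returns the file path it wrote
  ),
  names = state_abb, # use only this column for the branch suffixes
  unlist = TRUE      # return a flat list named by branch target
)

print(sort(names(mapped)))
```

Without names = state_abb, the suffixes would be built from every column in values, file paths included, which is why restricting them to the state abbreviation keeps the target names readable.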