Open github-learning-lab[bot] opened 3 years ago
We're going to combine several things into this assignment. You'll be asked to make some modifications to your training repository and also to create a pull request that captures these changes.
We use team conventions for how our pipelines are organized, which make it easier to hop in and out of collaborative projects and to rapidly understand what is going on where.
We refer to major elements of a pipeline as "phases", and name phases according to their purpose, such as `1_fetch` or `2_process`. These phases are used to separate files and data based on the intent of the code we are writing, and make it tractable to figure out where you'd need to edit code if you were coming in fresh to the project.
For medium to large pipeline projects, you'll see these workflow phases explicitly named with a number, often followed by a verb (separated by an underscore). We use these phases to create different folders :file_folder: for data and code, and also to specify how we orchestrate the running of code (more on that later).
So, if we have a `1_fetch` phase, code in the fetch folder :file_folder: would be used to do things like get data from web services, Google Drive, or an FTP server, or to scrape a website. `2_process` (or `2_munge`) might contain code that transforms the "fetched" data into more usable formats.
We recommend having `src` and `out` folders within each phase folder that contain code for this phase (`src`) and data (or other files) produced by this phase (`out`). When seeing some of our existing pipelines in action, you will also see other folders :file_folder: named `in`, `log`, and `tmp` to represent manually added files, logged/diagnostic output, and temporary data files, respectively.
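The layout described above can be sketched in a few lines of Python. This is an illustration only: the training materials are R-based, the helper name `make_phase_dirs` is hypothetical, and the optional `in`, `log`, and `tmp` folders are created only if you pass them in.

```python
from pathlib import Path


def make_phase_dirs(root, phases, subfolders=("src", "out")):
    """Create <root>/<phase>/<subfolder> for every phase and return the paths.

    `make_phase_dirs` is a hypothetical helper for illustration; it is not
    part of any tooling referenced in this tutorial.
    """
    created = []
    for phase in phases:
        for sub in subfolders:
            path = Path(root) / phase / sub
            # parents=True also creates the phase folder; exist_ok=True makes reruns harmless
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    import tempfile

    # Work in a throwaway directory so the sketch never touches a real repository
    with tempfile.TemporaryDirectory() as tmp:
        for p in make_phase_dirs(tmp, ["1_fetch", "2_process"]):
            print(p.relative_to(tmp).as_posix())
```

Passing `subfolders=("src", "out", "in", "log", "tmp")` would add the extra folders seen in some existing pipelines.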
Create a two-phase directory structure for "fetch" and "process" concepts, and include `src` and `out` folders in each phase. Move the example script (`my_happy_script.R`) from the `my_work_R` folder into one of the `src` folders (at this time, it doesn't matter which one you choose) and delete any existing folders that aren't part of the intended structure.
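One way to sketch this reorganization, assuming you are working in a local clone rather than the GitHub web editor, is shown below. The script builds a stand-in repository in a temporary directory; `reorganize` is a hypothetical helper, and putting the script in `1_fetch/src` is an arbitrary choice, since either `src` folder is acceptable here.

```python
import shutil
import tempfile
from pathlib import Path


def reorganize(repo, script="my_happy_script.R", old_dir="my_work_R",
               target="1_fetch/src"):
    """Create the two-phase layout, move the script, then remove the now-empty old folder."""
    repo = Path(repo)
    for phase in ("1_fetch", "2_process"):
        for sub in ("src", "out"):
            (repo / phase / sub).mkdir(parents=True, exist_ok=True)
    destination = repo / target / script
    shutil.move(str(repo / old_dir / script), str(destination))
    # rmdir raises OSError on a non-empty folder, so stray files are never deleted silently
    (repo / old_dir).rmdir()
    return destination


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the training repository: one folder holding the example script
        (Path(tmp) / "my_work_R").mkdir()
        (Path(tmp) / "my_work_R" / "my_happy_script.R").write_text("# placeholder\n")
        moved = reorganize(tmp)
        print(moved.relative_to(tmp).as_posix())
```

Note that Git does not track empty directories, so the `out` folders and the unused `src` folder will not appear in a commit until they contain a file. A placeholder file such as `.gitkeep` is a common convention for this, not a Git feature.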
When you are done, open a pull request with the changes.
Hello! Thank you for making this tutorial, it is exactly what I need. In the lesson above, I can make a new file, but I don't see how to make a new folder.
Hi! Glad you're excited about the tutorial. However, I have to let you know that we created this tutorial to support training within our small team, and we don't have the bandwidth to provide human support or feedback beyond our team. Sorry about that, and I hope you're able to find benefit by working through the materials on your own.
You should organize your code into functions, targets, and conceptual "phases" of work.
Often we create temporary code or are sent scripts that look like `my_work_R/my_happy_script.R` in this repository. Take a minute to look through that file now.
This code has some major issues: it uses a directory that is specific to one user, it plots to a file location outside the project, and its structure makes it hard to figure out what is happening. This simple example is a starting point for understanding the investments we make to move towards code that is more reproducible, more shareable, and easier to understand. Additionally, we want to structure our code and our projects so that we can build on them as the projects progress.
Assign this issue to yourself and then we'll get started on code and project structures.
I'll sit patiently until you've assigned yourself to this one.