sspaulding777 / ds-pipelines-1

https://lab.github.com/USGS-R/intro-to-pipelines

Organize your project files #3

Open github-learning-lab[bot] opened 3 years ago

github-learning-lab[bot] commented 3 years ago

You should organize your code into functions, targets, and conceptual "phases" of work.

Often we create temporary code or are sent scripts that look like my_work_R/my_happy_script.R in this repository. Take a minute to look through that file now.

This code has some major issues: it uses a directory that is specific to one user, it writes plots to a location outside the project, and its structure makes it hard to figure out what is happening. This simple example is a starting point for understanding the investments we make to move towards code that is more reproducible, more shareable, and easier to understand. We also want to structure our code and our projects so that we can build on top of them as the projects progress.

Assign this issue to yourself and then we'll get started on code and project structures.


I'll sit patiently until you've assigned yourself to this one.

github-learning-lab[bot] commented 3 years ago

We're going to combine several things into this assignment. You'll be asked to make some modifications to your training repository and also to create a pull request that captures these changes.

Background

We use team conventions for how our pipelines are organized, which make it easier to hop in and out of collaborative projects and to rapidly understand what is going on where.

We refer to major elements of a pipeline as "phases", and name phases according to their purpose, such as 1_fetch or 2_process. These phases are used to separate files and data based on the intent of the code we are writing, and make it tractable to figure out where you'd need to edit code if you were coming in fresh to the project.

For medium to large pipeline projects, you'll see these workflow phases explicitly named with a number, often followed by a verb (separated by an underscore). We use these phases to create different folders :file_folder: for data and code, and also to specify how we orchestrate the running of code (more on that later).

So, if we have a 1_fetch phase, code in the fetch folder :file_folder: would be used to do things like get data from web services, Google Drive, or an FTP server, or to scrape a website. 2_process (or 2_munge) might contain code that transforms the "fetched" data into more usable formats.

We recommend having src and out folders within each phase folder that contain code for this phase (src) and data (or other files) produced by this phase (out). When seeing some of our existing pipelines in action, you will also see other folders :file_folder: named in, log, and tmp to represent manually added files, logged/diagnostic output, and temporary data files, respectively.

:keyboard: Activity: Restructure your code repository to follow our team's conventions for folders and files

Create a two-phase directory structure for "fetch" and "process" concepts, and include src and out folders in each phase. Move the example script (my_happy_script.R) from the my_work_R folder into one of the src folders (at this time, it doesn't matter which one you choose) and delete any existing folders that aren't part of the intended structure.
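As a rough illustration of the restructuring described above, here is a minimal Python sketch that builds the folder layout on a local clone and moves the script. The function name `restructure` and the default choice of 1_fetch as the destination are assumptions made for this example; the folder and file names come from the activity text. It is one possible way to do the move, not the prescribed one, and doing the same steps by hand in a file browser works just as well.

```python
from pathlib import Path
import shutil

# Phase and subfolder names taken from the activity description.
PHASES = ("1_fetch", "2_process")
SUBFOLDERS = ("src", "out")


def restructure(repo_root: Path, script_phase: str = "1_fetch") -> Path:
    """Create <phase>/src and <phase>/out folders, then move the example script.

    `script_phase` (hypothetical parameter) picks which phase's src folder
    receives my_happy_script.R; the activity says either one is fine.
    Returns the new path of the script.
    """
    for phase in PHASES:
        for sub in SUBFOLDERS:
            # parents=True also creates the phase folder itself.
            (repo_root / phase / sub).mkdir(parents=True, exist_ok=True)

    old_dir = repo_root / "my_work_R"
    old_script = old_dir / "my_happy_script.R"
    new_script = repo_root / script_phase / "src" / old_script.name

    if old_script.exists():
        shutil.move(str(old_script), str(new_script))

    # Delete the old folder only if nothing else is left in it.
    if old_dir.is_dir() and not any(old_dir.iterdir()):
        old_dir.rmdir()

    return new_script
```

Calling `restructure(Path("."))` from the root of a local clone would leave 1_fetch/src, 1_fetch/out, 2_process/src, and 2_process/out in place, with my_happy_script.R inside 1_fetch/src.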

When you are done, open a pull request with the changes.


Check your new pull request for a comment from me (you might have to wait a few seconds).

sspaulding777 commented 3 years ago

Hello! Thank you for making this tutorial, it is exactly what I need. In the lesson above, I can make a new file, but I don't see how to make a new folder.

aappling-usgs commented 3 years ago

Hi! Glad you're excited about the tutorial. However, I have to let you know that we created this tutorial to support training within our small team, and we don't have the bandwidth to provide human support or feedback beyond our team. Sorry about that, and I hope you're able to find benefit by working through the materials on your own.