weecology / updating-data

Hugo website for instructions on how to make a regularly updating data pipeline
https://www.updatingdata.org/
MIT License

Add example for Add Data Code #13

Open rudeboybert opened 3 years ago

rudeboybert commented 3 years ago

@ethanwhite Currently the Add Data Code page doesn't have an example of data manipulation in datascript.R. I could very quickly add an example to datascript.R that takes the original raw data and writes a "post-processed" csv file:

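print("Have this script run whatever data cleaning you do")
library(dplyr)

# Read the raw data, keeping text columns as character vectors
base_data <- read.csv("data-raw/data.csv",
                      stringsAsFactors = FALSE)

# Add a derived column, then write the post-processed copy.
# row.names = FALSE keeps write.csv from adding an extra index column.
base_data %>%
  mutate(SH_plus_SO = SH + SO) %>%
  write.csv("data-raw/data_post_processed.csv", row.names = FALSE)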

This would require explaining how to modify the .github/workflows/R-CMD-check.yaml file so that the GitHub Actions workflow gets Commit files and Push changes steps, which commit the new csv file and push it to the repo (I modified this code to do so):

...
      - name: Run tests
        run: Rscript testthat.R
      - name: Run datascript
        run: Rscript datascript.R
      - name: Commit files
        run: |
          git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git config --local user.name "github-actions[bot]"
          git add data-raw/data_post_processed.csv
          git commit -m "Add post processed data"
      - name: Push changes
        uses: ad-m/github-push-action@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          branch: ${{ github.ref }}
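
One caveat with the Commit files step: `git commit` exits with a non-zero status when nothing is staged, so a run in which datascript.R regenerates an identical csv would likely fail the job. Below is a minimal sketch of a guard, assuming it is acceptable to skip the commit in that case. `commit_if_changed` is a hypothetical helper name used only for illustration; it is not part of the repo or the template.

```shell
# Hypothetical helper: stage a file and commit only if it actually changed.
commit_if_changed() {
  file="$1"
  git add "$file"
  # `git diff --cached --quiet` exits 0 when the index matches HEAD
  # (nothing staged) and 1 when there are staged changes.
  if git diff --cached --quiet; then
    echo "No changes to commit"
  else
    git commit -q -m "Add post processed data"
  fi
}
```

In the workflow, the body of this function could replace the `git add` and `git commit` lines of the Commit files step, with `data-raw/data_post_processed.csv` as the file.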

Thoughts?

ethanwhite commented 3 years ago

Sorry to be so slow on this. I think this is a great idea and would really add to the documentation!