This repository contains the code and products for a project by Kelly Cao and Rachel Robertson on how HPV vaccination completion rates vary with several socioeconomic factors.
This project uses exploratory and statistical analysis to determine which geographic and socioeconomic factors play the largest role in HPV vaccination completion rates among U.S. teens. The data come from the 2022 National Immunization Survey-Teen (NIS-Teen). This survey collects information from a teen's parent or guardian and from the teen's vaccination provider, gathering demographic information on the teen together with their vaccination history. The socioeconomic and demographic predictors of HPV vaccination completion examined here are race/ethnicity, state of residence, geographic mobility, income, poverty level, housing status, maternal education, insurance status, language, and the facility in which the survey was administered.
This data analysis project uses R, Quarto, GitHub, and Zotero. It also assumes that you have a word processor installed (e.g., MS Word or LibreOffice). You need this software stack to make use of this template.
- Data go in the `data` folder.
- Code goes in the `code` folder or its subfolders.
- Results go in the `results` folder or its subfolders.
- Products go in the `products` subfolders.
- The `renv` folder is automatically generated by the `renv` package and is used to keep track of packages.
- See the `README.md` files in those folders for more information.

You may first copy this template using the link to our GitHub [insert link here]. After copying the repository, run the code in this order:
1. In the `code` folder, open the `processing-code` subfolder and run the code it contains.
2. Return to the main `code` folder, open the `eda-code` subfolder, and run the code it contains.
3. Return to the main `code` folder, open the `analysis-code` subfolder, and run the code it contains.
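You may prefer to run the three stages from a single R session. The sketch below loops over the subfolders in the order listed above. It is a hypothetical helper: `run_folder()` is not part of this repository. It also rests on several assumptions:

- Each subfolder holds `.R` scripts (which are sourced) and/or `.qmd` files (which are rendered with the `quarto` R package).
- Files within a folder should run in alphabetical order.
- The working directory is the project root.

Check the `README.md` in each subfolder for the actual files and their order.

```r
# Hypothetical helper (not part of this repository): run every .R script and
# render every .qmd file in a folder, in alphabetical order.
run_folder <- function(folder) {
  scripts <- sort(list.files(folder, pattern = "\\.(R|qmd)$", full.names = TRUE))
  for (script in scripts) {
    message("Running ", script)
    if (grepl("\\.R$", script)) {
      # Each script runs in its own environment so objects do not leak
      # between scripts; the working directory stays at the project root
      source(script, local = new.env())
    } else {
      # Requires the quarto R package and the Quarto command-line tool
      quarto::quarto_render(script)
    }
  }
  invisible(scripts)
}

# Run the stages in order, starting from the project root.
# A folder that does not exist yields no scripts, so it is simply skipped.
stages <- file.path("code", c("processing-code", "eda-code", "analysis-code"))
for (stage in stages) {
  run_folder(stage)
}
```

Running the stages this way is only a convenience. Opening and running each file by hand, in the same order, should give the same results.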
To view the manuscript, go to the `products` directory and enter the `manuscript` folder. Then either open `Manuscript.docx`, or open and render `Manuscript.qmd`. Either way, you get a Word document of the manuscript. [Insert the final predictors found to have the greatest impact on HPV completion rates here]
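You can also render the manuscript from the R console instead of opening the file. The minimal sketch below assumes two things: the working directory is the project root, and both the `quarto` R package and the Quarto command-line tool are installed. Called without an `output_format`, `quarto::quarto_render()` renders the output format(s) declared in the file's YAML header.

```r
# Path to the manuscript source, relative to the project root
manuscript <- file.path("products", "manuscript", "Manuscript.qmd")

if (file.exists(manuscript)) {
  # Renders to the output format(s) declared in the file's YAML header
  quarto::quarto_render(manuscript)
} else {
  message("Manuscript.qmd not found; set the working directory to the project root.")
}
```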
See the `README.md` files in each folder for more details.
- The raw data are stored in the `raw_data` folder.
- The `processing_code` folder contains several files that load the raw data, perform some cleaning, and save the result in the `processed_data` folder.
- The `analysis_code` folder contains several files that load the processed data, do an exploratory analysis, and fit a simple model. These files produce figures and some numeric output (tables), which are saved to the `results` folder.
- The `results` folder contains code output. This includes figures, tables saved as serialized R data (`.Rds`) files, and other outputs (distinct from data because they are generated by code).
- The `assets` folder contains static assets like pre-generated schematics from BioRender, BibTeX files, CSL files, and PDFs of references. These assets are not generated by code.
- The `products` folder contains final deliverables, like slides, the manuscript, the supplement, and posters.
- The `manuscript` folder contains a template for a report written as a Quarto file. If you access this repository as part of my Modern Applied Data Science course, the sections are guides for your project. If you found your way to this repository outside the course, you might only be interested in how the file pulls in results and references and generates a Word document as output. In that case you can ignore the detailed structure. There is also a subfolder containing an example template for a supplementary material file.
- The `slides` folder contains a basic example of slides made with Quarto.
- The `renv` folder is automatically generated by the `renv` package, and you should never edit it manually. This folder stores information about the packages you are using in your project.
- `README.md`: this file contains instructions or details about the folder it is located in. You are reading the project-level `README.md` file right now.
- `renv.lock`: a special file in JSON format that keeps a log of which packages and versions your project uses.
- `.gitignore`: this file gives instructions to the version control system, Git, telling it which files we do not need to record versions of. Usually these are various files containing local settings.
- `.Rprofile`: whenever you restart the R session, R sources (runs all code in) this script. Right now `renv` uses it to make sure we have the correct packages and versions installed.

This is a GitHub template repository. The best way to get it and start using it is by following these steps.
Once you have the repository, you can try the examples by executing them in order:

1. Run the processing code, which produces the processed data.
2. Run the analysis scripts, which take the processed data and produce some results.
3. Run the manuscript, poster, and slides example files in any order.

The files in the last step pull in the generated results and display them. They also pull in references from the BibTeX file and format them according to the CSL style.
You can read about keeping track of projects with `renv` here.

Basically, whenever you install new packages or update old ones, you need to run `renv::snapshot()` to update the `renv.lock` file. This file lists the packages and versions that the project uses. When you open the R project on a new computer, you can run `renv::restore()` to reinstall all the packages recorded in the `renv.lock` file.
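If you want a single entry point for both steps, the two `renv` calls can be wrapped in a small helper. `sync_packages()` is a hypothetical convenience function, not part of `renv` or this repository. Only `renv::snapshot()` and `renv::restore()` are real `renv` functions, and in everyday use you can simply call them directly.

```r
# Hypothetical wrapper (not part of renv or this repository) around the two
# renv calls used to keep renv.lock in sync with the installed packages.
sync_packages <- function(action = c("snapshot", "restore")) {
  # match.arg() errors if `action` is not one of the allowed values
  action <- match.arg(action)
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("The renv package is not installed.")
  }
  if (action == "snapshot") {
    # Record the package versions currently in use into renv.lock
    renv::snapshot()
  } else {
    # Reinstall the package versions recorded in renv.lock
    renv::restore()
  }
}

# After installing or updating packages:
#   sync_packages("snapshot")
# On a new computer, after opening the R project:
#   sync_packages("restore")
```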