The objective of this summer research is to develop, expand, and polish a repository of metadata describing broadly useful datasets.
A GitHub repository which contains the following:
An R project which contains all files relevant to the metadata repository.
A log of hours spent on the summer research by the student, which includes date, hours, and activity summary.
A presentation to the Statistics Department.
A presentation at the CSM annual research conference.
A manuscript to submit to a journal to be determined.
1. Utilize GitHub to collaborate on project materials and updates.
Karl Broman's GitHub tutorial
Jenny Bryan's Happy Git with R.
Also check out using version control with RStudio and this video on Git and RStudio.
2. Adhere to good programming practices.
Write all R code according to Hadley Wickham's Style Guide.
Use the tidyverse style guide for an additional reference.
Use Hadley Wickham's R for Data Science book as a reference (Ch19 covers writing functions).
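As a small illustration of the style conventions referenced above (snake_case names, `<-` for assignment, two-space indentation, comments that explain intent), here is a minimal sketch in base R. The helper `clean_field_names()` is a hypothetical example written for this note, not part of any package or of the project's existing code.

```r
# Hypothetical helper: normalize raw field names to snake_case,
# the naming convention recommended by the tidyverse style guide.
clean_field_names <- function(field_names) {
  cleaned <- tolower(trimws(field_names))
  # Replace every run of non-alphanumeric characters with one underscore.
  cleaned <- gsub("[^a-z0-9]+", "_", cleaned)
  # Drop underscores left at the start or end of a name.
  gsub("^_+|_+$", "", cleaned)
}

clean_field_names(c("Dataset Title", " Publication-Date ", "DOI"))
# Returns: "dataset_title" "publication_date" "doi"
```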
3. Scrape various websites and use various website APIs to collect metadata.
Harvard's Dataverse
Data Dryad (rdryad R package: https://cran.r-project.org/web/packages/rdryad/rdryad.pdf; API docs: https://datadryad.org/api/v2/docs/)
UCI Machine Learning Repository
CORGIS Data Repository
data.world
datahub.io
California Open Data Portal: https://data.ca.gov/group
data.gov (https://www.data.gov/)
zenodo.org
BCO-DMO (https://www.bco-dmo.org/search/dataset)
List of data repositories (http://oad.simmons.edu/oadwiki/Data_repositories)
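The API-based collection step might be sketched as follows, using base R only. Both functions are hypothetical helpers written for this note. The `/api/search` path and the `q`, `type`, and `per_page` parameters follow the Dataverse Search API as I understand it, and the field names (`name`, `global_id`, `published_at`) are assumptions that should be checked against each repository's API documentation. The records below are made-up placeholders, not real datasets.

```r
# Build a search URL for a Dataverse-style search endpoint.
# Assumption: the endpoint accepts q, type, and per_page query parameters.
build_search_url <- function(base_url, query, per_page = 10) {
  paste0(
    base_url, "/api/search",
    "?q=", utils::URLencode(query, reserved = TRUE),
    "&type=dataset",
    "&per_page=", per_page
  )
}

# Flatten a list of parsed JSON records into a data frame, keeping only
# the requested fields. Fields missing from a record become NA.
records_to_frame <- function(records, fields) {
  rows <- lapply(records, function(record) {
    values <- lapply(fields, function(field) {
      value <- record[[field]]
      if (is.null(value)) NA_character_ else as.character(value)[1]
    })
    names(values) <- fields
    as.data.frame(values, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

search_url <- build_search_url("https://dataverse.harvard.edu", "ocean temperature")

# Placeholder records standing in for a parsed API response.
mock_records <- list(
  list(
    name = "Example dataset A",
    global_id = "doi:10.0000/EXAMPLE-A",
    published_at = "2020-01-01T00:00:00Z"
  ),
  list(name = "Example dataset B", global_id = "doi:10.0000/EXAMPLE-B")
)

metadata <- records_to_frame(mock_records, c("name", "global_id", "published_at"))
```

For a live request, one option is to parse the response with `jsonlite::fromJSON(search_url, simplifyVector = FALSE)` and pass the list of result items to `records_to_frame()`, assuming the response nests results under `data$items`. Keeping the URL builder and the flattening step separate lets each be tested without network access.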
4. Organize metadata into a single dataset.
Establish the full set of variables/columns/features in the metadata.
Establish the best form for the metadata dataset.
Metadata Target Fields/Observed Fields: https://docs.google.com/spreadsheets/d/1CidGQw74Y1an9Z5JuayYCFkQor5zSMwDZR952OuQE2g/edit?usp=sharing
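One way to combine metadata from several repositories into a single dataset is to map each source's columns onto a shared set of target fields and then stack the results. The sketch below uses base R. The target fields, source column names, and sample rows are all hypothetical placeholders; the actual field list is the one maintained in the linked spreadsheet.

```r
# Hypothetical target schema; the real list lives in the linked spreadsheet.
target_fields <- c("source", "title", "identifier", "date_published")

# Map one source's columns onto the target schema.
# field_map is a named character vector: names are target fields,
# values are the matching column names in `data`.
harmonize <- function(data, source_name, field_map, target_fields) {
  harmonized <- data[, unname(field_map), drop = FALSE]
  names(harmonized) <- names(field_map)
  harmonized$source <- rep(source_name, nrow(harmonized))
  # Add target fields this source does not provide, filled with NA.
  missing_fields <- setdiff(target_fields, names(harmonized))
  for (field in missing_fields) {
    harmonized[[field]] <- rep(NA_character_, nrow(harmonized))
  }
  # Return columns in the agreed target order.
  harmonized[, target_fields, drop = FALSE]
}

# Placeholder inputs with different column names per source.
repo_a <- data.frame(
  name = c("Dataset 1", "Dataset 2"),
  global_id = c("id-a1", "id-a2"),
  published_at = c("2019-05-01", "2020-02-15"),
  stringsAsFactors = FALSE
)
repo_b <- data.frame(title = "Dataset 3", doi = "id-b1", stringsAsFactors = FALSE)

combined <- rbind(
  harmonize(
    repo_a, "repo_a",
    c(title = "name", identifier = "global_id", date_published = "published_at"),
    target_fields
  ),
  harmonize(repo_b, "repo_b", c(title = "title", identifier = "doi"), target_fields)
)
```

Keeping one `field_map` per source makes the mapping decisions explicit and easy to review, and a new repository can be added without changing the combined dataset's shape.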
5. Understand/Learn about data storage, curation, and indexing.
What is Data Curation? (https://www.dataversity.net/what-is-data-curation/)
Data Curation 101 (https://www.dataversity.net/what-is-data-curation/)
Data Curation Network (https://datacurationnetwork.org/)
Wikipedia Data Curation (https://en.wikipedia.org/wiki/Data_curation)
What is Metadata? (https://www.opendatasoft.com/blog/2016/08/25/what-is-metadata-and-why-is-it-important-data)
Digital Curation Centre (https://www.dcc.ac.uk/guidance/standards/metadata/list)
University of Pittsburgh: Metadata Standards (https://pitt.libguides.com/metadatadiscovery/metadata-standards)
Research Data Alliance Metadata Standards (https://rd-alliance.github.io/metadata-directory/standards/)
Cornell Metadata Standards (https://data.research.cornell.edu/content/writing-metadata)
DataHub Application: https://engineering.linkedin.com/blog/2019/data-hub
**Activity Log**
https://docs.google.com/document/d/1SzroBCSzDsLGObtMDmymxTMAX_J-zrTDHAbVTWgzHCg/edit?usp=sharing