ropensci / software-review

rOpenSci Software Peer Review.

epair: Use R to get data from the Environmental Protection Agency Air Quality System API #415

Closed GLOrozcoM closed 3 years ago

GLOrozcoM commented 3 years ago

epair aids users in getting pollutant data from the Environmental Protection Agency Air Quality System API.

Package: epair
Title: Grabs data from EPA API, simplifies getting pollutant data
Version: 0.1.0
Authors@R: person("G.L.", "Orozco-Mulfinger", email = "glo003@bucknell.edu", role = c("aut", "cre"))
Description: 
  A package to aid the user in making queries to the EPA API site found at https://aqs.epa.gov/aqsweb/documents/data_api.
  It combines API calling methods from various web scraping packages with specific strings to retrieve data from the EPA API. It also contains
  easy to use loaded variables that help a user navigate services offered by the API and aid the user in 
  determining the appropriate way to make an API call.
Depends: R (>= 3.3.3)
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Imports:
  xml2 (>= 1.1.1),
  rvest (== 0.3.5), 
  httr (>= 1.4.1), 
  jsonlite (>= 1.6.1)
Suggests: testthat

https://github.com/GLOrozcoM/epair

The data source is the Environmental Protection Agency’s Air Quality System (AQS) API (https://aqs.epa.gov/aqsweb/documents/data_api.html). This API records and maintains air quality data from a variety of sources and across various spatial and temporal domains. As an example, the API can provide county-level ozone concentrations at hourly resolution for a particular US state.

The EPA AQS API is crucial for studies that require pollutant data within the US. Researchers from diverse domains including statistics, environmental sciences, environmental health, climate change, physics, atmospheric sciences, and epidemiology (to name just a few) all use EPA AQS API pollutant data to conduct their studies.

As an example, Gilani, Urbanek, and Kane (2020) recently used this API and package to model ozone concentrations in Connecticut, USA, and currently use this data source for other research projects exploring the impact of COVID-19 on air pollution concentrations.

The target audience is air quality researchers in general, who are not necessarily advanced R users. As mentioned earlier, these data are used by researchers from a diverse range of disciplines interested in monitoring and modeling air pollution concentrations in the US. These disciplines include statistics, environmental health, air pollution, climatology, epidemiology, economics, public health, geoscience, and atmospheric science, to name a few. The data are frequently used by federal agencies, academic researchers, and industry users.

The current method of downloading these data from the AQS API requires users to build their own API calls by appending strings together. Doing this accurately, however, requires a deep understanding of how the AQS API structures its calls and some facility with URL construction. This package is aimed at researchers who want to download the data without investing considerable time in learning how to build correct API calls, as well as those who want to explore the types of data available before actually downloading them.
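To make the manual approach concrete, here is a minimal sketch in base R of assembling a call by pasting strings together. It is not epair's implementation. The base URL, the `sampleData/byCounty` endpoint, the parameter names, and the ozone code `44201` are assumptions taken from the public AQS documentation, `build_aqs_url()` is a hypothetical helper, and the email and key are placeholders.

```r
# Base URL of the AQS Data API (assumption; check the AQS documentation).
base_url <- "https://aqs.epa.gov/data/api"

# Hypothetical helper: join an endpoint and named query parameters into a URL.
build_aqs_url <- function(endpoint, params) {
  # Percent-encode each value so characters such as "@" survive in the query.
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  query <- paste(names(params), encoded, sep = "=", collapse = "&")
  paste0(base_url, "/", endpoint, "?", query)
}

url <- build_aqs_url(
  "sampleData/byCounty",
  list(
    email  = "user@example.com",  # placeholder credentials
    key    = "my-api-key",        # placeholder credentials
    param  = "44201",             # assumed AQS parameter code for ozone
    bdate  = "20150501",          # begin date, YYYYMMDD
    edate  = "20150502",          # end date, YYYYMMDD
    state  = "37",                # state FIPS code
    county = "183"                # county FIPS code
  )
)
print(url)
```

Every endpoint expects its own set of required parameters, so a hand-built call like this is easy to get subtly wrong, which is the friction the package aims to remove.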

To date, one other package (aqsr by jpkeller on GitHub) fulfills aims similar to those of epair. Both packages perform an httr call to retrieve data from the EPA’s API. A few key differences, though, make epair stand out.

The first is that epair provides a substantial number of aids to R users in determining how to make an API call. epair does this through a comprehensive services object to help the user explore EPA API services from R. For instance, besides just listing all available services using names(services), the user can check a description, available filters, endpoints associated with filters, required and optional variables, and examples for these calls. epair also offers the variables object. It contains helpful descriptions for the user to know more about variables within the EPA API.

A second difference lies in the documentation. The package includes a full PDF manual, thorough documentation for each function, and a full test suite built with testthat to support maintenance.

A final new feature in epair is the ability to return results either as raw JSON or as R data frames. Through perform.call() and perform.call.raw(), a user can easily get an R data frame or a raw JSON result, depending on their needs.
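The difference between the two result forms can be illustrated with jsonlite, one of the package's listed imports. This is only a sketch of the general idea, not the code behind perform.call() or perform.call.raw(). The response body below is made up, and its `Header`/`Data` layout is an assumption about what the AQS API returns.

```r
library(jsonlite)

# Made-up response body; the "Header"/"Data" layout is an assumption.
raw_json <- '{
  "Header": [{"status": "Success", "rows": 2}],
  "Data": [
    {"state_code": "37", "county_code": "183", "sample_measurement": 0.041},
    {"state_code": "37", "county_code": "183", "sample_measurement": 0.038}
  ]
}'

# Raw form: a single character string, handy for caching or logging.
is.character(raw_json)

# Parsed form: fromJSON() simplifies an array of records into a data frame.
parsed <- fromJSON(raw_json)
measurements <- parsed$Data
str(measurements)
```

The raw string suits users who want to store or forward the response unchanged, while the data frame is ready for filtering, summarizing, and plotting in R.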

For a full tutorial on using the package, see https://epair.netlify.app/

melvidoni commented 3 years ago

Hello @GLOrozcoM, thank you for your pre-submission enquiry. Please edit your opening post to elaborate on the following points:

Please be as thorough as possible, and let me know once this is ready.

GLOrozcoM commented 3 years ago

Hello @melvidoni, thank you for your feedback! I've updated the issue with your suggested comments. Please let me know if you would like me to elaborate on any aspect.

melvidoni commented 3 years ago

Thank you. The editors will discuss your enquiry and I'll let you know the decision.

melvidoni commented 3 years ago

Hello @GLOrozcoM thank you so much for your pre-submission enquiry. All the editors have discussed and found your package to be in-scope. We welcome a full submission.

However, please remember that there will be no editorial activity from Dec 19th to Jan 3rd, as per #417