ropensci / rdefra

rdefra: Interact with the UK AIR Pollution Database from DEFRA
https://docs.ropensci.org/rdefra

Different variables for ukair_get_coordinates() when inputs are fed in differently #7

Closed sebsfox closed 5 years ago

sebsfox commented 7 years ago

Hi there,

Depending on how I feed the Site ID into the function ukair_get_coordinates() I get a data.frame of different width. Two methods are as follows:

Set up

library('rdefra')
stations_raw <- ukair_catalogue()

Method 1

stations <- data.frame()
for (i in stations_raw$UK.AIR.ID[!is.na(stations_raw$EMEP.Site.ID)]){
        stations <- rbind(stations, ukair_get_coordinates(i))
}

Method 2

stations_raw <- ukair_catalogue()
stations_all <- ukair_get_coordinates(stations_raw[!is.na(stations_raw$EMEP.Site.ID),])

Method 1 returns a data.frame with 5 variables, and Method 2 returns a data.frame with 16 variables (and Eastings and Northings are generally incomplete).

Thank you

cvitolo commented 5 years ago

Hi @sebsfox! The function ukair_get_coordinates(), as the name suggests, only tries to get the coordinates for the station IDs it is given. If you feed ukair_get_coordinates() a character vector containing only identification numbers, you get a data.frame with the identification numbers and their coordinates. If you feed ukair_get_coordinates() a data.frame containing a column with identification numbers, you get the same data.frame back, with missing coordinates infilled (if available). That is why the two methods return data.frames of different width.

Of course, you might still get NAs. This happens for stations that do not have coordinates available on the relevant ukair webpage.
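
For illustration, here is a minimal sketch of the two input types side by side, reusing the catalogue and the column names (UK.AIR.ID, EMEP.Site.ID) from the report above. The object names emep_stations, coords_only and infilled are made up for this example, and the as.character() call is only a precaution in case the ID column is not already character. It needs network access, and the exact column counts may differ between package versions, so none are shown.

```r
library(rdefra)

# Station catalogue, restricted to stations with an EMEP site ID
# (same filter as in the original report)
stations_raw <- ukair_catalogue()
emep_stations <- stations_raw[!is.na(stations_raw$EMEP.Site.ID), ]

# Input type 1: a character vector of IDs.
# Expected result: a narrow data.frame with IDs and coordinates only.
coords_only <- ukair_get_coordinates(as.character(emep_stations$UK.AIR.ID))

# Input type 2: a data.frame that contains the ID column.
# Expected result: the same data.frame, with missing coordinates infilled
# where the station's webpage provides them.
infilled <- ukair_get_coordinates(emep_stations)

# Compare the widths of the two results
ncol(coords_only)
ncol(infilled)
```

Passing the whole vector in one call also avoids the rbind() loop from Method 1.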

Hope this clarifies everything.

I'm closing this issue, as this is the intended behaviour.