Closed lukas-rokka closed 8 years ago
thanks, having a look ...
@lukas-rokka Can you explain in english what the calculation of grid_size_lat
and grid_size_lon
are? And would you be able to do a bounding box search with this setup?
At the equator the size of 1° is the same both in longitude and latitude, roughly 111 km. At 90° latitudes the size of 1° longitude is zero, while the latitude is relatively constant. At 60°, the grid size is roughly 111 km in latitude and 66 km in longitude. I don't remember the reference to the trigonometric calculations, but I think it's an approximation that accounts for earth being a spheroid (not a sphere).
The dist=sqrt(((loc_to_search[1] - lon)*grid_size_lon)^2 + ((loc_to_search[2] - lat)*grid_size_lat)^2))
calculates the distance in meters between two coordination's points. It is fairly accurate as long as the two points are not too far away from each other. It calculates the grid size at the position of the stations.
Here an example that calculates the grid size as the average of the station and the search point, which gives a more accurate result for points far apart.
library(rnoaa)
library(geosphere)
library(dplyr)
library(leaflet)
stations <- rnoaa::isd_stations() %>%
filter(!(is.na(lat) | is.na(lon) | (lat==0 & lon==0) | abs(lon) > 180 | abs(lat) > 90))
add_distances <- function(stations, search_lat, search_lon, max_dist) {
search_lat_rad <- search_lat*pi/180
grid_size_search_lat <- 111132.954 - 559.822*cos(2*search_lat_rad) + 1.175*cos(4*search_lat_rad) # 1° lat size at search lat, in m
grid_size_search_lon <- grid_size_search_lat*cos(search_lat_rad) # 1° lon size at search lat, in m
stations %>%
mutate(lat_rad = lat*pi/180,
grid_size_lat = 111132.954 - 559.822*cos(2*lat_rad) + 1.175*cos(4*lat_rad), # 1°lat size at stations lat
grid_size_lon = grid_size_lat*cos(lat_rad), # 1° lon size at stations lat
grid_size_lat = (grid_size_lat + grid_size_search_lat)/2, # mean of search and stations lat
grid_size_lon = (grid_size_lon + grid_size_search_lon)/2, # mean of search and stations lon
dist=0.001*sqrt(((search_lon - lon)*grid_size_lon)^2 + ((search_lat - lat)*grid_size_lat)^2)) %>% # the distance in km
select(-lat_rad, -grid_size_lat, -grid_size_lon) %>%
filter(dist <= max_dist) %>%
return()
}
# get data frame of stations in the 200 km radius of the seach lat/lon.
# Takes less than 0.05 sec on my computer
nearest_st <- stations %>% add_distances(.,60,18,200)
# plot them on map
leaflet() %>%
addTiles() %>%
addCircles(lng = nearest_st$lon,
lat = nearest_st$lat,
popup = nearest_st$station_name) %>%
clearBounds()
for the bounding box calculation grid sizes shouldn't be an issue? Should be simple as:
# get stations within 50-70 latitudes and 0-20 longitudes
bbox_st <- stations %>% filter(lat>=50 & lat<=70, lon>=0 & lon <=20)
leaflet() %>% addTiles() %>%
addCircles(lng= bbox_st$lon,
lat = bbox_st$lat,
popup = bbox_st$station_name) %>%
clearBounds()
You don't have an interface to the isd-lite, right? (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite) I might do an lightweight interface to that, with async downloading, as I need an implementation that gets the most important weather parameters for multi-year periods fairly efficiently. Is there an implementation in the rOpenSci that deals with imputing missing weather parameters? What I want is to be able to download data from a certain station and if there are longer periods of missing data I would like to impute them from nearby stations.
Thanks for the details on the what the calculations do. I can write a draft replacement function with what you've done on a new branch and get your feedback on it?
You don't have an interface to the isd-lite, right?
no, but someone mentioned it here too https://github.com/ropensci/rnoaa/issues/76#issuecomment-104121544 - happy to incorporate here
async
there are async features in the curl
package
Is there an implementation in the rOpenSci that deals with imputing missing weather parameters?
No, be interested to add that if it's not a ton of code
I can write a draft replacement function with what you've done on a new branch and get your feedback on it?
Sure, that would be great.
No, be interested to add that if it's not a ton of code
Well, it could easily end up being "tons of code". But would be a very useful feature. There are some packages (e.g. MICE, hmisc::aregImpute()) that deals with missing values. But I think there are some domain specific issues that can take while to deal with, like which nearby stations that can be used (the nearest ones, does it need to be on same elevation or some other criteria?)
I've a Shiny app, that uses European gridded weather data sources (guess some of them could be interesting for rOpenSci as well, like CAMS for solar radiations data) and turns the data into time series formats useable for building simulations tools. It's for this tool I like to use the isd data.
Sure, that would be great.
okay, will start that
imputing missing weather parameters
Can you open up a new issue for this, and we can continue discussion there
European gridded weather data sources (guess some of them could be interesting for rOpenSci as well, like CAMS for solar radiations data)
This pkg is focusing on NOAA data - I assume those data sources aren't from NOAA? We can definitely start one or more new pkgs though
@lukas-rokka try out the new fxn, reinstall from here and let me know what you think
@sckott it tested it for functionality and speed, it works great.
A comment of overlapping: there's a function meteo_spherical_distance() (from the latest pull?) in meteo_distance.R that calculates the distance with the haversine forumula (slightly more elegant than my version that projects our round earth to a flat map and then do the distance calc). You can now search the nearest ISD stations with new functionalities from meteo_distance.R as well:
# using meteo_distance.R
isd_stations() %>%
rename(latitude=lat, longitude=lon) %>%
meteo_distance(60, 18, radius = 200) %>%
filter(distance < 50))
# Returns the same result as
isd_stations_search(60, 18, 50) %>%
arrange(dist)
Both works fine and the performance is the same. But would maybe be cleaner to make use of meteo_distance.R? And not so good with different variable name conventions within the package (latitude=lat, longitude=lon, distance=dist).
@lukas-rokka good point. yes those fxns are from the latest PR. We should keep isd_stations_search
, but we could use meteo_distance
internally within that fxn to do the actual work. sound good?
It's up to you @sckott, and what you think gives more maintainable code. From an user perspective it doesn't really matter how it works internally as long it works. But good to keep the isd_stations_search
as it will come up when you start typing isd_
@lukas-rokka made the changes, reinstall and try again
Using is_stations_search() takes roughly 20 seconds. Which is too long to be useful in production code.
Here is solution that I used my self that takes less than 0.1 second to compute. I gets the nearest stations with a reasonable accuracy, the distance and don't require loading external packages.