alexkrohn opened this issue 5 months ago
Interesting. We tend to use this package internally in a similar fashion to your example. I tested your example and consistently got response times of around 0.3 seconds per species. It's interesting that it's performing significantly slower for you.
This package and function were built for queries of one or a few species. I'll look into an alternative solution.
Interesting! I wonder why the lag time is so high for me.
I've been getting it to work by:
1) Parallelizing the calls to ns_search_spp() to run many at once.
2) Grouping the data frame by species and state so that ns_search_spp() is only called for unique combinations.
I'm very curious if there is a better way.
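A minimal base-R sketch of the two workarounds listed above, not the package's own method. It assumes a detections table with hypothetical columns species and state, and replaces the real network call with a hypothetical offline stand-in, query_species(), so it can run anywhere. Because the thread's own code filters a single response by subnationCode, it appears that one query returns every state for a species; under that assumption the sketch queries once per unique species and then looks up each unique species-state pair locally.

```r
library(parallel)

# Hypothetical detections table: several rows share a species-state pair.
detections <- data.frame(
  species = c("Ameiurus natalis", "Ameiurus natalis", "Thamnophis sirtalis",
              "Thamnophis sirtalis", "Ameiurus natalis"),
  state   = c("FL", "FL", "NJ", "NY", "GA"),
  stringsAsFactors = FALSE
)

# Workaround 2: keep only the unique species-state combinations.
combos <- unique(detections[, c("species", "state")])

# Hypothetical stand-in for one NatureServe query per species (in practice a
# wrapper around natserv::ns_search_spp() plus a flattening step). It returns
# a state -> rank table filled with fake placeholder ranks, not real data.
query_species <- function(species) {
  codes <- c("FL", "GA", "NJ", "NY")
  data.frame(subnationCode = codes,
             roundedSRank  = paste0("fake-", species, "-", codes),
             stringsAsFactors = FALSE)
}

# Workaround 1: run the per-species queries in parallel. mclapply() forks
# worker processes, which Windows does not support (mc.cores must be 1
# there); parLapply() with a cluster is the usual alternative on Windows.
species_list <- unique(combos$species)
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
responses <- mclapply(species_list, query_species, mc.cores = n_cores)
names(responses) <- species_list

# Look up each unique species-state pair in the downloaded responses.
combos$rank <- mapply(function(sp, st) {
  tbl <- responses[[sp]]
  hit <- tbl$roundedSRank[tbl$subnationCode == st]
  if (length(hit) == 0) NA_character_ else hit[1]
}, combos$species, combos$state, USE.NAMES = FALSE)

# Join the ranks back onto every detection row.
detections <- merge(detections, combos, by = c("species", "state"), all.x = TRUE)
print(detections)
```

With thousands of detections but far fewer distinct species, deduplicating first cuts the number of network calls to one per species, and parallelism then overlaps the remaining latency.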
Second, I've noticed that the results from ns_search_spp() sometimes differ from what is displayed on the NatureServe website.
For example:
library(dplyr)
nat.serv.result <- natserv::ns_search_spp(text = "Ameiurus natalis")
lapply(ns.result$results$nations, function(x){
x %>%
select(subnations) %>%
do.call(bind_rows,.)
}) %>%
do.call(bind_rows, .) %>%
# Finally, keep only the relevant state and pull the status
filter(subnationCode == "FL")
# subnationCode roundedSRank exotic native
# 1 FL S4 FALSE TRUE
# 2 FL S3 FALSE TRUE
However, the map on NatureServe shows the Yellow Bullhead as not ranked. Any idea why that might be, and why there are two rankings? (Sorry for asking two questions in one issue!)
I was going to suggest parallelization as a short-term solution, so I'm glad it's working out. We'll look at ways to increase the query speed, but realistically it will be sometime in July before we can get to that, given current capacity.
On the Yellow Bullhead: I noticed there may be a typo in your code on line 3. Should lapply(ns.result$results$nations, function(x){ read as lapply(nat.serv.result$results$nations, function(x){ instead?
Making that correction and running your code reports back the correct rank of SNR for FL.
D'oh! That's correct and helpful. Thanks! I look forward to hearing if you figure out ways to speed things along.
@alexkrohn One alternative, faster way to query is the ns_id() function, e.g. ns_id("ELEMENT_GLOBAL.2.154701"), because it links directly to the record instead of searching for matches to your species query. The main issue is that you need the EGT_UID (e.g. "ELEMENT_GLOBAL.2.154701"), which isn't readily published. However, you could go to NatureServe Explorer, run an advanced query for a particular taxonomic group such as vertebrates, export the XLS file, and then do some text processing on the URL field in that file to get the EGT_UIDs. Then you just need to match those up to your species list. It takes a little up-front work, but the response time of ns_id() is really fast.
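The text-processing step described above might look roughly like the sketch below. It assumes the Explorer export has already been read into a data frame (for example with readxl::read_excel()) and that its URL field contains the UID in a form like https://explorer.natureserve.org/Taxon/ELEMENT_GLOBAL.2.154701/... ; the column names scientific_name and url, and the UIDs in the mock table, are hypothetical placeholders, so check them against your own export.

```r
# Hypothetical stand-in for the exported Explorer table. Column names and the
# placeholder UIDs below are assumptions, not real NatureServe records.
export <- data.frame(
  scientific_name = c("Ameiurus natalis", "Thamnophis sirtalis", "Genus species"),
  url = c("https://explorer.natureserve.org/Taxon/ELEMENT_GLOBAL.2.111111/Ameiurus_natalis",
          "https://explorer.natureserve.org/Taxon/ELEMENT_GLOBAL.2.222222/Thamnophis_sirtalis",
          "no uid here"),
  stringsAsFactors = FALSE
)

# Pull the first "ELEMENT_GLOBAL.<n>.<n>" token out of each URL (NA if absent).
extract_egt_uid <- function(x) {
  m <- regexpr("ELEMENT_GLOBAL\\.[0-9]+\\.[0-9]+", x)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}
export$egt_uid <- extract_egt_uid(export$url)

# Match the UIDs up to your own species list (NA where no match exists).
my_species <- c("Thamnophis sirtalis", "Ameiurus natalis", "Not in export")
uid_lookup <- export$egt_uid[match(my_species, export$scientific_name)]
names(uid_lookup) <- my_species
print(uid_lookup)

# Each non-NA UID could then be passed to natserv::ns_id(), e.g.
# records <- lapply(na.omit(uid_lookup), natserv::ns_id)
```

Matching on exact scientific names is the simplest choice; synonyms or subspecies in your list would need extra handling before the lookup.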
Hi there.
I have a data frame with thousands of species-level detections from various US states. For each taxon in each state, I'd like to query NatureServe to extract the state-level status.
Querying each taxon individually with ns_search_spp is very slow. What is the best practice for querying multiple taxa at once? Simple example:
That is ~1 second per species. Is there a faster/better way to do this if I have thousands of species-state combinations?
This ignores that there are multiple nations for Thamnophis sirtalis, including multiple entries for the US, so there is probably a better way to find only the entries where subnationCode == "NJ". I haven't figured out the best way to expand those nested data frames from the list...
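One way to expand the nested data frames without dplyr is sketched below. It mocks the structure that the dplyr pipeline earlier in the thread walks through: a list with one nations data frame per search hit, each carrying a subnations list-column of data frames with subnationCode and roundedSRank. All values are placeholders, the real response likely has more columns, and flatten_subnations() is a hypothetical helper name. Keeping the hit index and nationCode on every row makes duplicate matches, such as multiple US entries, visible instead of silently merged.

```r
# Mock of the nested structure; all values are placeholders, not real ranks.
make_nations <- function(nation_codes, sub_list) {
  df <- data.frame(nationCode = nation_codes, stringsAsFactors = FALSE)
  df$subnations <- sub_list  # list-column: one data frame per nation
  df
}
make_sub <- function(codes, ranks) {
  data.frame(subnationCode = codes, roundedSRank = ranks, stringsAsFactors = FALSE)
}
nations <- list(
  make_nations(c("US", "CA"),
               list(make_sub(c("NJ", "NY"), c("rank-a", "rank-b")),
                    make_sub("ON", "rank-c"))),
  make_nations("US", list(make_sub("NJ", "rank-d")))
)

# Flatten to one row per subnation, tagged with its search hit and nation.
flatten_subnations <- function(nations) {
  rows <- list()
  for (i in seq_along(nations)) {
    nat <- nations[[i]]
    for (j in seq_len(nrow(nat))) {
      s <- nat$subnations[[j]]
      if (is.null(s) || nrow(s) == 0) next  # skip nations without subnations
      s$hit <- i
      s$nationCode <- nat$nationCode[j]
      rows[[length(rows) + 1]] <- s
    }
  }
  do.call(rbind, rows)
}

flat <- flatten_subnations(nations)
nj <- flat[flat$subnationCode == "NJ", ]
print(nj)
```

If the NJ filter still returns more than one row, the hit column shows which search match each rank came from, so you can decide which record corresponds to your taxon.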