Closed ddlawton closed 1 year ago
Just looking at the original file: many of the samples are there (e.g. https://microbeatlas.org/index.html?action=sample_detail&sid=SRS1056245&rid=SRR2233313); however, the coordinates column is left blank.
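library(tidyverse)

# Download the sample metadata file (-k skips TLS certificate verification)
url <- "https://microbeatlas.org/downloads/samples/samples.env.info"
download.file(url, destfile = basename(url), method = "curl", extra = "-k")

# Keep only the sample ID (column 1) and the coordinates (column 9)
dat <- read_delim(basename(url), col_names = FALSE) %>%
  select(X1, X9) %>%
  rename(sample_id = X1, coordinates = X9)

# str_detect() takes the string first, then the pattern
dat %>% filter(str_detect(sample_id, "SRS1056245"))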
Dear Douglas,
Thanks for reaching out! I had a quick look at the parsing code, and the lat/lon is actually extracted from the samples.info file in a rather non-trivial way, using regexes among other things.
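To give a rough idea of what regex-based extraction can look like, here is a minimal sketch in base R. It is not the actual parsing code: the helper name `extract_lat_lon` and the input format (a free-text field such as `lat_lon=47.37 N 8.54 E`) are assumptions made purely for illustration, and the real samples.info layout may well differ.

```r
# Hypothetical helper: pull a "<number> N|S <number> E|W" pair out of free text.
# The input format is an assumption, not the real samples.info layout.
extract_lat_lon <- function(x) {
  pattern <- paste0(
    "(-?[0-9]+\\.?[0-9]*)[[:space:]]*([NS])",  # latitude and hemisphere
    "[ ,]+",                                   # separator
    "(-?[0-9]+\\.?[0-9]*)[[:space:]]*([EW])"   # longitude and hemisphere
  )
  m <- regmatches(x, regexec(pattern, x))[[1]]

  # No match: return NA for both values instead of failing
  if (length(m) == 0) {
    return(c(lat = NA_real_, lon = NA_real_))
  }

  # South and West are encoded as negative decimal degrees
  lat <- as.numeric(m[2]) * if (m[3] == "S") -1 else 1
  lon <- as.numeric(m[4]) * if (m[5] == "W") -1 else 1
  c(lat = lat, lon = lon)
}

extract_lat_lon("lat_lon=47.37 N 8.54 E")
extract_lat_lon("no coordinates here")
```

Returning `NA` for unparseable input keeps the result easy to filter afterwards, which is relevant here because a blank coordinates column could equally come from a failed match.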
@jfmrod @MCDanaila perhaps it would be possible to provide a separate file with (sample, geo) info on the download page?
Hope this helps, Gregor
Hi, thanks for pointing that out. Indeed, there was a mismatch between the data on the website and the data available for download. I've fixed the discrepancy, so the data available for download should now match what is shown on the website.
Thanks!
The coordinates provided in the samples.env.info file (and the code to produce the map) are shown below. There are some weird patterns in the point distribution, such as the entire western side of the US being underrepresented.
These weird patterns do not match what I can see on the website. For example, here is the distribution of Micrococcales, which clearly shows more points in the western United States.
Are these points unavailable intentionally or am I able to access these points in a different way?
Thanks!