malaria-atlas-project / malariaAtlas

An R interface to open-access malaria data, hosted by the Malaria Atlas Project.
https://malariajournal.biomedcentral.com/articles/10.1186/s12936-018-2500-5

fillDHSCoordinates - "These requested datasets are not available from your DHS login credentials" #60

Closed RasmusKlinkJoerg closed 1 year ago

RasmusKlinkJoerg commented 1 year ago

I am trying to download the DHS data from Tanzania.

I run the code:

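    library(malariaAtlas)

    # DHS account email and project name
    username <- "rasmusklinkb123456789@gmail.com"
    projectname <- "Predicting malaria from spatiotemporal data"

    # Download parasite-rate survey points for Tanzania
    TANZ_pr_data <- getPR(country = "Tanzania", species = "both")
    # Fill in the DHS coordinates; prompts for the DHS password
    TANZ_pr_data_plus <- fillDHSCoordinates(TANZ_pr_data,
                                            email = username,
                                            project = projectname,
                                            password_prompt = TRUE)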

I get the following error:

  Please enter password in TK window (Alt+Tab)
  Writing your configuration to:
     -> C:\Users\r-kli\AppData\Local/r-kli/rdhs/Cache/rdhs.json

  Downloading DHS data.
  These requested datasets are not available from your DHS login credentials:
  ---
  TZGE52FL.zip, TZGE6AFL.ZIP, TZGE7AFL.zip, TZGE7IFL.ZIP
  ---
  Please request permission for these datasets from the DHS website to be able to download them
  Error in readRDS(geo[[file_name]]) : bad 'file' argument

I have access to the files and can download them from the DHS website.

Are there any restrictions on the format of the username or password? If not, what could the mistake be?


mauricio-tki commented 1 year ago

Apologies for the delayed reply. Unfortunately I haven't managed to reproduce the issue, so I'll have to ask you to check some things.

First, as an extra check on the DHS side, would you mind taking a screenshot of the "Download by Single Survey" -> "Tanzania" page? Also, please double-check that the username and projectname you used in the R code are identical to the ones on the DHS website.

Assuming that that's all fine, could you run the following script to list all package versions:

    # Print the R version, then the version of every loaded namespace
    print(paste("R", getRversion()))
    print("-------------")
    for (package_name in sort(loadedNamespaces())) {
        print(paste(package_name, packageVersion(package_name)))
    }

RasmusKlinkJoerg commented 1 year ago

The overall Tanzania download page: [screenshot]

One of the specific files: [two screenshots]

I have double-checked the username, project name and password, but still no success.

Packages:

    > print(paste("R", getRversion()))
    [1] "R 4.2.2"
    > print("-------------")
    [1] "-------------"
    > for (package_name in sort(loadedNamespaces())) {
    +     print(paste(package_name, packageVersion(package_name)))
    + }
    [1] "base 4.2.2"
    [1] "cli 3.6.0"
    [1] "colorspace 2.1.0"
    [1] "compiler 4.2.2"
    [1] "curl 5.0.0"
    [1] "datasets 4.2.2"
    [1] "digest 0.6.31"
    [1] "dplyr 1.1.0"
    [1] "fansi 1.0.4"
    [1] "farver 2.1.1"
    [1] "generics 0.1.3"
    [1] "getPass 0.2.2"
    [1] "ggplot2 3.4.0"
    [1] "glue 1.6.2"
    [1] "graphics 4.2.2"
    [1] "grDevices 4.2.2"
    [1] "grid 4.2.2"
    [1] "gtable 0.3.1"
    [1] "httr 1.4.4"
    [1] "jsonlite 1.8.4"
    [1] "labeling 0.4.2"
    [1] "lattice 0.20.45"
    [1] "lifecycle 1.0.3"
    [1] "magrittr 2.0.3"
    [1] "malariaAtlas 1.0.1"
    [1] "methods 4.2.2"
    [1] "munsell 0.5.0"
    [1] "pillar 1.8.1"
    [1] "pkgconfig 2.0.3"
    [1] "R6 2.5.1"
    [1] "rappdirs 0.3.3"
    [1] "RColorBrewer 1.1.3"
    [1] "rdhs 0.7.6"
    [1] "rgdal 1.6.4"
    [1] "rlang 1.0.6"
    [1] "scales 1.2.1"
    [1] "sp 1.6.0"
    [1] "stats 4.2.2"
    [1] "storr 1.2.5"
    [1] "tcltk 4.2.2"
    [1] "tibble 3.1.8"
    [1] "tidyselect 1.2.0"
    [1] "tools 4.2.2"
    [1] "utf8 1.2.2"
    [1] "utils 4.2.2"
    [1] "vctrs 0.5.2"
    [1] "withr 2.5.0"

Some of the package names, such as gridExtra, are not printed above. I do not know why; I am new to R. But I printed their versions:

    [1] "curl 5.0.0"
    [1] "rgdal 1.6.4"
    [1] "raster 3.6.14"
    [1] "sp 1.6.0"
    [1] "xml2 1.3.3"
    [1] "grid 4.2.2"
    [1] "gridExtra 2.3"
    [1] "httr 1.4.4"
    [1] "dplyr 1.1.0"
    [1] "stringi 1.7.12"
    [1] "tidyr 1.3.0"
    [1] "methods 4.2.2"
    [1] "stats 4.2.2"
    [1] "utils 4.2.2"
    [1] "rlang 1.0.6"
    [1] "testthat 3.1.6"
    [1] "knitr 1.42"
    [1] "rmarkdown 2.20"
    [1] "palettetown 0.1.1"
    [1] "magrittr 2.0.3"
    [1] "tibble 3.1.8"
    [1] "rdhs 0.7.6"

I think I have all the imports and Suggests.
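
On why some packages are missing from the first listing: `loadedNamespaces()` only reports namespaces that are loaded in the current R session, so a package that is installed but has not been loaded yet will not appear. A minimal sketch of a version report that does not depend on what is loaded is shown below. `report_versions` is a hypothetical helper written for illustration, not part of malariaAtlas or rdhs, and the package list passed to it is only an example.

```r
# Report the installed version of each named package, whether or not it is
# currently loaded. Packages that are not installed are reported as NA.
# NOTE: report_versions() is a hypothetical helper, not a malariaAtlas function.
report_versions <- function(pkgs) {
  vapply(pkgs, function(pkg) {
    # system.file() returns "" when the package is not installed
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
}

# Example package list; adjust to the packages you care about
versions <- report_versions(c("stats", "utils", "malariaAtlas", "rdhs", "gridExtra"))
print(versions)
```

The result is a named character vector, so `versions[["rdhs"]]` gives the rdhs version directly, or `NA` if rdhs is not installed.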

mauricio-tki commented 1 year ago

Great, thanks for that. Could you install the development version of malariaAtlas, and try again?

You can do that by installing the devtools package, and then installing malariaAtlas with install_github('https://github.com/malaria-atlas-project/malariaAtlas').

There is a fix related to this issue in there that isn't on CRAN yet.

RasmusKlinkJoerg commented 1 year ago

It works now, thank you very much! I also installed rdhs from GitHub with devtools::install_github("ropensci/rdhs"). When I run the code, I get a prompt asking whether rdhs may write files outside the temporary directory:

    TANZ_pr_data_plus <- fillDHSCoordinates(TANZ_pr_data, email = username, project = projectname, password_prompt=TRUE)

    Loading required namespace: rdhs
    rdhs would like to write to files outisde of your R temporary directory. This is so that your datasets and API calls are cached between R sessions. Do you confirm rdhs to write to files outside your R temporary directry? (Enter 1 or 2)

    1: Yes
    2: No

Then I have to choose 2 for no.

RasmusKlinkJoerg commented 1 year ago

There are still some entries with NA values, are these just completely confidential or can we get access to them too?

mauricio-tki commented 1 year ago

Glad to hear it's working.

> Then I have to choose 2 for no.

I usually pick option 1. It then stores the files in your user directory, and you won't need to redownload the data every time. Apart from that it shouldn't make a difference.

> There are still some entries with NA values, are these just completely confidential or can we get access to them too?

For which site_id are you seeing NAs? I assume you mean in the latitude and longitude columns, and in the TZA dataset? I'm not seeing any myself.
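
One way to answer this question is to list the `site_id` values whose coordinates are missing. The sketch below uses a toy data frame in place of the real output; the column names `site_id`, `latitude` and `longitude` are taken from this discussion, and the values are made up for illustration only.

```r
# Toy stand-in for the data returned by fillDHSCoordinates(); values are invented.
pr_toy <- data.frame(
  site_id   = c(101, 102, 103),
  latitude  = c(-6.15, NA, -3.40),
  longitude = c(39.21, 35.70, NA)
)

# A row counts as missing coordinates if either latitude or longitude is NA
missing_coords <- is.na(pr_toy$latitude) | is.na(pr_toy$longitude)
sites_without_coords <- pr_toy$site_id[missing_coords]
print(sites_without_coords)
# prints: [1] 102 103
```

Replacing `pr_toy` with the real data frame should give the list of sites to report back, assuming the column names match.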

RasmusKlinkJoerg commented 1 year ago

Maybe when I choose option 1 it remembers a session where I wrote the credentials wrong.

We have all the latitude and longitude coordinates, but some rows have almost no other data, and have column "Permission info: No permission to release data", like this one:

    4 |   | 8408 | Shaurimoyo | -6.1535 | 39.208 | UNKNOWN | Tanzania | TZA | Africa | NA | NA | NA | NA | NA | NA | NA | NA | NA | Confidential | Microscopy | NA | FALSE | TRUE | No permission to release data | Lusinde, R. and Molteni, F., . (2008) personal communication.

But I think we have enough data for now, so it is not a problem, I was just curious.

Again, thanks a lot.

mauricio-tki commented 1 year ago

> Maybe when I choose option 1 it remembers a session where I wrote the credentials wrong.

Ah interesting, that might very well be possible. That's something that's handled on the rdhs side. You can delete C:\Users\r-kli\AppData\Local/r-kli/rdhs/Cache/rdhs.json to remove the credentials and it'll recreate the file when connecting the next time.
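
Deleting the cached config can also be done from R. This is a minimal sketch using only base R; `remove_cached_config` is a hypothetical helper written for illustration, and the path is the one printed in the log earlier in this thread, so it will differ on other machines.

```r
# Delete a cached config file if it exists.
# Returns TRUE if the file was removed, FALSE if there was nothing to remove.
# NOTE: remove_cached_config() is a hypothetical helper, not an rdhs function.
remove_cached_config <- function(path) {
  if (!file.exists(path)) {
    message("No config file found at: ", path)
    return(invisible(FALSE))
  }
  invisible(file.remove(path))
}

# Path copied from the log output in this thread; adjust it for your machine.
remove_cached_config("C:/Users/r-kli/AppData/Local/r-kli/rdhs/Cache/rdhs.json")
```

After the file is removed, the next call to `fillDHSCoordinates()` should ask for the credentials again, as described above.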

> We have all the latitude and longitude coordinates, but some rows have almost no other data, and have column "Permission info: No permission to release data", like this one: ...

I believe that that data is just completely confidential. It's also not from DHS (see citation column).
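
If those rows get in the way of an analysis, they can be dropped by filtering on the permission text. The sketch below uses a toy data frame; the column name `permissions_info` is an assumption based on the row shown above and should be checked against `names()` of the real data. The `pr` values are made up for illustration.

```r
# Toy stand-in for getPR() output. The column name permissions_info is an
# assumption; the pr values are invented for illustration.
pr_data <- data.frame(
  site_id          = c(8408, 1, 2),
  permissions_info = c("No permission to release data", NA, NA),
  pr               = c(NA, 0.12, 0.30),
  stringsAsFactors = FALSE
)

# %in% treats NA as "no match", so rows without a permission note are kept
restricted <- pr_data$permissions_info %in% "No permission to release data"
usable <- pr_data[!restricted, ]
print(usable)
```

Using `%in%` rather than `==` avoids `NA` values in the logical index, which would otherwise produce all-`NA` rows when subsetting.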

Thanks for filing the issue. I'll update the README to include the GitHub installation instructions, at least until we publish a new version on CRAN.