malaria-atlas-project / malariaAtlas

An R interface to open-access malaria data, hosted by the Malaria Atlas Project.
https://malariajournal.biomedcentral.com/articles/10.1186/s12936-018-2500-5

Error message with fillDHSCoordinates #55

Open hanzhang0 opened 3 years ago

hanzhang0 commented 3 years ago

Hello - thank you for creating this great package! I'm trying to use the fillDHSCoordinates function to link the DHS geo-coordinates to parasite prevalence rates from the Malaria Atlas Project. However, the following command:

pf <- getPR(country = c("Nigeria", "Cameroon", "Malawi"), species = "pf")
pf <- fillDHSCoordinates(pf, 
                        email = "myemailaddress@gmail.com",
                        project = "My Project Name")

leads to the following error message:

Writing your configuration to:
   -> ~/Library/Caches/rdhs/rdhs.json

Error in handle_pagination_json(endpoint, query, all_results, timeout) : 
  Records returned equal to 0. Most likely your query terms are too specific or there is a typo that does not trigger a 404 or 500 error

I've checked that the email address and the project name match what I have in my DHS account. I also have access to the GPS data I'm trying to link. Could you give me a hint on how to correctly link the DHS data to the Malaria Atlas data here? Many thanks!

timcdlucas commented 3 years ago

Hi,

Thanks for the interest.

I'm also getting the same error and I'm pretty sure I'm putting in my email address and project correctly.

@ojwatson is this something you've seen before? Maybe DHS have recently changed their API?

Otherwise I'll set aside some time to try and work it out. But I'm definitely not mega familiar with this function still.

OJWatson commented 3 years ago

Hmmm. There are a few other changes that have happened with the new DHS website. Am doing some patches at the moment and will have a test to see if this gets fixed as a result. Will get back to you

hanzhang0 commented 3 years ago

Great! Thank you in advance for the help, much appreciated!

OJWatson commented 3 years ago

Hi, @timcdlucas may have to pass this one back to you. The issue in the example posted is that getPR is returning NA for the entire dhs_id column.

timcdlucas commented 3 years ago

Huh, ok. Thanks @OJWatson I'll have another look. Seems like I could have noticed that myself so sorry for the time waste.

OJWatson commented 3 years ago

No worries at all - I was doing revdep checks anyway so it was easy enough for me to check. Let me know though if I'm wrong and/or there is something else wrong further down the line as well re DHS.

timcdlucas commented 3 years ago

Yeah ok something must have changed on the MAP side.

pf <- getPR(continent = 'Africa', species = "pf")

all(is.na(pf$dhs_id))

dim(pf)

Shows that none of the data has dhs_ids. Also the table is 18,000 rows, which suggests to me that we're not getting any of the DHS data anymore.
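
A guard along these lines could catch the missing-ID case before any DHS login is attempted. This is only a minimal sketch on made-up data frames: `has_dhs_ids` is a hypothetical helper, not part of malariaAtlas, and the only thing it assumes about real getPR output is the dhs_id column checked above.

```r
# Hypothetical helper (not part of malariaAtlas): TRUE if the table has a
# dhs_id column and at least one row carries a non-missing identifier.
has_dhs_ids <- function(pr) {
  "dhs_id" %in% names(pr) && any(!is.na(pr$dhs_id))
}

# Made-up stand-ins for getPR() output, for illustration only.
pr_without_ids <- data.frame(site_id = 1:3, dhs_id = NA_character_)
pr_with_ids <- data.frame(site_id = 1:3, dhs_id = c(NA, "EXAMPLE_ID", NA))

has_dhs_ids(pr_without_ids)
has_dhs_ids(pr_with_ids)
```

If the check fails on real data, calling fillDHSCoordinates is unlikely to help, since there is nothing to match against the DHS side.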

Hi @joemap, do you know what's going on here?

joemap commented 3 years ago

Sorry for the slow reply - so it looks like points with dhs_id get returned when querying for all but not when filtering by continent or country:

> pf <- malariaAtlas::getPR(country = 'all', species = "pf")
Importing PR point data for all locations, please wait...
Data downloaded for all available locations.
NOTE: Downloaded data includes data points from DHS surveys. 
MAP cannot share DHS survey cluster coordinates, but these are available from www.measuredhs.com, via the rdhs package or using malariaAtlas:fillDHSCoordinates().
> all(is.na(pf$dhs_id))
[1] FALSE
> dim(pf)
[1] 48341    28

Since we aren't authorised to republish the DHS locations, the export process that transfers the data to our Geoserver strips out the lat-longs; it must be (wrongly) stripping out the dhs_id values as well. We'll look into this and fix it as soon as we can. In the meantime, I figured that since the all option works you might still be able to use the dataset: fetch everything, join in the coordinates with the fillDHSCoordinates function, and then filter the points to the region of interest on their lat-longs... but that's not working for me either. I'm not sure whether this is a separate problem entirely, as it seems to be coming from rdhs:

> pf <- malariaAtlas::fillDHSCoordinates(pf, 
+                          email = "<removed>",
+                          project = "<removed>")
Writing your configuration to:
   -> ~/.cache/rdhs/rdhs.json

Downloading DHS data.
Error in strsplit(data$FileName, ".", fixed = TRUE) : 
  object 'model_datasets' not found
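
For reference, the final filtering step of the workaround described above (keeping only points in a region of interest once coordinates are present) might look roughly like this. It is a sketch on a made-up table: `filter_to_bbox` is a hypothetical helper, the bounding box is illustrative rather than any official boundary, and only the latitude and longitude column names are taken from the PR data.

```r
# Made-up stand-in for a PR table whose coordinates have been filled in.
pr <- data.frame(
  site_id   = 1:4,
  latitude  = c(9.1, -13.9, 6.5, 48.9),
  longitude = c(7.4, 33.8, 3.4, 2.3)
)

# Hypothetical helper: keep rows whose coordinates fall inside a box.
# Rows with missing coordinates are dropped.
filter_to_bbox <- function(pr, lat_range, lon_range) {
  inside <- !is.na(pr$latitude) & !is.na(pr$longitude) &
    pr$latitude >= lat_range[1] & pr$latitude <= lat_range[2] &
    pr$longitude >= lon_range[1] & pr$longitude <= lon_range[2]
  pr[inside, , drop = FALSE]
}

# Illustrative box only, not an official country boundary.
region <- filter_to_bbox(pr, lat_range = c(4, 14), lon_range = c(2, 15))
region$site_id
```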

timcdlucas commented 3 years ago

I was going to check whether using all was a usable workaround but now I can't even run that! I get

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  cannot read from connection
In addition: Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  URL 'https://malariaatlas.org/geoserver/Explorer/ows?service=wfs&version=2.0.0&request=GetFeature&outputFormat=csv&TypeName=PR_Data&PROPERTYNAME=site_id,dhs_id,site_name,latitude,longitude,month_start,year_start,month_end,year_end,lower_age,upper_age,examined,pf_pos,pf_pr,method,rdt_type,pcr_type,rural_urban,country_id,country,continent_id,malaria_metrics_available,location_available,permissions_info,citation1,citation2,citation3': Timeout of 60 seconds was reached
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  cannot read from connection

Maybe this is just a transient thing. I'll check again in a couple of days. But otherwise not sure.
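
For a transient failure like this, one option is to wrap the download in a small retry loop instead of re-running it by hand. The sketch below is an assumption-laden illustration, not something malariaAtlas provides: `with_retry` is a hypothetical helper, and `flaky` is a stand-in that mimics a download failing twice before succeeding.

```r
# Hypothetical helper (not part of malariaAtlas): call `f` up to
# `max_tries` times, pausing `wait` seconds between failed attempts.
with_retry <- function(f, max_tries = 3, wait = 0) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed. Last error: ",
       conditionMessage(last_error))
}

# Stand-in for a flaky download: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("cannot read from connection")
  "ok"
}

with_retry(flaky)
```

In practice the wrapped call would be something like `function() getPR(country = "all", species = "pf")` with a non-zero `wait`. Since the warning mentions a 60 second limit, raising R's `timeout` option (for example `options(timeout = 300)`) before retrying may also help, though that is untested here.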

joemap commented 3 years ago

That's odd - it's working for me this morning and directly hitting the URL mentioned in the error also works. I guess it was a transient problem but I'm not sure why it would have occurred, as far as I know there haven't been any changes to the infrastructure and the Geoserver hasn't been restarted for 6 days.

joemap commented 3 years ago

So I've pushed those IDs to the Geoserver. The filtered version pulls in the points now at least...

pf <- getPR(country = c("Nigeria", "Cameroon", "Malawi"), species = "pf")
all(is.na(pf$dhs_id))
dim(pf)

hanzhang0 commented 3 years ago

Hi, thanks so much for helping to debug! Now the dataset does have non-NA dhs_ids. However, I get the following error when I run the fillDHSCoordinates command:

Logging into DHS website...
Error in names(filedatatypelist_DHS) <- paste0("filedatatypelist_", qdapRegex::rm_between(filedatatypelist_DHS_line,  : 
  'names' attribute [1] must be the same length as the vector [0]

I've tried to run a fix made by @OJWatson in a different thread (https://github.com/ropensci/rdhs/issues/115) but it doesn't seem to work:

devtools::install_github("ropensci/rdhs", ref = "issue33_path")

I've also tried to debug using debug(rdhs:::available_datasets), following the guidance also from @OJWatson . The R session aborted after running this but the error message seems to show:

  if (sum(is.na(fileName_matches)) > 0) {
    message("Some of your available datasets are not found in the DHS API.",
            "This is likely due to the DHS API being out of date and as such ",
            "some of the meta information about your available datasets ",
            "may not be available.")
    fileName_matches <- fileName_matches[-which(is.na(fileName_matches))]
  }

I am a bit clueless on what's happening here but does this have something to do with the non-DHS data in MalariaAtlas Project..? Would these be helpful for you in understanding what's going on @OJWatson ?

OJWatson commented 3 years ago

Hi @hanzhang0 sorry for delay - having hellish week.

That error should, I think, be fixed in the latest rdhs version (which should be on CRAN now), and I couldn't replicate it. One possible fix is to clear your rdhs cache first using get_available_datasets(clear_cache = TRUE). Let me know if that works.

@timcdlucas While testing that this worked for me I found a small bug in malariaAtlas - have made a PR with a fix - https://github.com/malaria-atlas-project/malariaAtlas/pull/56

Hope this helps,

OJ

timcdlucas commented 3 years ago

Thanks @OJWatson! And thanks for always responding to queries about this stuff more generally.

As of today I'm handing maintenance of this package over to @mauricio-tki, so I'll let him deal with these as he sees fit!

mauricio-tki commented 3 years ago

Hi all. Apologies for the long period of silence. I finally got some time to get to know this codebase, and should be able to respond to issues within a reasonable time from now on.

I tried to reproduce the error with rdhs 0.7.3, 0.7.2 and 0.7.1, but haven't managed it.

@hanzhang0 are you still having this issue or was it resolved by clearing the cache or by some other means?

ollawone commented 1 month ago

Hi @mauricio-tki the issue is still there. Just ran into it today. Thanks

joemap commented 1 month ago

Thanks @ollawone for the report. We've had some issues with our spatial data server (GeoServer) this week but everything should be working again. I'm currently able to run the example from the original post; would you mind trying again? Let us know if you are still having issues. Apologies for any inconvenience caused.

ollawone commented 1 month ago

Thanks. It is working