prioritizr / wdpar

Interface to the World Database on Protected Areas
https://prioritizr.github.io/wdpar
GNU General Public License v3.0
37 stars 5 forks source link

Poor internet connection breaks wdpa_fetch #39

Closed PhDyellow closed 2 years ago

PhDyellow commented 2 years ago

I am using version 1.3.1.3.

When I called aus_mpa <- wdpa_fetch("Australia", wait = TRUE), I got the following error:

Error:   Summary: NoSuchElement
     Detail: An element could not be located on the page using the given search parameters.
     class: org.openqa.selenium.NoSuchElementException
     Further Details: run errorDetails method

I am aware of #35, so I made sure to use a recent github version of wdpar. The issue went away when I switched to a faster, non-mobile, connection.

Reading around, it seems like PhantomJS doesn't have a good way to know if the page has loaded yet, would it make sense to parameterize the sleep timer in wdpa_url instead?

jeffreyhanson commented 2 years ago

Thanks for reporting this and verifying that this is issue is a problem with the latest GitHub version. Yeah I think that would work. Just to check I understand correctly, you're suggesting we add a new parameter (e.g. called sleep_duration, maybe you can think of something better? I'm horrible at naming things...) to wdpa_fetch and wdpa_url so that users can control how long the web driver waits for pages to load? So, in your case, you could set the new parameter to something like 5 seconds so it has more time to load the protected planet web pages over a slow internet connection?

PhDyellow commented 2 years ago

That's right, the sleep_duration you described is what I'm suggesting, and I can't think of a better name either : P

page_load_wait_time?

It might be worth warning the user in the documentation that the delay occurs ~4 times per country code, so a 5 second sleep time means 20 seconds per country.

Overall, I'm impressed with the package, it saves me a lot of time getting MPA polygons into R!

jeffreyhanson commented 2 years ago

That's right, the sleep_duration you described is what I'm suggesting, and I can't think of a better name either : P

Ok great! Yeah, I feel like page_load_wait_time is more descriptive, but a bit long. What about just wait_time?

page_load_wait_time?

Yeah, I feel like page_load_wait_time is more descriptive, but a bit long. What about just wait_time?

It might be worth warning the user in the documentation that the delay occurs ~4 times per country code, so a 5 second sleep time means 20 seconds per country.

Good point - I'll add that to the documentation

Overall, I'm impressed with the package, it saves me a lot of time getting MPA polygons into R!

Thanks! How urgently do you need this? I've got other stuff I need to focus on this week - but I can try and get this into the GitHub version over the weekend? If you need it urgently ASAP, I suggest forking the repo and manually changing the sleep durations.

jeffreyhanson commented 2 years ago

PS. there's currently a bug in importing the full global dataset. So if you need that, try the fix-global branch.

PhDyellow commented 2 years ago

How urgently do you need this?

I think I can work around the bug manually by copying the pre-downloaded data to the right folder, so I can wait until you get the fix into the repo. I won't need the global data for now, but if any of my colleagues ask about it, I'll know where to point them.

What about just wait_time?

wait_time sounds good, as long as it doesn't get confused with the wait parameter that's already in there.

EDIT: fix comment formatting

jeffreyhanson commented 2 years ago

Ok,great - thanks!

Ah - that's an excellent point. Yeah, I totally forgot about about the wait parameter. How about page_wait_time?

PhDyellow commented 2 years ago

I suggest page_wait: not too long, and starting with page gives some contrast to the wait parameter.

In the end though, I think any of the ideas so far are fine. Thanks for putting together the fix quickly!

jeffreyhanson commented 2 years ago

Ok - sounds good! I'll try and get that merged into the main branch tonight this weekend (I've added it to PR https://github.com/prioritizr/wdpar/pull/38 which I'm working on to get the cleaning process working for the global dataset)

jeffreyhanson commented 2 years ago

I've just merged the PR - so if you install from the main branch, there's the new page_wait parameter to control sleep times. Let me know if you run into any issues with it?