h-a-graham / EAlidaR

Package to download EA LiDAR data
GNU General Public License v3.0

Replace Selenium with JSON request to REST API. #49

Open · h-a-graham opened this issue 2 years ago

h-a-graham commented 2 years ago

Following @barnabasharris's great suggestion here: https://github.com/h-a-graham/EAlidaR/discussions/48

We should implement this new approach. Possibly lots of refactoring required, but the major benefit is that all available datasets are shown, and therefore there is much easier access to, for example, the point cloud data. It's also considerably faster than the Selenium approach.
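Roughly, the request could look something like this (a minimal sketch only; the endpoint URL and query parameters are placeholders, not the actual EA REST API details):

```r
# Minimal sketch of querying a REST endpoint for JSON instead of driving the
# site with Selenium. The URL and query parameters are placeholders.
library(httr)
library(jsonlite)

query_datasets <- function(base_url, params = list(f = "json")) {
  resp <- httr::GET(url = base_url, query = params)
  httr::stop_for_status(resp)
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}
```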

h-a-graham commented 2 years ago

A thought...

The JSON output returned by the API request includes all available datasets and their respective download links.

The EA portal has many great datasets, but their spatial coverage varies a fair bit.

I propose that we alter the approach of this package and split the download into two steps. The first is a search, returning an S3 object/data frame providing the available datasets, their URLs, tile names, the request's geometry, and the download time.

We could create plot methods for this object to display the availability of each dataset across the requested extent.

These search objects could also be cached and reused, reducing the number of requests sent to the REST API and making things faster.

Finally, the user can review the available data, then provide the search object and a dataset name as arguments to the download function.
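Something like the following, very roughly (all names are placeholders and the actual REST query is omitted):

```r
# Rough sketch of the proposed two-step interface; function and field names
# are placeholders and the REST query itself is omitted.
ea_search <- function(geom) {
  results <- data.frame(
    dataset = character(),  # e.g. DTM, DSM, point cloud
    tile    = character(),
    url     = character(),  # per-tile download link from the JSON response
    stringsAsFactors = FALSE
  )
  structure(
    list(results = results, geometry = geom, searched_at = Sys.time()),
    class = "ea_search"
  )
}

# Plot method showing which datasets cover the requested extent.
plot.ea_search <- function(x, ...) {
  plot(x$geometry, ...)
  # overlay the tiles available for each dataset here
}

# The user reviews availability, then passes the object to the downloader.
ea_download <- function(search, dataset, dest = tempdir()) {
  stopifnot(inherits(search, "ea_search"))
  links <- search$results$url[search$results$dataset == dataset]
  # download `links` to `dest` here
  invisible(links)
}
```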

barnabasharris commented 2 years ago

Agreed. So in essence there's a 'query' function which returns said JSON/data.frame and then users include this object (along with parameters specifying datasets) in a 'download' function, which then downloads the specified datasets?

Guessing the default behavior of the download function could be to download the latest / highest-res LiDAR for any given location, but using the above two-step approach under the hood?

That said, I do like the idea of a package that could handle any ESRI portal URL, i.e. no hard-coded LiDAR-specific parameters. I guess defaults could be harmonized with whatever the base URL is...

barnabasharris commented 2 years ago

Also (while I remember) -- we might need to timestamp the S3 query object, as presumably the job ID and download links will expire after a certain amount of time? I've never tested to see how long they last...

h-a-graham commented 2 years ago

Precisely! My thinking is that, with this core functionality in place, we can write a bunch of helper/wrapper functions which will return, as you say, the most recent DTM, DSM, point cloud etc. with a one-liner.
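e.g. something along these lines, building on the hypothetical search/download functions sketched above:

```r
# Hypothetical one-liner wrapper: most recent DTM for an area in one call,
# using the two-step search/download functions under the hood.
get_latest_dtm <- function(geom, dest = tempdir()) {
  s <- ea_search(geom)
  ea_download(s, dataset = "DTM", dest = dest)
}
```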

And yes, totally agree re your second point! I know that {rstac} does this for the Microsoft Planetary Computer by assigning to the global environment with `<<-`; something similar could work here, and that way nothing is saved to disk.

The disadvantage of that approach is that it only lasts for the session. Do you know how long the tokens last? Maybe it would be better to write to disk if they last multiple days?
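A package-local environment would be another way to do the session-level caching without `<<-`; a minimal sketch (the {digest} key is just one option, and the cache still only lasts the session):

```r
# Session-level cache kept in a package-local environment (avoids `<<-`).
# The cache disappears when the R session ends.
.ea_cache <- new.env(parent = emptyenv())

cached_search <- function(geom, key = digest::digest(geom)) {
  if (!exists(key, envir = .ea_cache, inherits = FALSE)) {
    assign(key, ea_search(geom), envir = .ea_cache)
  }
  get(key, envir = .ea_cache, inherits = FALSE)
}
```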

barnabasharris commented 2 years ago

Just seen that the results JSON contains a 'job completed' timestamp field. Lovely. I just tried a download URL that was produced on Sept 1st and no joy. My thinking is that the tokens are relatively short-lived; I can't see the argument for keeping them active for long, considering they are designed to be used within the space of a single visit to the website...

Shouldn't be a problem, though, as we could simply build in a check to see whether the job ID is still valid prior to download and, if not, re-run the query to obtain a fresh ID.
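Something like this, assuming a simple HEAD request is enough to tell whether a link has expired (helper names are hypothetical):

```r
# Sketch of the expiry check: if a stored download link no longer resolves,
# re-run the search to obtain a fresh job ID.
library(httr)

link_is_live <- function(url) {
  resp <- try(httr::HEAD(url), silent = TRUE)
  !inherits(resp, "try-error") && httr::status_code(resp) < 400
}

refresh_if_stale <- function(search) {
  if (all(vapply(search$results$url, link_is_live, logical(1)))) {
    return(search)
  }
  ea_search(search$geometry)  # re-query for a fresh job ID and links
}
```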

h-a-graham commented 1 year ago

Hey @barnabasharris, so I've managed to find a little time to play with this. Basically, I've decided to create a new package, for a couple of reasons. Firstly, I think the new approach we discussed is so different from everything in {EAlidaR} that it just doesn't make sense to keep things compatible; secondly, I believe that this new approach will allow us to tap into the Scottish and Welsh LiDAR data too (although I haven't touched this yet).

See here: https://github.com/h-a-graham/gblidar

For now, things are very early days! I'll need to do some more work on the request geometries; it seems there are quite a few constraints on what the API can handle. It would also be good to better manage very large requests and chunk them where they exceed the API limit.
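For the chunking, one option (assuming {sf}, and a placeholder cell size since the actual API limit isn't pinned down yet) might be:

```r
# Possible approach to chunking an oversized request: split the geometry into
# a grid of smaller cells and query each cell in turn. The cell size is a
# placeholder, not the real API limit.
library(sf)

chunk_geometry <- function(geom, cellsize = 5000) {
  grid <- sf::st_make_grid(geom, cellsize = cellsize)
  grid[lengths(sf::st_intersects(grid, geom)) > 0]
}
```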

I've added the ability to download any of the data, but I'll need to spend a lot more time building the functions to handle the merging. I'd like to (if at all possible) use GDAL's warp for this and hit the sources directly. Anyway, if you have a chance to take a look it would be great to get your thoughts.
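For illustration, hitting GDAL's warp utility from R could go through sf::gdal_utils(); whether {gblidar} ends up doing it this way is still open, and the sources and options below are placeholders:

```r
# Illustration only: mosaicking source rasters with GDAL warp via {sf}.
# The source paths/URLs and creation options are placeholders.
library(sf)

merge_tiles <- function(sources, destination = "merged_dtm.tif") {
  sf::gdal_utils(
    util        = "warp",
    source      = sources,   # e.g. local tiles or /vsicurl/ URLs
    destination = destination,
    options     = c("-r", "bilinear", "-co", "COMPRESS=DEFLATE")
  )
  destination
}
```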

Cheers

barnabasharris commented 1 year ago

Hey Hugh, sounds like a sensible decision -- keep this package for quick n easy access to the EA LiDAR datasets. I'll take a look and add some comments over at {gblidar} too.

h-a-graham commented 1 year ago

Thanks @barnabasharris. To be clear, for now I'm working on the lower-level functions, which will let us build the easy-access, higher-level functions (equivalent to what we have here) far more easily. My hope is that by generalising these steps as much as possible we can easily access a richer variety of the data, much of which I didn't even know was available, like the first- and last-return LiDAR rasters.

h-a-graham commented 1 year ago

Hey @barnabasharris, I know there's been very little action on this over the year, but I have been doing a few bits and bobs; for now we can download data with {gblidar} (within whatever the download limit is).

However, here's a bombshell: https://support.environment.data.gov.uk/hc/en-gb/articles/11168695963293 -- it looks like DEFRA are finally ditching ESRI! Thank goodness, but this will mean the death of this package and a rewrite of what exists in {gblidar}.

barnabasharris commented 1 year ago

Hmmm, fascinating and great to see a move away from ESRI! It will be interesting to see how this pans out. As it happens, I might actually have some time in October to look at this as I'm in between jobs. Hopefully we'll know the lay of the land by then. It sounds as if they might want to accommodate programmatic use of the data anyway, so it might be easier?

h-a-graham commented 1 year ago

Yes, it certainly feels that way! Also possibly getting sick to the teeth of everyone complaining about how hard it was to access the data?

Awesome! I'll try to find time before October to tidy a few bits up in {gblidar}... Scottish and Welsh data should also be quite straightforward to include.

If you have some time to look into this that'd be awesome but no stress if it doesn't pan out 🙏