3enedix opened this issue 2 years ago
Hi @CharliesWelt, it's a good question and the right channel to discuss this topic.
The CoastSat package uses GEE to filter the image collections, select the bands of interest and crop the images to the region of interest, then downloads the .tif files. The analysis is then done locally with Python libraries like scikit-image, scikit-learn, shapely, GDAL, etc. The advantage of this workflow is that we have full control over the image (pixel by pixel) and can extract the shoreline at sub-pixel resolution, optimise the thresholding algorithm, discard bad images, quality-control the shorelines, and much more.
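To illustrate the local part of that workflow, here is a minimal sketch (not CoastSat's actual code) of thresholding a synthetic water-index image and tracing a sub-pixel shoreline: `skimage.measure.find_contours` uses marching squares, so the contour vertices fall at fractional pixel coordinates.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import find_contours

# Synthetic "water index": negative over land (left), positive over water (right)
rng = np.random.default_rng(0)
index = np.tile(np.linspace(-0.6, 0.6, 200), (100, 1)) + rng.normal(0, 0.02, (100, 200))

t = threshold_otsu(index)           # data-driven land/water threshold
contours = find_contours(index, t)  # sub-pixel (row, col) vertices along the threshold

shoreline = max(contours, key=len)  # keep the longest contour as the shoreline
print(shoreline.shape)              # (n_points, 2), fractional pixel coordinates
```

On a real image the index would be something like MNDWI computed from the downloaded bands, and the bad-image and quality-control steps would run before and after this.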
Others have developed a different approach where everything is done on the GEE servers; see for example the work by Luijendijk et al. 2019 at a global scale using yearly composites (which sounds very similar to what you are proposing to do). You can process images directly in the cloud with the GEE API, but with more limited functionality and less control over the individual pixels of the image. Also, keep in mind that GEE is not open-source, so you can't read the source code to know exactly what each function is doing.
I personally use loads of hard drives, as you mentioned, to generate the shoreline time-series over large spatial scales; see for example the CoastSat website. I like to keep a copy of the images in case I need to reprocess the datasets, but you could very well delete the images after extracting the shoreline to minimise disk usage. From my experience, the bottleneck timewise is the image downloads, as the extraction of the shorelines is very fast (as long as you break the coast down into small polygons; ~25-30 sq km seems to be the optimum).
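The delete-after-extraction idea can be sketched as a per-site loop; `download_site` and `extract_shoreline` below are hypothetical placeholders for the download and extraction steps, not real CoastSat API calls.

```python
import shutil
from pathlib import Path

def download_site(site, out_dir):
    # Placeholder: in practice, fetch the cropped .tif files from GEE
    (out_dir / "scene.tif").write_bytes(b"")

def extract_shoreline(site_dir):
    # Placeholder: in practice, run the local thresholding/contouring analysis
    return sorted(p.name for p in site_dir.glob("*.tif"))

def process_sites(sites, workdir="tmp_images"):
    results = {}
    for site in sites:
        site_dir = Path(workdir) / site
        site_dir.mkdir(parents=True, exist_ok=True)
        download_site(site, site_dir)
        results[site] = extract_shoreline(site_dir)
        shutil.rmtree(site_dir)  # free disk space before the next site
    return results

print(process_sites(["site_A", "site_B"]))
```

The trade-off is exactly the one mentioned above: with the images deleted, any reprocessing means downloading them again.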
Good luck with your project, Kilian
This is a nice discussion, and since I have thought about some of these issues, I would also like to chime in and add the following reasons why a local workflow generally makes sense:
Hi Kilian and Dan,
thanks for explaining your thoughts! I see a lot of good points (especially the reproducibility argument), need to think about some of the others, and still have to learn more about GEE. So far I had naively assumed that since one can 'store' the image in a variable (with the Python API), it would be possible to use functions from other toolboxes to manipulate this variable. But apparently that's wrong... I will keep learning. And I agree that using another cloud server would not help, as it would still require downloading the images, only then to that server.
Thank you!
> timewise, the bottleneck is on the image downloads
Is it possible to download multiple images in parallel? Perhaps with https://tqdm.github.io/docs/contrib.concurrent/?
Edit to answer my own question: yes, as long as you parallelise by site.
Hi all,
first of all, thanks for your incredible work, this toolbox is exactly what I need.
Okay, almost exactly. I would like to extract (sandy, muddy and mangrovy) shorelines worldwide over a period of approximately 30 years. To avoid having to buy loads of hard drives and run the code hundreds of times, I was hoping that a shoreline could also be extracted from images stored only temporarily in a variable, before looping to the next timestep/place.
Is there a special reason why you decided to download the images to a local storage? Do you think it would be possible to extract shorelines without downloading the images?
I am quite new to GEE and would appreciate every hint!
(Hope this is the right place for this question and it is not already answered in the other 213 issues...) Best wishes, Bene