yotarazona / scikit-eo

A Python package for Remote Sensing Data Analysis
https://yotarazona.github.io/scikit-eo/

Examples #19

KBodolai closed this 19 hours ago

KBodolai commented 2 weeks ago

ping openjournals/joss-reviews#6692

First of all, I want to congratulate you on the structure and on how the notebooks walk the user through using ML with remote sensing data; I think there are some pretty powerful abstractions here. I'll be pointing out some minor issues or things to improve here and in some other issues for the JOSS review, but I just wanted to clarify that I think this is great work!

There are a few things in the notebooks that I think can be improved:

  1. The 'running notebooks' section (cell 8) seems to be common to all notebooks. Since the notebooks currently have a default that works wherever you run them, I think it would be better for this to form part of a readme in the notebooks folder, and perhaps use that space to add some more information / context to the notebooks.
  2. More verbosity in examples - I think a few of the examples could do with a bit more text explaining what's going on, where the data comes from, contextualising the results, and citing key scientific work supporting the methods.
  3. Adding information about running things at scale - a big challenge of using ML with remote sensing is the size of our data once we try to run things for, e.g., entire countries. How can we scale things up? Could we add an example showing how to use modern software such as dask + xarray to scale?
  4. Where does the endmember matrix come from in Example 06?
  5. In the data fusion examples, the images are already well aligned and processed (I'm guessing the S1 data has been downscaled to 30 m and the grids have been aligned). This is a strong assumption when working with satellite data and I think it should be explained (which it is, in the documentation of the fusionrs function, kudos!), but I think it wouldn't hurt to add it in the notebook as well (particularly thinking about the use of the software for teaching!)
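On point 3, the core pattern is to process a large raster one chunk at a time so memory stays bounded, which is exactly what dask + xarray automate (and parallelize). A minimal NumPy-only sketch of that pattern, with a hypothetical `predict_in_tiles` helper and a toy per-pixel classifier standing in for a fitted model (neither is part of scikit-eo's API):

```python
import numpy as np

def predict_in_tiles(image, predict_fn, tile=256):
    """Apply a per-pixel classifier to a (bands, rows, cols) raster
    one tile at a time, so memory use is bounded by the tile size.
    dask + xarray automate this chunking (and run tiles in parallel)."""
    bands, rows, cols = image.shape
    out = np.empty((rows, cols), dtype=np.int32)
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            block = image[:, r:r + tile, c:c + tile]
            # flatten pixels to (n_pixels, n_bands), as sklearn-style models expect
            X = block.reshape(bands, -1).T
            out[r:r + tile, c:c + tile] = predict_fn(X).reshape(block.shape[1:])
    return out

# toy "classifier": threshold on the first band
rng = np.random.default_rng(0)
img = rng.random((4, 300, 500)).astype(np.float32)
labels = predict_in_tiles(img, lambda X: (X[:, 0] > 0.5).astype(np.int32))
print(labels.shape)  # (300, 500)
```

With dask, the same idea is a one-liner over a chunked array (e.g. `dask.array.map_blocks`), and xarray keeps the geospatial coordinates attached while doing it.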
yotarazona commented 5 days ago

Thank you very much @KBodolai for this feedback. We appreciate your comments. I hope I can answer all your questions as well as possible; some points, however, may go beyond the scope of this reply.

  1. I agree. We have added a README inside the examples folder explaining how to download and read the data used in each notebook, and have kept only the automatic loading of the dataset in each notebook. You can see these changes here: https://github.com/yotarazona/scikit-eo/tree/main/examples.
  2. Agreed. We have added more text contextualizing the examples and improving the discussion of the results in some of the notebooks. While not all notebooks have this contextualization yet, we have prioritized the most important notebooks and the most relevant functions that users will rely on, such as the automatic classification and deep learning algorithms. These changes can be found at this link: https://github.com/yotarazona/scikit-eo/tree/main/examples/notebooks.
  3. This point is very important. The scalability of the algorithms, i.e. the ability to run them over large regions, ultimately depends on the computational resources of the machine being used. No library or parallel-programming framework can increase a computer's underlying capacity: we can spread work over two, three, or four cores, but the hardware limits remain if the goal is to map very large areas. Realistically, the best route to scale is to run these tools on better hardware. For example, users can purchase Google Colab Pro or Colab Pro+ services, which provide more TPUs and GPUs. For deep learning architectures such as U-Net that require large computational capacity, Colab Pro+ provides 83.5 GB of RAM, 40.0 GB of GPU memory, and 201.2 GB of disk; with these resources it is possible to map land cover for all of Portugal using U-Net architectures. For machine learning algorithms the working scale can be pushed further, because Colab Pro+ also offers configurations with up to 334.6 GB of RAM, enough to process the entire UK, Germany, or Spain, among other countries. It is therefore possible to use the functions/classes of the scikit-eo package in a Colab environment and take advantage of all available resources. Personally, I use these services to scale my own deep learning research by exploiting these high computational capabilities from Google.
  4. Thank you for this. These endmembers were obtained through visual interpretation of the land covers in the Landsat image itself. Endmembers are simply pure spectral signatures of the covers; they can be obtained either through fieldwork with a spectroradiometer or through interpretation of the satellite image itself, based on user experience of course.
  5. Thank you for this. Indeed, as you mention, it is important that two images acquired with different geometries be co-registered, taking one as master and the other as slave, to avoid comparing apples with oranges. This can be done using SNAP software, for example. I suggest reviewing the paper published by Tarazona et al. (2021).
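To make point 4 concrete, here is a minimal sketch of what an endmember matrix looks like and how it is used in linear spectral unmixing: one row per pure cover type, one column per spectral band. All the reflectance values below are made up for illustration, and the unconstrained least-squares step is just the core idea; scikit-eo's actual unmixing function may apply additional constraints (e.g. non-negativity, sum-to-one):

```python
import numpy as np

# Hypothetical endmember matrix E (n_endmembers x n_bands):
# each row is the pure spectral signature of one cover,
# obtained from fieldwork or image interpretation (values invented here).
E = np.array([
    [0.05, 0.08, 0.04, 0.45, 0.25, 0.12],  # vegetation
    [0.12, 0.15, 0.18, 0.22, 0.30, 0.35],  # bare soil
    [0.06, 0.05, 0.04, 0.03, 0.02, 0.01],  # water
])

# Simulate a pixel that is a 60/30/10 linear mixture of the three covers
true_abundances = np.array([0.6, 0.3, 0.1])
pixel = true_abundances @ E

# Unconstrained least-squares unmixing: solve E.T @ a ~= pixel for a
a, *_ = np.linalg.lstsq(E.T, pixel, rcond=None)
print(np.round(a, 3))  # recovers approximately [0.6, 0.3, 0.1]
```

The same solve, applied pixel by pixel (or vectorized over the whole image), yields one abundance map per endmember.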

I hope I have addressed all your concerns, and thank you very much for all your suggestions and comments on the package :).