raspstephan / sugar-flower-fish-or-gravel

Repository for the Zooniverse cloud classification project.
https://www.zooniverse.org/projects/raspstephan/sugar-flower-fish-or-gravel
4 stars, 4 forks

Infrared datasets #3

Open raspstephan opened 5 years ago

raspstephan commented 5 years ago

@observingClouds You mentioned that we could also use a different infrared dataset to train the ML model on. Which dataset were you thinking of? How far back are these data available? Several decades? Could we do a climate change study with them?

observingClouds commented 5 years ago

I see a benefit of using infrared data in the following cases:

  1. Identification of categorically wrong labels: The infrared images give us information about the cloud-top height (CTH). Since we are interested in shallow convection only, and none of our patterns should reach higher than, say, 7 km, we could exclude labels that contain a substantial amount of high clouds. To be on the safe side, I would actually exclude all images with high clouds. These IR images come at no cost to us, as they are also captured by MODIS and can easily be downloaded from Worldview with the download script. (I have actually done some work on this already; I'll push the branch later.) If they are downloaded at the same resolution, the labels can be used in exactly the same way.

  2. Expanding our scope: Humans are used to identifying clouds as white patches in the sky, so labelling cloud patterns on satellite images is much easier for them in the visible channel. However, I would argue that for our purpose of detecting shallow clouds (and excluding high ones), the infrared channel is the better option, as long as its resolution is sufficient. By training the model on brightness temperature, I hope we could apply the trained model directly to other datasets and thereby expand our scope. Currently I have two in mind:

    • GOES-16 IR (covering the Atlantic since mid-2017, every 15 min): analysing the diurnal cycle
    • GridSat-B1 (tropics/subtropics, 1981-present, 3-hourly): this might give us an opportunity to analyse interannual changes and trends, although this dataset has a resolution of only 7 km. Rather than looking for a trend, however, I would take the SSTs of these years, compare them with the probability of pattern occurrence, and also look globally for these SST-pattern relations. This is feasible and, in my mind, more promising than trying to retrieve a statistically significant trend from 30 years of data. (Nothing for the end of March, though, I guess ;) )

Even for the MODIS dataset, IR data would have the advantage that we could classify at night time, when the visible channel is black, which could strengthen an argument about the annual cycle.

  3. Transferability: A property like brightness temperature is available in model simulations or can be calculated from them rather easily. Reflectivity, or even a visible representation of the model as seen from space, is not feasible.
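Item 1 (excluding labels with high clouds) could be prototyped roughly as follows. This is a minimal sketch, not the branch mentioned in item 1: it assumes each label box has already been cut out of a brightness-temperature (BT) image in kelvin, and it turns the ~7 km height limit into a BT threshold using an assumed 300 K surface temperature and a constant 6.5 K/km lapse rate (hypothetical round numbers). All function names are illustrative.

```python
import numpy as np

# Hypothetical round numbers used only to turn a height limit into a BT threshold.
SURFACE_TEMP_K = 300.0        # assumed near-surface temperature in the trades
LAPSE_RATE_K_PER_KM = 6.5     # assumed constant tropospheric lapse rate
MAX_SHALLOW_CTH_KM = 7.0      # height limit for shallow patterns (from the text)


def bt_threshold_k(max_cth_km=MAX_SHALLOW_CTH_KM,
                   surface_temp_k=SURFACE_TEMP_K,
                   lapse_rate=LAPSE_RATE_K_PER_KM):
    """Temperature at max_cth_km for a constant lapse rate; colder pixels count as high cloud."""
    return surface_temp_k - lapse_rate * max_cth_km


def high_cloud_fraction(bt_k, threshold_k):
    """Fraction of valid (finite) pixels colder than threshold_k; NaN if none are valid."""
    bt_k = np.asarray(bt_k, dtype=float)
    valid = np.isfinite(bt_k)
    if not valid.any():
        return float("nan")
    return float(np.mean(bt_k[valid] < threshold_k))


def keep_label(bt_k, threshold_k, max_fraction=0.0):
    """Keep a labelled box only if its high-cloud fraction is at most max_fraction.

    max_fraction=0.0 rejects any box containing a single cold pixel.
    Boxes without any valid pixels are rejected as well.
    """
    frac = high_cloud_fraction(bt_k, threshold_k)
    return bool(np.isfinite(frac) and frac <= max_fraction)


# Example with synthetic data: a warm (shallow) scene and one with a cold band.
threshold = bt_threshold_k()               # 254.5 K with the assumed numbers
shallow = np.full((10, 10), 285.0)
deep = shallow.copy()
deep[:2, :] = 220.0                        # 20 % of the pixels are cold
print(keep_label(shallow, threshold), keep_label(deep, threshold))  # True False
```

With `max_fraction=0.0` any cold pixel rejects the box, which matches the "to be on the safe side" rule; a small positive value would tolerate a few cold pixels. Thin cirrus appears warmer than its true cloud top, so a BT threshold alone may still miss some high clouds.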
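For item 3, going from model output to brightness temperature ends with inverting the Planck function. The sketch below assumes a monochromatic channel at 11 µm (roughly the MODIS IR window; a real channel would need its spectral response function) and assumes the top-of-atmosphere radiance has already been computed from the model fields, e.g. with a radiative transfer code. That radiative transfer step is the hard part and is not shown here.

```python
import numpy as np

# Physical constants (exact SI values)
H = 6.62607015e-34    # Planck constant [J s]
C = 2.99792458e8      # speed of light [m s^-1]
K_B = 1.380649e-23    # Boltzmann constant [J K^-1]


def planck_radiance(wavelength_m, temp_k):
    """Blackbody spectral radiance B_lambda(T) in W m^-2 sr^-1 m^-1."""
    a = 2.0 * H * C**2 / wavelength_m**5
    b = H * C / (wavelength_m * K_B * np.asarray(temp_k, dtype=float))
    return a / np.expm1(b)


def brightness_temperature(wavelength_m, radiance):
    """Inverse Planck: temperature of a blackbody that emits `radiance` at wavelength_m."""
    a = 2.0 * H * C**2 / wavelength_m**5
    b = H * C / (wavelength_m * K_B)
    return b / np.log1p(a / np.asarray(radiance, dtype=float))


# Round trip at an assumed monochromatic 11 um window channel.
wavelength = 11e-6
temps = np.array([220.0, 290.0])
radiances = planck_radiance(wavelength, temps)
print(brightness_temperature(wavelength, radiances))  # should recover the input temperatures
```

Radiances here are per metre of wavelength; multiply by 1e-6 to get the more common W m^-2 sr^-1 µm^-1.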

Note on the resolution: The visible channels of MODIS have a resolution of up to 250 m, while the infrared channels have only 1 km. However, since we downloaded the images at roughly 100 px per degree, we only had a resolution of about 1 km anyway, and that was sufficient to distinguish between these patterns. In addition, for patterns with small features like Sugar, one could imagine that they are characterised by the larger surrounding clusters rather than by the small clouds themselves. In short: I'm optimistic that IR data should work as well.
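The ~1 km figure follows from the grid spacing: one degree of latitude is about 111 km, so 100 px per degree gives pixels of roughly 1.1 km (the east-west spacing shrinks with the cosine of latitude). The sketch below does this arithmetic and, as an assumed experiment that is not part of the original plan, block-averages a field by an integer factor to mimic a coarser dataset; a factor of 7 gives ~7.8 km pixels, close to the 7 km quoted for GridSat-B1. Box-averaging brightness temperatures is only a crude stand-in for a real sensor's footprint.

```python
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def pixel_size_km(px_per_deg, lat_deg=0.0):
    """Approximate (north-south, east-west) pixel size in km on a regular lat/lon grid."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0   # ~111.2 km per degree of latitude
    north_south = km_per_deg / px_per_deg
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


def coarsen(field, factor):
    """Block-average a 2-D field by an integer factor, trimming rows/columns that do not fit."""
    field = np.asarray(field, dtype=float)
    ny = field.shape[0] // factor * factor
    nx = field.shape[1] // factor * factor
    blocks = field[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


print(pixel_size_km(100))          # about (1.11, 1.11) km at the equator
print(pixel_size_km(100, 15.0))    # east-west spacing shrinks to ~1.07 km at 15 deg
ns, _ = pixel_size_km(100)
print(7 * ns)                      # ~7.8 km after coarsening by a factor of 7
```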

observingClouds commented 5 years ago

I just realised over the weekend that the MODIS dataset also goes back to 2000 (Terra) / 2002 (Aqua), so there is some room for expansion in both the VIS and the IR images.