openclimatefix / Satip

Satip contains the code necessary for retrieving, transforming and storing EUMETSAT data
https://satip.readthedocs.io/
MIT License

Add Cloud Masking/Detection Algorithm #227

Open jacobbieker opened 5 months ago

jacobbieker commented 5 months ago

There is a paper here: https://www.meteoswiss.admin.ch/services-and-publications/publications/scientific-publications/2013/the-heliomont-surface-solar-radiation-processing.html that describes how MeteoSwiss determines surface solar radiation. It also covers detecting different cloud types and creating a cloud mask from SEVIRI imagery.

Detailed Description

Context

This could be quite useful for deriving our own cloud masks, or cloud-type classifications, from the raw imagery. The paper also describes an interesting way of correcting for the satellite's orbital maneuvers by realigning the imagery, which might be very helpful.
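The paper's own realignment procedure isn't reproduced here. As a rough illustration of the general idea only, the sketch below swaps in FFT phase correlation, a standard image-registration technique, to estimate an integer pixel offset between a reference image and a shifted one and then undo it. The function names are hypothetical, and it assumes a pure translation on a common grid, which a real orbital-maneuver correction may not satisfy.

```python
import numpy as np


def estimate_shift(reference: np.ndarray, moved: np.ndarray) -> tuple[int, int]:
    """Estimate the integer (row, col) shift that maps `reference` onto `moved`.

    Uses FFT phase correlation (an assumed stand-in, not the paper's method).
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    # Normalised cross-power spectrum; the epsilon avoids division by zero.
    cross = f_mov * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint are negative shifts (FFT wrap-around).
    shift = [int(p) if p <= s // 2 else int(p) - s for p, s in zip(peak, corr.shape)]
    return shift[0], shift[1]


def realign(moved: np.ndarray, shift: tuple[int, int]) -> np.ndarray:
    """Undo an estimated shift with np.roll (edge pixels wrap around)."""
    return np.roll(moved, (-shift[0], -shift[1]), axis=(0, 1))
```

A real correction would also need to handle sub-pixel offsets and the pixels that wrap around the image edges.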

Possible Implementation

The paper is quite detailed, so it should be possible to implement it in Satip directly from the paper.

vikasgrewal16 commented 5 months ago

Could you assign this issue to me and give me a brief overview of what needs to be done and how, so that I can work on it? Regards

jacobbieker commented 5 months ago

Hi, the details are in the paper linked from that website; their approach to cloud masking should work here. To add it to Satip, you could create a cloud_mask file containing the cloud masking algorithm implementation, and add some tests that run on the public Zarrs to see how well it works.
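A minimal sketch of how such a module and its test might be laid out. The module path, function name, and 5 K threshold are hypothetical, and the single infrared threshold test is a placeholder standing in for the paper's algorithm, not an implementation of it. A real test would load a slice of the public Zarrs instead of the synthetic arrays used here.

```python
"""Hypothetical skeleton for a `satip/cloud_mask.py` module.

The function name, signature and threshold are illustrative assumptions,
not Satip's API and not the SPARC algorithm from the paper.
"""
import numpy as np


def simple_ir_cloud_mask(
    brightness_temp: np.ndarray,
    clear_sky_temp: np.ndarray,
    threshold_k: float = 5.0,
) -> np.ndarray:
    """Flag pixels as cloudy when they are much colder than the clear-sky reference.

    Cloud tops are usually colder than the surface, so a large departure below
    the clear-sky brightness temperature suggests cloud.
    Returns a boolean array (True = cloudy); NaN inputs are never flagged.
    """
    departure = clear_sky_temp - brightness_temp
    return np.nan_to_num(departure, nan=0.0) > threshold_k


def test_simple_ir_cloud_mask():
    # Synthetic 2x2 scene: only the two pixels > 5 K colder than 290 K are cloudy.
    clear = np.full((2, 2), 290.0)
    observed = np.array([[289.0, 250.0], [291.0, 284.0]])
    mask = simple_ir_cloud_mask(observed, clear)
    assert mask.tolist() == [[False, True], [False, True]]
```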

vikasgrewal16 commented 4 months ago

I have read those details, but while building the project and downloading the data with the EUMETSAT API I ran into this error. Could you please provide some information or a solution for it?

File "/home/grewal/Satip/venv/lib/python3.10/site-packages/botocore/auth.py", line 418, in add_auth
    raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials

vikasgrewal16 commented 4 months ago

Could I get your input on this issue?

jacobbieker commented 4 months ago

Hi, sorry for the delay. It seems that you need to log in to AWS for those credentials. It looks like you are using app.py, which currently uploads to S3 by default. Instead, you should be able to use the public Google Cloud dataset here to get the raw data to use with the cloud masking algorithm.

jacobbieker commented 4 months ago

Also, @vikasgrewal16, I would recommend focusing on a single issue if possible. There are quite a few potential GSoC contributors who want to work on different good first issues. I've seen you also commented on #231. Are you more interested in this one or that one? Or a different one?

vikasgrewal16 commented 4 months ago

Thank you @jacobbieker for reaching out and emphasizing the importance of focusing on a single issue. I appreciate your guidance in streamlining the efforts.

Regarding your question on my preferences, I have a keen interest in both GIS and ML, which is why I am actively contributing to this project. Through my involvement I aim not only to learn about open source but also to become a valuable part of the community. GSoC is a means to this end, and I see it as an excellent opportunity to contribute substantively.

As for specific issues, for now I will focus mostly on #231.

Looking forward to your advice and direction.

Best regards, @vikasgrewal16

Surya-29 commented 4 months ago

Hi @jacobbieker !

I've read the details of the SPARC cloud masking algorithm described in the reference you provided. I'm currently looking through the properties of the raw data in the shared data bucket and would like to work on implementing the algorithm. I would appreciate it if you could assign this issue to me. Thank you!

jacobbieker commented 4 months ago

Hi @Surya-29, that sounds great!

Surya-29 commented 4 months ago

I'm having some trouble finding the attributes needed to calculate the SPARC score (used for the cloud mask). Specifically, the clear-sky (cloud-free) brightness temperature $T_{cf}$ and the background reflectance $\rho_{cf}$ aren't available in the provided SEVIRI dataset. They can either be calculated (Section 6, Clear Sky Compositing [1]) or retrieved from other EUMETSAT datasets (All Sky Radiances [2]). Can I go with the latter option, since calculating these attributes might involve fitting a model over the diurnal course? However, the ASR dataset is only available through the EUMETSAT Data Center (where it has to be ordered) and not on the Data Store, so downloading it via the API isn't possible, right? @jacobbieker How should I approach this?
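For the "calculate it ourselves" option, a crude stand-in for clear-sky compositing is to take, for each pixel and time slot, a high percentile of brightness temperature over several days (clouds are usually colder than the surface) and a low percentile of reflectance (clouds are usually brighter). This is a simplification assumed for illustration, not the procedure from Section 6 of the paper; the function name and the 10th-percentile default are hypothetical.

```python
import numpy as np


def clear_sky_composite(stack: np.ndarray, kind: str, percentile: float = 10.0) -> np.ndarray:
    """Build a crude clear-sky composite from a (time, y, x) stack for one time slot.

    Hypothetical simplification, not the paper's compositing method:
    - kind="temperature": take a high percentile as the clear-sky estimate T_cf.
    - kind="reflectance": take a low percentile as the background estimate rho_cf.
    NaNs (e.g. missing scans) are ignored.
    """
    if kind == "temperature":
        return np.nanpercentile(stack, 100.0 - percentile, axis=0)
    if kind == "reflectance":
        return np.nanpercentile(stack, percentile, axis=0)
    raise ValueError(f"unknown kind: {kind!r}")
```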

jacobbieker commented 4 months ago

Ah okay, I would have thought that information would be in the attributes of the native files. Yeah, for a first pass at getting this in, I think getting some data from the Data Center and using that is probably the right way to go for now. We can always add calculating the values ourselves later, as the Data Center can be quite slow to deliver data. You are right, there is no API access to the Data Center, unfortunately. Another, less ideal, option would be to find an average value, either for the year or per month, that we could use instead, but I'm not sure whether such values are published anywhere.
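If a per-month average turns out to be the fallback, a climatology like that is simple to compute once some data is available. The sketch assumes each sample carries a calendar-month label (1 to 12) and works for scalar series or per-pixel maps; the function name is hypothetical.

```python
import numpy as np


def monthly_mean(months, values) -> np.ndarray:
    """Return one mean per calendar month, shape (12, *values.shape[1:]).

    `months` holds integers 1..12 aligned with the leading axis of `values`.
    NaNs in `values` are ignored; months with no samples come out as NaN.
    """
    months = np.asarray(months)
    values = np.asarray(values, dtype=float)
    out = np.full((12,) + values.shape[1:], np.nan)
    for m in range(1, 13):
        selected = months == m
        if selected.any():
            # Average over all samples from this month (per pixel if 2-D maps).
            out[m - 1] = np.nanmean(values[selected], axis=0)
    return out
```

The background value for a given scene would then be looked up as `table[month - 1]`.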

Surya-29 commented 4 months ago

Yes, I'll probably go with averaging for the background reflectance $\rho_{cf}$. As for the brightness temperature $T_{cf}$, I would prefer to implement the model described in the paper, if possible, since the final $sparc_{score}$ requires at least $T_{score}$ to be calculated. That said, because this cloud masking algorithm aggregates several scores, it could compensate for other attributes missing from the $sparc_{score}$ calculation.
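The paper's diurnal model isn't reproduced here. As an assumed, simplified stand-in, the sketch below fits a truncated Fourier series over the 24-hour cycle to brightness temperatures by least squares and evaluates it at arbitrary times. It presumes the inputs are already clear-sky samples, which in practice needs an initial mask or an iterative scheme; the names and the two-harmonic default are hypothetical.

```python
import numpy as np


def _design_matrix(hours, n_harmonics: int) -> np.ndarray:
    """Columns: 1, cos(k*w), sin(k*w) for k = 1..n_harmonics, with w = 2*pi*h/24."""
    omega = 2.0 * np.pi * np.asarray(hours, dtype=float) / 24.0
    cols = [np.ones_like(omega)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(k * omega))
        cols.append(np.sin(k * omega))
    return np.column_stack(cols)


def fit_diurnal_harmonics(hours, temps, n_harmonics: int = 2) -> np.ndarray:
    """Least-squares coefficients of the truncated Fourier series (assumed model)."""
    design = _design_matrix(hours, n_harmonics)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(temps, dtype=float), rcond=None)
    return coeffs


def predict_diurnal(coeffs: np.ndarray, hours) -> np.ndarray:
    """Evaluate the fitted diurnal curve, e.g. as a T_cf estimate at any hour."""
    n_harmonics = (len(coeffs) - 1) // 2
    return _design_matrix(hours, n_harmonics) @ coeffs
```

Fitting per pixel from a sparse set of clear-sky observations lets the curve fill in $T_{cf}$ at times when the pixel is cloudy.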

Surya-29 commented 4 months ago

I've made progress on implementing the cloud masking algorithm and have committed the changes to my remote repository (changes). Should I open a PR even though the code is only partially functional?

jacobbieker commented 4 months ago

Awesome! I would open it as a draft PR even if it's incomplete, and just keep adding to it.

Yeah, a subpackage would be really good to have.

Yes, the output should be in an Xarray data format, primarily to keep the coordinates and satellite attribute information. You could probably just swap out the data values in the Xarray satellite image for the cloud mask data, and it would be good to go.
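A minimal sketch of swapping the mask into the image's Xarray container, assuming the mask has exactly the shape of the `DataArray` it is derived from (select a single channel first if the image has a channel dimension). `DataArray.copy(data=...)` keeps the dims, coordinates and attributes; the helper name and the `cloud_mask` variable name are hypothetical.

```python
import numpy as np
import xarray as xr


def to_cloud_mask_dataarray(satellite: xr.DataArray, mask: np.ndarray) -> xr.DataArray:
    """Return a copy of `satellite` whose values are replaced by `mask`.

    Dims, coordinates and attributes (e.g. satellite metadata) are preserved.
    """
    if mask.shape != satellite.shape:
        raise ValueError(f"mask shape {mask.shape} != image shape {satellite.shape}")
    # copy(data=...) builds a new DataArray with the same metadata but new values.
    return satellite.copy(data=mask).rename("cloud_mask")
```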

If it is easier, focusing on the European area of interest is fine for now, but we would want to extend it to work over Africa and with the Indian Ocean imagery as well.
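Restricting processing to a region can start as a simple bounding-box crop on 2-D latitude/longitude arrays. The sketch assumes such arrays are available for the grid (off-disk pixels stored as NaN, which never match) and the function name is hypothetical. Because a lat/lon box is not rectangular in the geostationary projection, it returns the enclosing pixel window plus a mask of the pixels actually inside the box.

```python
import numpy as np


def crop_to_bbox(data, lat, lon, lat_min, lat_max, lon_min, lon_max):
    """Crop 2-D `data` to the smallest row/column window covering a lat/lon box.

    Returns (cropped_data, inside_mask); NaN coordinates never count as inside.
    """
    inside = (lat >= lat_min) & (lat <= lat_max) & (lon >= lon_min) & (lon <= lon_max)
    if not inside.any():
        raise ValueError("bounding box does not intersect the grid")
    rows = np.flatnonzero(inside.any(axis=1))
    cols = np.flatnonzero(inside.any(axis=0))
    window = (slice(rows[0], rows[-1] + 1), slice(cols[0], cols[-1] + 1))
    return data[window], inside[window]
```

For example, `crop_to_bbox(image, lat, lon, 35.0, 72.0, -15.0, 40.0)` would use illustrative European bounds, not the project's actual area of interest; extending to Africa or the Indian Ocean imagery would only change the bounds and the input grid.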