openclimatefix / Satip

Satip contains the code necessary for retrieving, transforming and storing EUMETSAT data
https://satip.readthedocs.io/
MIT License

Add support for aerosols, albedo #153

Status: Open. jacobbieker opened this issue 1 year ago.

jacobbieker commented 1 year ago

Based on some of Solcast's learnings, aerosols and albedo seem to be quite helpful for forecasting, so adding support for them might be worthwhile.

Detailed Description

For Aerosols:

From Solcast: “…best global models for our aerosol inputs (being NASA MERRA2 for historical and ECMWF CAMS for real-time and forecast), however we have made major improvements to how we use these global models to represent local aerosol conditions. We’ve validated our improvements at hundreds of AERONET sites globally.”

For Albedo:

From Solcast: “Switching to finer resolution detail from NASA MODIS satellite imagery products has helped us significantly improve albedo calculations. The higher resolution images captured by the satellite allow for a more accurate estimation of albedo, compared to the NASA MERRA2 albedo data previously used.”

Context

This could be good to include for more training and to improve our forecasting. As most of this is derived from satellite data, Satip seems like a good place for it, although Satip has so far focused primarily on EUMETSAT data.

Possible Implementation

Add an option to the DownloadManager to fetch these aerosol measurements, including a way to switch from EUMETSAT access to NASA or ECMWF CAMS? Or split it into its own repo and functionality, as it is somewhat separate; although if we want a unified 'satellite' interface, it might be good to keep it in here.
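A minimal sketch of the "source switch" idea, purely for illustration: `DataSource`, `SourceSwitchingDownloader` and the `fetch_*` functions are hypothetical names, not part of Satip's actual `DownloadManager` API, and the fetchers only return a description of the request instead of downloading anything.

```python
from enum import Enum
from typing import Callable, Dict


class DataSource(Enum):
    """Hypothetical data sources a unified 'satellite' interface could expose."""

    EUMETSAT = "eumetsat"
    NASA_MERRA2 = "nasa_merra2"
    ECMWF_CAMS = "ecmwf_cams"


# Hypothetical fetcher signature: (start, end) -> description of the request.
Fetcher = Callable[[str, str], str]


def fetch_eumetsat(start: str, end: str) -> str:
    # Placeholder for the existing EUMETSAT download path.
    return f"eumetsat:{start}/{end}"


def fetch_merra2(start: str, end: str) -> str:
    # Placeholder for a NASA GES DISC (MERRA-2) download path.
    return f"nasa_merra2:{start}/{end}"


def fetch_cams(start: str, end: str) -> str:
    # Placeholder for an ECMWF CAMS download path.
    return f"ecmwf_cams:{start}/{end}"


class SourceSwitchingDownloader:
    """Illustrative dispatcher; NOT Satip's real DownloadManager."""

    _FETCHERS: Dict[DataSource, Fetcher] = {
        DataSource.EUMETSAT: fetch_eumetsat,
        DataSource.NASA_MERRA2: fetch_merra2,
        DataSource.ECMWF_CAMS: fetch_cams,
    }

    def __init__(self, source: DataSource = DataSource.EUMETSAT) -> None:
        self.source = source

    def download(self, start: str, end: str) -> str:
        # Dispatch to the fetcher registered for the configured source.
        return self._FETCHERS[self.source](start, end)
```

Keeping EUMETSAT as the default would leave existing callers unaffected; whether NASA and CAMS belong behind the same interface at all is the open question in this issue.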

Thoughts, @devsjc?

jacobbieker commented 1 year ago

MODIS albedo (and other NASA data) can be accessed freely from: https://lpdaac.usgs.gov/tools/data-pool/

jacobbieker commented 1 year ago

MERRA-2 aerosol data is available here: https://disc.gsfc.nasa.gov/datasets?page=1&subject=Aerosols&project=MERRA-2

jacobbieker commented 1 year ago

And ECMWF CAMS: https://atmosphere.copernicus.eu/data

jacobbieker commented 1 year ago

CAMS provides both reanalysis data and daily forecasts, available from 2015 to now, with yearly model updates and new forecasts every 12 hours. There is also a Europe grid at 0.1° x 0.1° resolution (instead of the 0.4° x 0.4° of the global one), updated daily, with a 3-year rolling archive.
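A quick back-of-the-envelope check of what the finer Europe grid means, assuming both resolutions are in degrees on a regular latitude/longitude grid: each 1° x 1° box holds (1/0.4)² = 6.25 global cells versus (1/0.1)² = 100 Europe cells, i.e. roughly 16 times as many grid points over the same area. The helper name is illustrative only.

```python
def cells_per_square_degree(resolution_deg: float) -> float:
    """Grid cells covering a 1 x 1 degree box on a regular lat/lon grid."""
    return (1.0 / resolution_deg) ** 2


global_cells = cells_per_square_degree(0.4)  # CAMS global grid
europe_cells = cells_per_square_degree(0.1)  # CAMS Europe grid
print(
    f"{global_cells:.2f} vs {europe_cells:.2f} cells per square degree "
    f"({europe_cells / global_cells:.0f}x more on the Europe grid)"
)
```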

jacobbieker commented 1 year ago

The CAMS forecast, which is only available on a 3-year rolling window, is now being mirrored here: https://huggingface.co/datasets/openclimatefix/ecmwf-cams-forecast. Only levels 0, 500, 1000, 2000, 3000 and 5000 meters are kept; 50 and 250 are skipped, as they seem to error out more often. This means we should have the data from March 2020 onwards available whenever we want to use it. The validated reanalysis is available back to 2018, and the global one doesn't seem to be rolling, so we can get those later.

jacobbieker commented 7 months ago

@AUdaltsova This might be a good issue for tracking the overall project of adding MERRA-2 to Satip.

AUdaltsova commented 6 months ago

MERRA-2 Aerosol Optical Depth: Analytics

Data Sheet

Source: https://disc.gsfc.nasa.gov/datasets/M2I3NXGAS_5.12.4/summary
Time range: 2018/01/01 - 2024/02/29
Spatial resolution: grid of 0.5° latitude x 0.625° longitude
Time resolution: 3h
Variables: AODANA (Aerosol Optical Depth Analysis) and AODINC (Aerosol Optical Depth Increment Analysis). AODANA is the main variable, providing the analysed AOD values; AODINC is the difference between the forecast and reanalysis values and will not be available as a feature in production.

Completeness: no NaN values detected, as expected for reanalysis data
Range: min: 0, max: see Range below
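The completeness and range checks above can be reproduced with a few lines of xarray. This is a sketch under assumptions: the helper name `aod_quality_summary` is hypothetical, and in practice the array would come from the downloaded MERRA-2 files (for example the `AODANA` variable of a dataset opened with `xr.open_mfdataset`); the usage below runs on synthetic data instead.

```python
import numpy as np
import xarray as xr


def aod_quality_summary(aod: xr.DataArray) -> dict:
    """Completeness and range summary for an AOD field such as MERRA-2 AODANA."""
    values = aod.values
    return {
        "nan_count": int(np.isnan(values).sum()),
        # Fraction of all values below the physical lower bound of 0.
        "negative_fraction": float(np.sum(values < 0) / values.size),
        "min": float(np.nanmin(values)),
        "max": float(np.nanmax(values)),
    }


# Synthetic example: 8 three-hourly steps on a small lat/lon grid.
aod = xr.DataArray(np.full((8, 3, 4), 0.2), dims=("time", "lat", "lon"))
aod[0, 0, 0] = -1e-7  # a sporadic negative value
aod[5, 1, 2] = 70.0  # an extreme (e.g. wildfire) value
print(aod_quality_summary(aod))
```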

World map views

[Figure: seasonal mean AOD maps]

Seasonal map of AOD. Main point of interest: high and dynamic readings in India across seasons, which is encouraging.
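Seasonal maps like the one above can be produced by grouping on the meteorological season. A minimal sketch, assuming an AOD `DataArray` with a datetime `time` dimension plus `lat`/`lon` (the coordinate names used in MERRA-2 files); `seasonal_mean_aod` is a hypothetical helper, the data is synthetic, and the plotting itself is omitted.

```python
import numpy as np
import pandas as pd
import xarray as xr


def seasonal_mean_aod(aod: xr.DataArray) -> xr.DataArray:
    """Mean AOD per season (DJF, MAM, JJA, SON) at every grid point."""
    return aod.groupby("time.season").mean("time")


# Synthetic daily data for one year: AOD is 0.5 in June-August, 0.1 otherwise.
time = pd.date_range("2020-01-01", "2020-12-31", freq="D")
summer = np.isin(time.month, [6, 7, 8])
data = np.where(summer, 0.5, 0.1)[:, None, None] * np.ones((1, 2, 3))
aod = xr.DataArray(
    data,
    coords={"time": time, "lat": [50.0, 50.5], "lon": [0.0, 0.625, 1.25]},
    dims=("time", "lat", "lon"),
)
seasonal = seasonal_mean_aod(aod)
print(seasonal.sel(season="JJA").values)
```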

[Figure: monthly mean AOD maps]

Monthly progression, to observe trends: there are clear regional patterns, each with its own seasonality.

A NASA article describes the same trends that are visible in these plots, which may be of interest for context:

“High aerosol amounts are linked to different processes in different places and times of year. High aerosol amounts occur over South America from July through September. This pattern is due to land clearing and agricultural fires that are widespread across the Amazon Basin and Cerrado regions during the dry season. Aerosols have a similar seasonal pattern in Central America (March-May), central and southern Africa (June-September), and Southeast Asia (January-April).”

Trends and comparisons: India and UK

Scale and trends

[Figures: monthly AOD statistics and distributions for the India and UK crops]

Monthly trends: aerosols increase over the summer. Note the difference in standard deviation for India between the high season and winter; the UK is noticeably flatter trend-wise. This, together with the low overall values for the UK, suggests the data might be less impactful there, but there may be stronger local trends (e.g. in the south) that are more expressive than the regional one.

Note: the November/December spike in India is likely due to the burning of farm fields, a known source of smog in Delhi in 2023.
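Monthly mean and standard-deviation curves for a region can be sketched by cropping to a latitude/longitude box and grouping by calendar month. Assumptions: coordinates named `lat`/`lon` in ascending order (as in MERRA-2), a hypothetical helper name `monthly_region_stats`, and a hypothetical rough UK box in the usage; the exact crops used for the plots are not given in the thread. Here the standard deviation is taken over all grid points and timesteps within each month, which is one of several reasonable choices.

```python
import numpy as np
import pandas as pd
import xarray as xr


def monthly_region_stats(aod: xr.DataArray, lat_bounds, lon_bounds):
    """Monthly mean and standard deviation of AOD inside a lat/lon box."""
    region = aod.sel(lat=slice(*lat_bounds), lon=slice(*lon_bounds))
    grouped = region.groupby("time.month")
    dims = ["time", "lat", "lon"]  # reduce over space and time within each month
    return grouped.mean(dims), grouped.std(dims)


# Synthetic daily data on a MERRA-2-like 0.5 x 0.625 degree grid; AOD = month / 100.
time = pd.date_range("2020-01-01", "2020-12-31", freq="D")
lat = np.arange(40.0, 65.0, 0.5)
lon = np.arange(-15.0, 10.0, 0.625)
data = (time.month.values / 100.0)[:, None, None] * np.ones((1, lat.size, lon.size))
aod = xr.DataArray(data, coords={"time": time, "lat": lat, "lon": lon},
                   dims=("time", "lat", "lon"))

# Hypothetical rough UK bounding box.
mean, std = monthly_region_stats(aod, (49.5, 61.0), (-8.5, 2.0))
print(mean.sel(month=7).item(), std.sel(month=7).item())
```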

Map views

[Figure: AOD map crop over India]

India: the map visually matches intuition about the region (mountains, desert areas) and shows noticeable variation overall, so it will hopefully be useful.

[Figure: AOD map crop over the UK]

Map view of the UK: more variance in the south, a high season from April to August, and barely any change in the north. Note that the colour scale here only goes up to 0.11, whereas India's reaches 0.8.

Range

I could not locate any documentation on how the data is obtained, save for a formula here that seems relevant, so there is no 'hard evidence' for the expected range. There are two things of note:

Negative values

There are negative values in the data, but from the formula it seems that AOD cannot be negative by nature. These points are sporadic and do not appear to follow any spatial or temporal pattern, and they make up a negligible fraction of the data (on the order of 1e-7), so they are not a real problem and should simply be imputed.
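For reference, the standard definition of AOD (assumed here to be equivalent to the formula referred to above) makes the non-negativity argument explicit: AOD at wavelength λ is the vertical integral of the aerosol extinction coefficient σ_ext, which is itself non-negative.

```latex
\tau_{\mathrm{aer}}(\lambda) = \int_{0}^{\infty} \sigma_{\mathrm{ext}}(\lambda, z)\,\mathrm{d}z \;\ge\; 0,
\qquad \text{since } \sigma_{\mathrm{ext}}(\lambda, z) \ge 0 \text{ for all } z .
```

The same definition places no upper bound on τ, since σ_ext can become arbitrarily large in dense smoke or dust.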

Upper limit

This is a somewhat hazy area, since there seems to be no formulaic upper limit to AOD, and the data has points of up to 70, which is enormous compared to the regular range (normally everything falls between 0.2 and 1, with 1 already indicating low visibility). However, this is not a reanalysis artefact: the extreme points all seem to correspond to reports of massive wildfires.

The extreme values are also only a marginal fraction of the data, but they should be preserved, as they depict extreme conditions that would be highly relevant to the forecast.
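A minimal sketch of the handling described above: negative values are replaced by linear interpolation in time from the neighbouring valid values, while large positive values are deliberately left untouched. The function names are hypothetical, and linear interpolation is just one simple imputation choice; the thread does not specify which method to use.

```python
import numpy as np


def impute_negative_aod(series: np.ndarray) -> np.ndarray:
    """Replace negative AOD values in a 1-D time series by linear interpolation.

    Large positive values (e.g. wildfire peaks) are kept as-is.
    """
    values = np.asarray(series, dtype=float).copy()
    bad = values < 0
    if not bad.any():
        return values
    good_idx = np.flatnonzero(~bad)
    if good_idx.size == 0:
        # Nothing valid to interpolate from: fall back to the physical lower bound.
        values[bad] = 0.0
        return values
    # np.interp holds the edge values constant outside the valid range.
    values[bad] = np.interp(np.flatnonzero(bad), good_idx, values[good_idx])
    return values


def impute_negative_aod_grid(aod: np.ndarray, time_axis: int = 0) -> np.ndarray:
    """Apply the 1-D imputation independently to every grid point's time series."""
    return np.apply_along_axis(impute_negative_aod, time_axis, aod)


series = np.array([0.1, -1e-7, 0.3, 70.0])
print(impute_negative_aod(series))
```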

devsjc commented 6 months ago

@jacobbieker Apologies for the delay; yes, this sounds like a valuable improvement!

I'm assuming the NASA satellite data would require the same kind of processing chain as is already present in Satip (satpy etc.), so it definitely makes sense for that to be in here. The ECMWF CAMS data, though, I'd imagine is different: as I understand it, it isn't satellite data but gridded NWP, so maybe it would make more sense in the nwp-consumer. I guess it just depends on which can be most easily adapted to include the CAMS data?

jacobbieker commented 6 months ago

This data is satellite-derived but is a reanalysis, so it's kinda in between: it doesn't need the same processing as the EUMETSAT data, but it also isn't 5D like NWPs. We are going to be using it as a pseudo-NWP for pretraining. CAMS is an actual NWP, so that should go in the nwp-consumer, I think.
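To illustrate the "not 5D" point: a reanalysis is essentially (variable, time, lat, lon), whereas NWP output carries both an initialisation time and a forecast step. One hypothetical way to use a reanalysis as a pseudo-NWP is to treat each analysis time as an "init time" and the following few analysis times as "steps". The function name and the `(init_time, step, variable, lat, lon)` axis order are assumptions for illustration, not the layout used by the nwp-consumer or by the actual pretraining setup.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def as_pseudo_nwp(reanalysis: np.ndarray, n_steps: int) -> np.ndarray:
    """Reshape a (variable, time, lat, lon) array into a 5-D
    (init_time, step, variable, lat, lon) array.

    out[i, s] is the analysis at time index i + s, so every "forecast" is simply
    the future reanalysis (a perfect-forecast stand-in).
    """
    # Windows over the time axis: shape (variable, n_init, lat, lon, n_steps).
    windows = sliding_window_view(reanalysis, n_steps, axis=1)
    # Reorder to (init_time, step, variable, lat, lon); the result is a read-only view.
    return np.transpose(windows, (1, 4, 0, 2, 3))


# Example: 2 variables, 10 three-hourly timesteps, 3 x 4 grid, 4-step "forecasts".
reanalysis = np.arange(2 * 10 * 3 * 4, dtype=float).reshape(2, 10, 3, 4)
pseudo = as_pseudo_nwp(reanalysis, n_steps=4)
print(pseudo.shape)
```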