jacobbieker opened this issue 1 year ago
MODIS albedo (and other NASA data) can be accessed freely from: https://lpdaac.usgs.gov/tools/data-pool/
MERRA-2 aerosol data is here: https://disc.gsfc.nasa.gov/datasets?page=1&subject=Aerosols&project=MERRA-2
And ECMWF CAMS: https://atmosphere.copernicus.eu/data
CAMS provides both reanalysis data and daily forecasts, available from 2015 to the present, with yearly model updates and new forecasts every 12 hours. There is also a European grid at 0.1° x 0.1° resolution (instead of the 0.4° x 0.4° of the global one), updated daily, with a 3-year rolling archive.
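A small sketch of what the update cadence above implies, assuming forecast initialisations at 00 and 12 UTC and approximating the 3-year rolling archive as 3 x 365 days (both are assumptions; `init_times` and `oldest_available` are hypothetical helpers). It also shows why the European grid is much denser: shrinking cells from 0.4° to 0.1° gives 16 times as many cells over the same area.

```python
from datetime import datetime, timedelta

# Assumptions for illustration: inits at 00 and 12 UTC, archive span ~3 * 365 days.
INIT_INTERVAL = timedelta(hours=12)
ARCHIVE_SPAN = timedelta(days=3 * 365)


def init_times(start: datetime, end: datetime) -> list[datetime]:
    """List 12-hourly forecast initialisation times in [start, end]."""
    # Snap forward to the first 00 or 12 UTC boundary at or after `start`.
    first = start.replace(minute=0, second=0, microsecond=0)
    while first.hour % 12 != 0 or first < start:
        first += timedelta(hours=1)
    times = []
    t = first
    while t <= end:
        times.append(t)
        t += INIT_INTERVAL
    return times


def oldest_available(now: datetime) -> datetime:
    """Approximate the earliest init time still held in a 3-year rolling archive."""
    return now - ARCHIVE_SPAN


# Grid density: 0.1 degree cells vs 0.4 degree cells over the same area.
cells_ratio = (0.4 / 0.1) ** 2
```

For one full day this yields the 00 and 12 UTC runs plus the next day's 00 UTC run if the end bound includes it.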
The CAMS forecast, which is only available on a 3-year rolling window, is now being mirrored here: https://huggingface.co/datasets/openclimatefix/ecmwf-cams-forecast, only on levels 0, 500, 1000, 2000, 3000, and 5000 meters; levels 50 and 250 are skipped, as requests for them seem to error out more often. This means we should have the data from March 2020 onward available whenever we want to use it. The validated reanalysis is available back to 2018, and the global one doesn't seem to be rolling, so we can fetch those later.
@AUdaltsova This might be a good issue for tracking the overall project of adding MERRA-2 to Satip.
Source: https://disc.gsfc.nasa.gov/datasets/M2I3NXGAS_5.12.4/summary
Time range: 2018/01/01 - 2024/02/29
Spatial resolution: grid of 0.5° latitude x 0.625° longitude
Time resolution: 3h
Variables: AODANA (Aerosol Optical Depth Analysis) and AODINC (Aerosol Optical Depth Analysis Increment). AODANA is the main variable of interest; AODINC is the difference between the forecast and reanalysis values. AODINC will not be available as a feature in production.
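A quick sizing sketch for the subset described above, using only NumPy. It assumes a pole-to-pole latitude axis and a longitude axis starting at -180°, and counts instantaneous 3-hourly steps from 2018-01-01 00 UTC through 2024-02-29 21 UTC.

```python
import numpy as np

# Resolutions from the summary above; the -180 longitude origin is an assumption.
LAT_RES, LON_RES = 0.5, 0.625

lats = np.linspace(-90.0, 90.0, int(round(180 / LAT_RES)) + 1)  # both poles included
lons = -180.0 + LON_RES * np.arange(int(round(360 / LON_RES)))  # longitude wraps, so +180 is excluded

# 3-hourly steps from 2018-01-01T00 up to (not including) 2024-03-01T00.
times = np.arange(
    np.datetime64("2018-01-01T00"),
    np.datetime64("2024-03-01T00"),
    np.timedelta64(3, "h"),
)

n_values = lats.size * lons.size * times.size
approx_gb_float32 = n_values * 4 / 1e9  # per variable, uncompressed
```

Under these assumptions the grid is 361 x 576 cells with 18,008 time steps, i.e. roughly 15 GB per variable as uncompressed float32.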
Completeness: no NaN values detected, as expected for reanalysis data.
Range: min: 0, max: not bounded (see Range below)
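A minimal completeness check of the kind reported above, assuming the variable has already been loaded into a NumPy array (for example via the `.values` of an xarray `DataArray`). `completeness_report` is a hypothetical helper, not part of Satip.

```python
import numpy as np


def completeness_report(aod: np.ndarray) -> dict:
    """Summarise missing and out-of-range values in an AOD array of any shape."""
    total = aod.size
    n_nan = int(np.isnan(aod).sum())
    finite = aod[np.isfinite(aod)]  # assumes at least one finite value
    return {
        "nan_fraction": n_nan / total,
        # AOD should be >= 0, so any negative value is flagged as out of range.
        "negative_fraction": int((finite < 0).sum()) / total,
        "min": float(finite.min()),
        "max": float(finite.max()),
    }
```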
Seasonal map of AOD. Main point of interest: high and dynamic readings in India across seasons, which is encouraging.
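One way seasonal maps like these could be produced, assuming an AOD array shaped (time, lat, lon) with matching `datetime64` timestamps and the standard meteorological seasons (DJF, MAM, JJA, SON); the grouping actually used for the plots isn't stated.

```python
import numpy as np

# Meteorological seasons by calendar month (an assumed grouping).
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def seasonal_means(aod: np.ndarray, times: np.ndarray) -> dict:
    """Return {season: mean map (lat, lon)} for `aod` shaped (time, lat, lon)."""
    # Months since 1970-01, converted to calendar month 1-12.
    months = times.astype("datetime64[M]").astype(int) % 12 + 1
    seasons = np.array([SEASON_OF_MONTH[int(m)] for m in months])
    return {s: np.nanmean(aod[seasons == s], axis=0) for s in ("DJF", "MAM", "JJA", "SON")}
```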
Monthly progression, to observe trends: there are clear regional patterns, each with its own seasonality.
A NASA article describes the same trends that appear in these plots, which may be useful context:
“High aerosol amounts are linked to different processes in different places and times of year. High aerosol amounts occur over South America from July through September. This pattern is due to land clearing and agricultural fires that are widespread across the Amazon Basin and Cerrado regions during the dry season. Aerosols have a similar seasonal pattern in Central America (March-May), central and southern Africa (June-September), and Southeast Asia (January-April).”
Monthly trends: aerosols increase over the summer. Note the difference in standard deviation for India between the high season and winter; the UK trend is noticeably flatter. This, together with the low overall values for the UK, suggests the data might be less impactful there, although there may be stronger local trends (e.g. in the south) that are more pronounced than the regional one.
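A sketch of the regional monthly statistics behind plots like these. The bounding boxes in `REGIONS` are rough, hypothetical outlines of the UK and India, and the box average is unweighted (no cos(latitude) area weighting), so treat the numbers as indicative only.

```python
import numpy as np

# Hypothetical (lat_min, lat_max, lon_min, lon_max) boxes; the real ones aren't given.
REGIONS = {
    "uk": (49.5, 61.0, -8.5, 2.0),
    "india": (6.0, 36.0, 68.0, 98.0),
}


def regional_monthly_stats(aod, times, lats, lons, region):
    """Mean and std of the box-averaged AOD for each calendar month (1-12).

    `aod` is shaped (time, lat, lon); `lats` and `lons` are 1D coordinate arrays.
    """
    lat_min, lat_max, lon_min, lon_max = REGIONS[region]
    lat_mask = (lats >= lat_min) & (lats <= lat_max)
    lon_mask = (lons >= lon_min) & (lons <= lon_max)
    # Unweighted box average per time step -> 1D regional series.
    series = np.nanmean(aod[:, lat_mask][:, :, lon_mask], axis=(1, 2))
    months = times.astype("datetime64[M]").astype(int) % 12 + 1
    stats = {}
    for m in range(1, 13):
        values = series[months == m]
        if values.size:
            stats[m] = (float(values.mean()), float(values.std()))
    return stats
```

Averaging over the box first measures variability of the regional mean; a local signal such as the southern UK would need its own, smaller box.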
Note: the November/December spike in India is likely due to the burning of crop residue in farm fields, a known source of smog in Delhi in 2023.
India: the map matches intuition about the region (mountains, desert areas) and shows noticeable variation overall, so it will hopefully be useful.
Map view of the UK: more variance in the south, with a high season from April to August, and barely any change in the north. NB: this is on a smaller scale (up to 0.11), whereas India reaches 0.8.
Range: I could not find any documentation on how the data is derived, apart from a formula here that seems relevant, so there is no 'hard evidence' on the expected range. Two things are of note:
Negative values: there are negative values in the data, although from the formula AOD cannot be negative by nature. The points are sporadic and do not appear to follow any spatial or temporal pattern, and they make up a negligible fraction of the data (on the order of 1e-7), so this is not a real problem; they should simply be imputed.
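Since the negative values are rare, the simplest imputation is to clip them to the physical lower bound of 0. This is one choice among several (temporal interpolation per grid cell would be another), and it assumes the negative values are small in magnitude; `impute_negative_aod` is a hypothetical helper.

```python
import numpy as np


def impute_negative_aod(aod: np.ndarray) -> tuple[np.ndarray, int]:
    """Replace physically impossible negative AOD values with 0.

    Returns a cleaned copy and the number of values changed. NaNs are left as-is.
    """
    negative = aod < 0  # comparisons with NaN are False, so NaNs are untouched
    cleaned = np.where(negative, 0.0, aod)
    return cleaned, int(negative.sum())
```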
Upper limit: this is a hazier area, since there seems to be no formulaic upper limit to AOD, and the data contains points of up to 70, which is enormous compared to the usual range (normally everything falls between 0.2 and 1, with 1 already indicating low visibility). However, this is not a reanalysis artefact: the extreme points all seem to correspond to reports of massive wildfires.
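For reference, the standard textbook definition of AOD (not necessarily the formula linked above) makes both points explicit: AOD at wavelength $\lambda$ is the column integral of the aerosol extinction coefficient $\beta_{\mathrm{ext}}$, which is non-negative, and the Beer-Lambert law gives the direct-beam transmittance $T$, where $\mu_0$ is the cosine of the solar zenith angle.

$$
\tau_a(\lambda) = \int_{0}^{\infty} \beta_{\mathrm{ext}}(\lambda, z)\,\mathrm{d}z \;\ge\; 0,
\qquad
T(\lambda) = \exp\!\left(-\frac{\tau_a(\lambda)}{\mu_0}\right)
$$

Nothing in the integral caps $\tau_a$ from above. With the sun overhead ($\mu_0 = 1$), $\tau_a = 1$ already lets through only $e^{-1} \approx 37\%$ of the direct beam, and $\tau_a = 70$ gives $e^{-70} \approx 4 \times 10^{-31}$, i.e. effectively none.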
It is worth noting that the extreme values also make up a marginal fraction of the data, but they should be preserved, as they depict extreme conditions that would be highly relevant to the forecast.
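One way to keep the extremes while stopping them from dominating a model's input scale, sketched under assumptions: the raw values are kept, a boolean flag marks values above 1 (the low-visibility level mentioned above), and a `log1p` transform compresses the range. The threshold and the transform are illustrative choices, not something settled in this thread, and the input is assumed to have had its negative values imputed already.

```python
import numpy as np

# Illustrative cut-off: AOD around 1 already means low visibility.
EXTREME_AOD = 1.0


def aod_features(aod: np.ndarray) -> dict:
    """Keep every value, but add a flag and a range-compressed copy for training."""
    return {
        "aod": aod,                       # raw values, extremes preserved
        "aod_log1p": np.log1p(aod),       # maps 0-70 into roughly 0-4.3
        "is_extreme": aod > EXTREME_AOD,  # marks e.g. heavy wildfire smoke
    }
```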
@jacobbieker Apologies for the delay; yes, this sounds like a valuable improvement!
I'm assuming the NASA satellite data would require the same kind of processing chain as is already present in Satip (satpy etc.), so it definitely makes sense for that to be in here. ECMWF CAMS, though, I'd imagine is different: I thought it isn't satellite data but rather gridded NWP again, so it might make more sense in nwp-consumer. I guess it just depends on which can most easily be adapted to include the CAMS data?
This data is satellite-derived but is a reanalysis, so it's kind of in between: it doesn't need the same processing as EUMETSAT data, but it also isn't 5D like NWP data. We are going to use it as a pseudo-NWP for pretraining. CAMS is an actual NWP, so I think that should go in nwp-consumer.
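A sketch of what using the reanalysis as a pseudo-NWP could look like: each reanalysis time becomes an initialisation time with a single zero-hour step, and each variable is stacked on its own axis. The dimension order in `NWP_DIMS` is a hypothetical illustration, not nwp-consumer's actual schema.

```python
import numpy as np

# Hypothetical NWP-style layout of the returned array (illustration only).
NWP_DIMS = ("init_time", "step", "variable", "lat", "lon")


def reanalysis_as_pseudo_nwp(fields: dict[str, np.ndarray]) -> tuple[np.ndarray, list[str]]:
    """Stack (time, lat, lon) reanalysis fields into a 5D NWP-like array.

    Each reanalysis time becomes an init_time with a single 0-hour step,
    and each named field becomes one entry on the variable axis.
    """
    names = sorted(fields)
    stacked = np.stack([fields[n] for n in names], axis=1)  # (time, variable, lat, lon)
    return stacked[:, np.newaxis], names                     # (time, 1, variable, lat, lon)
```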
Based on some of Solcast's learnings, it seems aerosols and albedo can be quite helpful for forecasting, so adding support for them could be valuable.
Detailed Description
For Aerosols:
For Albedo:
Context
This could be good to include for more training data and to improve our forecasting. As most of it is satellite-derived, Satip seems like a good place for it, although Satip has so far focused primarily on EUMETSAT data.
Possible Implementation
Add an option to the DownloadManager to get these aerosol measurements, including an option to switch from EUMETSAT access to NASA or ECMWF CAMS? Or split it into its own repo and functionality, as it is somewhat separate; although if we want a unified 'satellite' interface, it might be good to keep it in here.
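A rough sketch of the "switch provider" option, assuming a small registry of download backends that a DownloadManager-style class could select by name. All class and function names here are hypothetical; this is not Satip's existing DownloadManager API, and the actual download logic is left as stubs.

```python
from abc import ABC, abstractmethod
from datetime import datetime


class DataSource(ABC):
    """Hypothetical common interface for satellite-style data providers."""

    name: str

    @abstractmethod
    def download(self, start: datetime, end: datetime, out_dir: str) -> list[str]:
        """Fetch data for [start, end] into out_dir and return the file paths."""


class EumetsatSource(DataSource):
    name = "eumetsat"

    def download(self, start, end, out_dir):
        raise NotImplementedError("existing EUMETSAT download path")


class NasaMerra2Source(DataSource):
    name = "nasa"

    def download(self, start, end, out_dir):
        raise NotImplementedError("e.g. GES DISC MERRA-2 M2I3NXGAS files")


class CamsSource(DataSource):
    name = "cams"

    def download(self, start, end, out_dir):
        raise NotImplementedError("e.g. Copernicus Atmosphere Data Store request")


SOURCES = {cls.name: cls for cls in (EumetsatSource, NasaMerra2Source, CamsSource)}


def get_source(name: str) -> DataSource:
    """Pick a backend by name so callers can switch providers with one option."""
    try:
        return SOURCES[name.lower()]()
    except KeyError:
        raise ValueError(f"unknown source {name!r}; expected one of {sorted(SOURCES)}") from None
```

Keeping the backends behind one interface is what would make a unified 'satellite' entry point possible, whichever repo the code ends up in.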
Thoughts, possibly @devsjc?