polar-computing / AerosolDelta

Quantifying aerosol presence and composition over Earth's ice sheets and glaciers - mapping anthropogenic and natural aerosol patterns and estimating changes over time

Kickoff discussion and background information #3

Open r4space opened 8 years ago

r4space commented 8 years ago

Some background and clarifications: (1) SPATIAL OR SURFACE AREA OF INTEREST:
The Greenland and Antarctic ice sheets are of interest, along with Earth's glaciers. Earth's glacier outlines can be obtained from the Randolph Glacier Inventory 5.0.

(2) AEROSOL data will be provided by MERRA-2 / user guide / ordering data or downloading MERRA-2 data from the ftp site - ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2_MONTHLY/ .

Let's start with the M2TMNXAER.5.12.4 files. Inside these files, column mass density, scattering AOT, and surface mass concentration are provided by species (dust, black carbon, organic carbon, sea salt, sulphate aerosols) in the scientific data sets (SDSs) named on the intro page.
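A minimal sketch of how the species/quantity combinations above could be mapped to variable names. The prefixes and suffixes are what I believe the monthly aerosol collection uses (e.g. DUCMASS, BCSCATAU, SO4SMASS), but treat them as an assumption and check them against the file specification or the variable list of a downloaded file; the function name is hypothetical.

```python
# Assumed naming scheme for the MERRA-2 monthly aerosol variables:
# <species prefix> + <quantity suffix>. Verify against a real file.
SPECIES = {
    "dust": "DU",
    "black carbon": "BC",
    "organic carbon": "OC",
    "sea salt": "SS",
    "sulphate": "SO4",
}

QUANTITY = {
    "column mass density": "CMASS",
    "scattering AOT": "SCATAU",
    "surface mass concentration": "SMASS",
}


def variable_name(species, quantity):
    """Build the (assumed) variable name for one species and quantity."""
    prefix = SPECIES[species]
    suffix = QUANTITY[quantity]
    # Sulphate is the irregular case: the optical fields use the prefix
    # "SU", while the mass fields use "SO4".
    if species == "sulphate" and quantity == "scattering AOT":
        prefix = "SU"
    return prefix + suffix


# All 15 species/quantity combinations mentioned in the text.
ALL_VARIABLES = [variable_name(s, q) for s in SPECIES for q in QUANTITY]
```

Keeping the list in one place makes it easy to loop over every field when extracting data, and to correct a name if the file specification disagrees.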

To answer a question from Karanjeet -- satellite aerosol observations over ice are unfortunately not yet obtained with great accuracy. MODIS and other satellite aerosol retrieval algorithms work over the ocean and dark land surfaces, but they routinely fail over ice and snow. There are not yet high-quality, accurate satellite observations that directly give aerosol type and magnitude over snow and ice. MERRA-2 will be the main way we quantify aerosols over ice & snow surfaces. It uses MODIS, MISR, and AERONET aerosol observations over dark land surfaces and ocean, where aerosols can be mapped with higher accuracy. MERRA-2 then models aerosols over snow & ice surfaces (based on nearby dark land/ocean aerosol type and magnitude). MERRA-2 output data has a resolution of 0.625 degrees in longitude (576 grid points) by 0.5 degrees in latitude (361 grid points), or approximately 50 km spatial resolution. (Data is input via the assimilation system on a cubed-sphere grid of approximately half-degree resolution. The aerosol assimilation is performed at eight synoptic times per day: 0, 3, 6, 9, 12, 15, 18, 21Z, per the MERRA-2 aerosol assimilation document.)
If more time were available for this project, or if it evolves, the raw Level 1 MODIS data could be a dataset to explore. Applying our own atmospheric corrections and filters for clouds, sun angle, etc. is possible. Unfortunately, it is a bit beyond the scope of this first project.
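To make the MERRA-2 output grid described above concrete, here is a small sketch that converts a latitude/longitude into the nearest grid indices. It assumes the grid points start at 90S and 180W (so longitudes run from -180 to 179.375); the function names are hypothetical.

```python
# MERRA-2 output grid as described in the text: 576 x 361 points.
N_LON, N_LAT = 576, 361
D_LON, D_LAT = 0.625, 0.5
# Assumption: the first grid point sits at 180W, 90S.
LON0, LAT0 = -180.0, -90.0


def nearest_index(lat, lon):
    """Return (i_lat, i_lon) of the grid point nearest to lat/lon in degrees."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    # Wrap the longitude into [-180, 180) so 180E maps onto 180W.
    lon = (lon - LON0) % 360.0 + LON0
    i_lat = int(round((lat - LAT0) / D_LAT))
    # Points just west of the date line round up to N_LON; wrap them to 0.
    i_lon = int(round((lon - LON0) / D_LON)) % N_LON
    return i_lat, i_lon


def grid_point(i_lat, i_lon):
    """Return the (lat, lon) of a grid point from its indices."""
    return LAT0 + i_lat * D_LAT, LON0 + i_lon * D_LON


# Example: a point near Summit, Greenland (coordinates are approximate).
summit_idx = nearest_index(72.58, -38.46)
summit_cell = grid_point(*summit_idx)
```

The same indices can then be used to slice any of the monthly fields, provided the file's own latitude/longitude arrays confirm the assumed origin.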

(3) SURFACE REFLECTANCE data will come from MODIS / user guide / ordering data / some MODIS tools, including the DAAC2Disk tool (sometimes helpful for downloading MODIS data) / how to grid MODIS tile data. I also recently found pyMODIS -- it may help with downloading, mosaicking, reprojecting, and gridding MODIS data. MODIS Aqua 8-day data is available from 2002 to near present. We'll use the most recent version, Collection 6, for land ice reflectance values. You can tell Collection 5 from Collection 6 data by the '.005.' vs '.006.' in the file names and at the end of the directory names. (Note: there is also the MODIS Terra sensor, which provides data from 2000 to present. Over time, the MODIS Terra sensor has experienced degradation in some of its multispectral bands. For more information, please see this link. If needed, we can collect MODIS Terra 8-day data for 2000-2002. After about 2005, we should strictly be using MODIS Aqua data.)
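The collection check above can be automated by parsing the file name. This is a sketch that assumes the usual MODIS land tile naming pattern (product, acquisition date as AYYYYDDD, tile as hHHvVV, three-digit collection, production timestamp); the example file name is made up for illustration and the function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed pattern: PRODUCT.AYYYYDDD.hHHvVV.CCC.YYYYDDDHHMMSS.hdf
MODIS_NAME = re.compile(
    r"^(?P<product>[A-Z0-9]+)"
    r"\.A(?P<year>\d{4})(?P<doy>\d{3})"
    r"\.h(?P<h>\d{2})v(?P<v>\d{2})"
    r"\.(?P<collection>\d{3})"
    r"\.(?P<production>\d{13})\.hdf$"
)


def parse_modis_name(name):
    """Split a MODIS tile file name into its components."""
    match = MODIS_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised MODIS tile name: " + name)
    product = match["product"]
    # Product names starting with MYD come from Aqua, MOD from Terra.
    sensor = {"MYD": "Aqua", "MOD": "Terra"}.get(product[:3], "unknown")
    # The acquisition date is given as year plus day of year.
    acquired = datetime.strptime(match["year"] + match["doy"], "%Y%j").date()
    return {
        "product": product,
        "sensor": sensor,
        "acquired": acquired,
        "tile": (int(match["h"]), int(match["v"])),
        "collection": int(match["collection"]),
    }


# Hypothetical file name, used only to exercise the parser.
info = parse_modis_name("MYD09A1.A2016153.h16v02.006.2016162071524.hdf")
```

Filtering on `info["collection"] == 6` and `info["sensor"] == "Aqua"` would then keep only the files we intend to use.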

We can also collect surface reflectance data from MERRA-2. This is found in an SDS of the MERRA-2 M2TMNXRAD.5.12.4 file, available for download at the MERRA-2 ftp site ftp://goldsmr4.sci.gsfc.nasa.gov/data/s4pa/MERRA2_MONTHLY/

Background information: Question by Karanjeet: "One statement that I am not able to understand from MODIS product description - "It corrects for the effects of atmospheric gases and aerosols". Does this mean, the surface reflectance data will not show the effect of aerosol (which we actually need)?" The MODIS Aqua MYD09A1 surface reflectance product provides information on land surface properties/composition. Atmospheric corrections are applied to convert the top-of-atmosphere satellite measurements to estimated surface reflectance values. So we will not be investigating atmospheric aerosols with MYD09A1, though we may gain a bit of information from it if that is easy to do. We should also pull the 'aerosol quantity' flag SDS. One approach is to use the surface reflectance data to inspect dust, black carbon, and other light-absorbing impurities that have been deposited on the ice (visible in the surface reflectance signature) and to estimate ice albedo.
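For the 'aerosol quantity' flag, a rough sketch of the bit unpacking is below. It assumes the flag lives in bits 6-7 of the 16-bit state word (the `sur_refl_state_500m` SDS) with the meanings listed in the table; both the bit positions and the labels should be confirmed against the MOD09 user guide before use.

```python
# Assumed meaning of the two-bit aerosol quantity field (bits 6-7 of the
# state QA word). Confirm against the MOD09 user guide.
AEROSOL_QUANTITY = {
    0: "climatology",
    1: "low",
    2: "average",
    3: "high",
}


def aerosol_quantity(state_word):
    """Extract the aerosol quantity label from one 16-bit state QA value."""
    # Shift bits 6-7 down to positions 0-1, then mask off everything else.
    bits = (state_word >> 6) & 0b11
    return AEROSOL_QUANTITY[bits]


def count_by_quantity(state_words):
    """Tally aerosol quantity labels over an iterable of QA values."""
    counts = {label: 0 for label in AEROSOL_QUANTITY.values()}
    for word in state_words:
        counts[aerosol_quantity(word)] += 1
    return counts
```

A per-tile tally like this would show how often the reflectance retrieval relied on climatology rather than an actual aerosol estimate, which is the case we expect to dominate over snow and ice.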

Question by Karanjeet: "I was reading about MODIS PR09A - L3. It says we have many observations for each pixel recorded from the same orbit. And later we select the one which has the highest coverage. Does this mean many observations in a single day or 8-day?" The MYD09A1 product is created from many observations over an 8-day period. The algorithm selects the best observation based on view angle, clouds, aerosol, etc. Again, if we were to use daily data, clouds, cloud shadows, and thin clouds would be a limiting factor over many cryospheric regions. We would also need to program a filter for satellite view angle & acquisition time (the best acquisitions are near-nadir, daytime, near solar noon). Using the 8-day product helps eliminate the need for these filters. The MYD09A1 8-day product will possibly give us a reliable first look at the monthly seasonality of surface reflectance (2002 to near present) for comparison with the MERRA-2 monthly aerosol composition and magnitude.
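To line the 8-day composites up with the monthly MERRA-2 data, one simple option is to assign each composite to the month of its start date. The sketch below assumes composites start on day 1 of each year and every 8 days after that; assigning by start date is my simplification, since a composite can straddle a month boundary.

```python
from collections import defaultdict
from datetime import date, timedelta


def composite_starts(year):
    """Start dates of the 8-day composites in one year (day 1, 9, 17, ...)."""
    starts = []
    day = date(year, 1, 1)
    while day.year == year:
        starts.append(day)
        day += timedelta(days=8)
    return starts


def group_by_month(starts):
    """Group composite start dates by calendar month (1-12)."""
    by_month = defaultdict(list)
    for start in starts:
        by_month[start.month].append(start)
    return dict(by_month)


starts_2016 = composite_starts(2016)
months_2016 = group_by_month(starts_2016)
```

Averaging the composites in each group gives one reflectance value per month that can sit next to the corresponding MERRA-2 monthly mean.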

Question by Karanjeet: "From the technical perspective, I am reading about HDF5 file formats and how to operate on them from [hdfgroup.org]. This will help us read both MODIS and MERRA-2 files as netCDF is also based on HDF5 format." MYD09A1 is in HDF4 (HDF-EOS), the format that preceded HDF5, so an HDF5-only reader will not open it. MERRA-2 is provided in netCDF. The hdfgroup website is a great source of information.
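Since the MODIS and MERRA-2 files use different container formats, a quick way to see what a downloaded file really is (and therefore which reader it needs) is to look at its first bytes. This sketch uses the standard format signatures and assumes the signature sits at the start of the file, which is the normal case; the function names are hypothetical.

```python
# File signatures ("magic bytes") for the formats discussed here.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "HDF5 (also the container for netCDF-4)"),
    (b"\x0e\x03\x13\x01", "HDF4"),
    (b"CDF\x01", "netCDF classic"),
    (b"CDF\x02", "netCDF 64-bit offset"),
]


def sniff_format(header):
    """Identify a file format from its leading bytes."""
    for signature, label in SIGNATURES:
        if header.startswith(signature):
            return label
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

Running `sniff_file` on one MODIS tile and one MERRA-2 file is a cheap sanity check before choosing libraries for the ingestion scripts.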

Question by Karanjeet: What is spatial resolution? And how will we compare the data of different spatial resolutions? Simply put, spatial resolution refers to the size of the land/ocean/surface area covered by each measurement, i.e. the level of detail. Here are remote sensing tutorial pages that may be helpful: 1 or 2.
Yes, there are spatial resolution differences between the MODIS and MERRA-2 data. MODIS has finer spatial resolution than MERRA-2. The spatial resolution of MERRA-2 is 0.5 degree by 0.625 degree, or approximately 50 km, vs. 500 m for MODIS MYD09A1 tiles. (See here if interested in the calculation.) Typically, I grid the data in my desired manner (bounding boxes etc.), and then compare data sets of different spatial resolution by regridding them onto the common desired grid. Let me know if you need more resources or further explanation.
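Two small sketches may help here. The first converts the MERRA-2 cell size from degrees to kilometres at a given latitude, using a spherical-Earth approximation of about 111.32 km per degree (an assumption, good enough for a rough figure). The second shows one simple form of regridding, block-averaging a fine grid onto a coarser one; it is only an illustration of the idea, not necessarily the method we will end up using, and real MODIS tiles would first need reprojecting.

```python
import math

# Approximate length of one degree of latitude on a spherical Earth.
KM_PER_DEGREE = 111.32


def cell_size_km(lat_deg, d_lat=0.5, d_lon=0.625):
    """Return (north-south, east-west) size in km of one grid cell."""
    north_south = d_lat * KM_PER_DEGREE
    # Meridians converge towards the poles, so east-west size shrinks
    # with the cosine of the latitude.
    east_west = d_lon * KM_PER_DEGREE * math.cos(math.radians(lat_deg))
    return north_south, east_west


def block_mean(fine, factor):
    """Average factor x factor blocks of a 2-D list, skipping None values."""
    rows, cols = len(fine), len(fine[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            values = [
                fine[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
                if fine[i][j] is not None
            ]
            # A block with no valid pixels stays missing in the output.
            coarse_row.append(sum(values) / len(values) if values else None)
        coarse.append(coarse_row)
    return coarse
```

For example, `cell_size_km(72.5)` gives the cell dimensions over central Greenland, where the east-west size is much smaller than at the equator, and `block_mean` with missing values set to `None` mimics averaging only the cloud-free fine pixels inside each coarse cell.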

kacasey commented 8 years ago

Hi team - I edited Jane's above comment - adding background info and most of our email questions/answers to date. thanks! -kimberly

aalavandhan commented 8 years ago

Forgive me but I’m very new to the earth sciences domain and have some trouble looking at this bottom up. Let me suggest an alternative framework to drive these discussions forward.

Goal: What are we looking to do with Aerosol Delta? ( What are the questions we are looking to answer with Aerosol Delta). As I gather we look to:

The typical workflow for any data analytics problem is the following. This workflow will apply to our problem as well. Let’s look at Aerosol delta in the context of these 5 phases and let’s discuss each from a data analytics / earth sciences standpoint.

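[image: workflow diagram, https://cloud.githubusercontent.com/assets/6264334/16403667/7e59a534-3cac-11e6-91a9-ace9ead20225.png]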

Phase 1: Data Source Identification

We are looking at 3 unique data sources, namely

Questions:

  1. Can we identify specific download links for each data source?
  2. Can we get a better understanding of each data set? How was this data generated? What does it describe?
  3. Can we identify the data format of each data set and describe the specific tools needed to process it?

Phase 2: Data Cleansing

Questions:

  1. What types of errors can be seen in each of these data sets? ( This requires an understanding of how the data is generated )
  2. How do we identify potentially bad records / fields? ( ones which we can eliminate )
  3. Will we be looking to correct bad records in the data or just discard all of them?
  4. If we are looking to correct certain types of errors how do we go about doing this?

Phase 3: Data Integration We are looking at multi-format data (CSVs, shp files, hdf5 etc). We need to write scripts to extract specific data attributes and dump them into a single place.

Questions:

  1. What is the technology we are looking to use for this? (RDBMS, NOSQL?)
  2. What are the data attributes in each data set relevant to this problem?
  3. How big is this aggregated data going to be? The size will determine the technology we need to use and the strategy we will be looking to adopt.
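On the size question, a back-of-the-envelope estimate can be scripted from the grid dimensions mentioned earlier in this thread. The sketch below assumes uncompressed 32-bit floats for MERRA-2 fields and, for MODIS, 2400 x 2400 pixel tiles with 7 reflectance bands stored as 16-bit integers; the variable and month counts in the example are placeholders, not decisions.

```python
# MERRA-2 grid from the kickoff comment: 576 x 361 points.
MERRA2_CELLS = 576 * 361


def merra2_bytes(n_variables, n_months, bytes_per_value=4):
    """Uncompressed size of MERRA-2 monthly fields (assumes 32-bit floats)."""
    return MERRA2_CELLS * bytes_per_value * n_variables * n_months


def modis_tile_bytes(n_bands=7, pixels_per_side=2400, bytes_per_value=2):
    """Uncompressed size of one 500 m tile (assumes 16-bit integer bands)."""
    return pixels_per_side * pixels_per_side * bytes_per_value * n_bands


def to_gib(n_bytes):
    """Convert bytes to gibibytes."""
    return n_bytes / 2**30


# Placeholder scenario: 15 aerosol variables over 14 years of months.
example_merra2 = merra2_bytes(n_variables=15, n_months=14 * 12)
example_tile = modis_tile_bytes()
```

Multiplying `example_tile` by the number of tiles covering the ice sheets and by 46 composites per year shows quickly that the MODIS side, not MERRA-2, is what will drive the storage and processing choices.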

Phase 4: Insight generation Once we have the required data in one place, we need to write data aggregation scripts for specific insights.

Questions:

  1. What are the specific questions we are looking to answer with Aerosol delta? ( Very important - Let’s try to list all of them )
  2. What technology do we use to write these aggregation scripts?
  3. Where do we store these intermediate aggregations? ( Elastic search? )

Phase 5: Data visualization Based on the prebuilt aggregations we need to build visualizations which answer the specific questions we’ve identified.

Questions:

  1. What types of visualizations are we looking to build? (Charts, Graphs?)
  2. How do we effectively show temporal change of Aerosol levels over a map? ( We should look to whiteboard these visualizations )

These are some of the top-level questions which came to my mind. I think we will have more questions.

Let’s look to open a new issue for each of these ‘Phases’ and discuss each one in detail based on this framework.

As far as I see it we should be done with the first 3 phases and at least half of phase 4 before we reach Miami. If the datasets are huge, these scripts will take a lot of time to run. We shouldn’t end up in a state where we are at the hackathon and half our time goes by waiting for these scripts to finish.

What say guys?

kacasey commented 8 years ago

Hi All,

Nithin, thank you for your email. I agree that it would be helpful to get through several initial steps prior to Miami, as the data sets are large and will require time to process.

Parts of your suggested "Goal", "Phase 1", "Phase 2" and "Phase 3" have been discussed by email and, as of last week, are on the github readme and issue pages. "Goal" and "Phase 1" are for the most part addressed in the home page readme file and last week's kickoff issue page. I will continue refining the Readme today and tomorrow to add all known "Goal" and "Phase 1" information in an organized fashion. Feel free to add further "Goal"/"Phase 1" discussion to the kickoff issue page or a new issue page. Karanjeet has also added the netCDF reader for MERRA-2 issue page. Would it be helpful to add an issue page for each data format to be ingested, e.g. an HDF reader for MODIS & CALIOP, a GeoTIFF reader for Landsat, and a SHP shapefile reader for glacier outlines? If it helps, I can add these issue pages and include any known/available readers/tools for each data type.

I appreciate any advice/input as to how to best organize on github - this format is new to me.

with thanks,


From: Nithin Krishna [notifications@github.com] Sent: Tuesday, June 28, 2016 12:19 AM To: polar-computing/AerosolDelta Cc: Casey, Kimberly Ann. (GSFC-615.0)[UNIV OF MARYLAND]; Assign Subject: Re: [polar-computing/AerosolDelta] Kickoff discussion and background information (#3)


karanjeets commented 8 years ago

Thanks @nithinkrishna and @kacasey

I think we can use this classification of "Phases" to categorize GitHub issue pages and discuss each phase in more detail.

Also, it would be great if we can construct a centralized index (database) from different sources (MODIS, MERRA-2, Landsat, etc). I am already working on it and will post updates in separate issue pages.

kacasey commented 8 years ago

Great! Thank you @nithinkrishna @karanjeets .