polar-computing / AerosolDelta

Quantifying aerosol presence and composition over Earth's ice sheets and glaciers - mapping anthropogenic and natural aerosol patterns and estimating changes over time

Phase 2: Data Retrieval and Cleaning Pipeline #4

Open karanjeets opened 8 years ago

karanjeets commented 8 years ago

Pipeline

  1. Download
    1. Randolph Glacier Inventory (RGI) 5.0 Complete
    2. MERRA-2 Aerosol Raster Modeled Data
    3. CALIOP Aerosol Raster observation Version 3 Aerosol Profile Data
    4. MODIS, Aqua, Collection 6, 8-day Surface Reflectance Raster Data
    5. Landsat Data
  2. From the RGI data, unzip 00_rgi50_attribs, which contains a CSV file for each of the regions. Each file contains the latitude/longitude information.
  3. Extract all latitudes and longitudes and store them in a separate file named coordinates.txt
  4. For each latitude/longitude pair in coordinates.txt
    1. Extract all scientific data sets (listed in the README) from MERRA-2 M2TMNXAER.5.12.4 (Monthly Mean Data) for each month
    2. Extract 500m Surface Reflectance data from MODIS MYD09A1 8-Day L3 for each band and parse the bits
    3. Extract monthly data from CALIOP CAL_LID_L3_APro_CloudFree-Standard-V3-00
    4. Extract Surface Reflectance data from Landsat
  5. Index all the data in ElasticSearch (Decide how to model the schema)
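
Steps 2 and 3 above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the column names `CenLat` and `CenLon`, the `latin-1` encoding, and the one-pair-per-line output format are assumptions that should be checked against the unzipped CSV files.

```python
import csv
from pathlib import Path


def extract_coordinates(csv_dir, out_path, lat_col="CenLat", lon_col="CenLon"):
    """Write one "lat,lon" line per glacier found in the CSV files under csv_dir.

    Column names are assumptions; returns the number of pairs written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for csv_file in sorted(Path(csv_dir).glob("*.csv")):
            # latin-1 can decode any byte, so accented glacier names cannot
            # raise a decode error (assumed encoding, verify on real files).
            with open(csv_file, newline="", encoding="latin-1") as f:
                for row in csv.DictReader(f):
                    try:
                        lat = float(row[lat_col])
                        lon = float(row[lon_col])
                    except (KeyError, TypeError, ValueError):
                        continue  # skip rows without usable coordinates
                    out.write(f"{lat},{lon}\n")
                    count += 1
    return count
```

Calling `extract_coordinates("00_rgi50_attribs", "coordinates.txt")` would then produce the file used in step 4.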

Questions

  1. Is there a better way to map Latitude and Longitude from RGI with MERRA-2 and other source files?
  2. How are we analyzing data from different sources? MERRA-2 is a monthly mean whereas MODIS is 8-day.
  3. Please describe which variables we should collect from each data source. We need to list them point by point to avoid missing any. As a start, this includes all 12 scientific data sets from MERRA-2 listed in the README.
  4. Landsat needs more explanation on downloading and understanding the data. I have included a user guide in the README. Please confirm if it is the correct one.
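
For the "parse the bits" part of pipeline step 4.2, one possible reading is that the packed 16-bit state QA word delivered with MYD09A1 has to be unpacked into separate flags. The sketch below assumes that reading and decodes only a few fields. The bit positions follow the MOD09 state QA layout as commonly documented (cloud state in bits 0-1, cloud shadow in bit 2, aerosol quantity in bits 6-7, MOD35 snow/ice in bit 12) and should be verified against the Collection 6 user guide before use.

```python
def decode_state_qa(value):
    """Decode selected fields of a 16-bit state QA word into a dict.

    Bit positions are an assumption to be checked against the user guide.
    """
    cloud_states = ("clear", "cloudy", "mixed", "not set, assumed clear")
    aerosol_levels = ("climatology", "low", "average", "high")
    return {
        "cloud_state": cloud_states[value & 0b11],                # bits 0-1
        "cloud_shadow": bool((value >> 2) & 0b1),                 # bit 2
        "aerosol_quantity": aerosol_levels[(value >> 6) & 0b11],  # bits 6-7
        "snow_ice": bool((value >> 12) & 0b1),                    # bit 12
    }
```

For example, `decode_state_qa(0x10C1)` reports a cloudy pixel with high aerosol quantity and the snow/ice flag set.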

karanjeets commented 8 years ago

@kacasey : Can you please help with the questions?

@nithinkrishna : Please review and add more information and/or questions.

kacasey commented 8 years ago

Thanks for the proposed pipeline and questions.

1) No. Extracting latitude and longitude and keeping the data classified by position is the recommended approach.

2) Correct. Keeping the separate products in their native temporal resolution is preferred, i.e. MERRA-2 will be kept in monthly format, CALIOP in monthly format, MODIS in 8-day format, and Landsat in daily format. We will compare similar time frames (e.g. the MERRA-2 and CALIOP month of January corresponds to the first four MODIS 8-day periods).

3) All variables are, or shortly will be, listed in the README file.

4) Landsat data will be reviewed.
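
The month-to-8-day correspondence described in answer 2 can be made explicit with a small helper. This is only a sketch: it assumes the MODIS 8-day composites start on day-of-year 1, 9, 17, ..., 361, and it assigns each period to the month in which it starts, which is a simplification (a period starting on January 25 runs one day into February).

```python
from collections import defaultdict
from datetime import date, timedelta


def modis_periods_by_month(year):
    """Map month number -> start dates of the 8-day periods beginning in that month."""
    periods = defaultdict(list)
    jan1 = date(year, 1, 1)
    for doy in range(1, 366, 8):  # day-of-year 1, 9, ..., 361
        start = jan1 + timedelta(days=doy - 1)
        periods[start.month].append(start)
    return dict(periods)
```

With this grouping, `modis_periods_by_month(2000)[1]` holds the four periods starting on January 1, 9, 17 and 25, matching the January example above.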

karanjeets commented 8 years ago

@kacasey : Thanks for the answers. Currently, I am developing a script to extract all the variables (mentioned in the README file) and index them into ElasticSearch. The index schema will be shared soon.
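
Since the schema is still undecided, the following is only a hypothetical starting point for discussion, not the agreed design. It assumes one document per (location, time, source) observation, a `geo_point` field for the coordinates, and a recent enough ElasticSearch version to support the `keyword` type. All field names, the index name, and the variable name `example_var` are placeholders.

```python
import json

# Hypothetical mapping: field names are placeholders, not the agreed schema.
MAPPING = {
    "properties": {
        "location": {"type": "geo_point"},
        "time": {"type": "date"},
        "source": {"type": "keyword"},
        "rgi_region": {"type": "integer"},
        "variables": {"type": "object"},
    }
}


def to_document(lat, lon, time_iso, source, rgi_region, variables):
    """Build one document in the shape described by MAPPING."""
    return {
        "location": {"lat": lat, "lon": lon},
        "time": time_iso,
        "source": source,
        "rgi_region": rgi_region,
        "variables": variables,
    }


def to_bulk_lines(index_name, docs):
    """Serialize docs as a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


doc = to_document(35.25, 78.5, "2000-01-01", "MERRA-2", 13, {"example_var": 1.0})
body = to_bulk_lines("aerosoldelta", [doc])
```

Keeping `source` and `time` as separate fields would let each product stay in its native temporal format while still being queried by position.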

I want to run this complete pipeline for one region first. This will help us fail fast and give us an opportunity to work through most of the issues before we scale up to the remaining regions.

So, from the list of 19 regions, please pick one that you are most interested in and that could provide good aerosol data.

@nithinkrishna : Let's sit tomorrow evening to discuss the schema.

@r4space : Do you know where we can host the ElasticSearch index? What systems are we getting from XSEDE? Sorry, I have a couple of them from my current project and I am not able to differentiate the new ones. Is it the Comet and Gordon compute clusters?

kacasey commented 8 years ago

@karanjeets Great! Thanks for all the progress, including script development.

For the testing region, I propose the Himalayas. Although one mountain range, it is broken down into 3 RGI regions (regions 13, 14, 15) -- denoted by the yellow box in the image below. There is a great mix of aerosol species which impact the region (dust, black carbon, sulfates).

[Image: rgi-regions]

karanjeets commented 8 years ago

@kacasey Thanks for the regions. We are working on them.

Yesterday, Nithin and I discussed the pipeline in depth and came up with some questions.

(a) Will the Latitudes and Longitudes [Lat, Lon] in RGI be different from the ones mentioned in MODIS, MERRA-2 and other sources? Our assumption is yes; if we are correct, we need to map them using the corresponding area field in RGI.

(b) Do we have multiple region-specific Lat Lon in RGI based on time? We are asking because there may be a Lat Lon which was part of the region in 2000 but is not now, due to the melting of ice or other climatic changes.

(c) Will the (Lat, Lon) be the same across time within one source? For example, will MERRA-2 files from January 2000 have the same (Lat, Lon) as files from February 2000 or March 2001?

kacasey commented 8 years ago

Hi @karanjeets & @nithinkrishna, great that you were able to discuss yesterday.

(a) Latitude/Longitude values are reported from RGI and the different products in slightly different ways. All Latitude/Longitude values should be used for mapping on one central grid. I can help define where/how latitude/longitude is saved for each file type if that is helpful. I just added this info to CALIPSO data and Landsat data in the Readme file.

(b) and (c) Yes, you are correct in thinking that climate change alters landscapes, melts ice, etc. However, the latitude/longitude points remain constant over time. Yes, MERRA-2 files from January 2000 will have the same latitude/longitude range, points, and resolution as files from March 2010.
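
Because the grid points do not change over time, the mapping from an RGI coordinate to a grid cell can be computed once and reused for every month. The sketch below illustrates a nearest-cell lookup on a regular grid. The thread does not say which grid will serve as the central one, so the defaults here assume the MERRA-2 layout (0.5 degree latitude by 0.625 degree longitude, starting at -90 and -180) purely for illustration.

```python
def nearest_grid_index(lat, lon, lat0=-90.0, lon0=-180.0,
                       dlat=0.5, dlon=0.625, nlat=361, nlon=576):
    """Return (i_lat, i_lon) of the regular-grid cell centre closest to (lat, lon).

    Default grid parameters are an assumption (MERRA-2-like layout).
    """
    i_lat = int(round((lat - lat0) / dlat))
    i_lon = int(round((lon - lon0) / dlon)) % nlon  # wrap around the date line
    i_lat = min(max(i_lat, 0), nlat - 1)            # clamp at the poles
    return i_lat, i_lon
```

The returned pair can be stored next to each entry of coordinates.txt and used to index every monthly file from the same product.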

I hope this is helpful. Please let me know if you have other questions.