Subsetting only quality shots in the OpenDAP tutorial

btudor01 commented 2 years ago

Hello— I'm doing some Amazon wide research with GEDI L4A data, and because the data volume is enormous I've been using the "Accessing GEDI L4A Dataset with NASA OPeNDAP in the Cloud" tutorial to subset to only the necessary variables. I'm wondering if there is a way to only retrieve the data that has a value of 1 for the quality flag when downloading the science variables into a CSV. I know that might be beyond the scope of this tutorial, but if there are any quick methods of doing this, I would really appreciate the help. Thanks!

rupesh2 commented 2 years ago

Hi @btudor01 , You can modify Step 8 of the tutorial as follows. I have indicated the changed lines with the comment "# L4_QUALITY_FLAG".

# setting up maximum retries to get around Hyrax 500 error
retries = Retry(total=3, backoff_factor=0.1, status_forcelist=[ 500, 502, 503, 504 ])
session.mount('https://', HTTPAdapter(max_retries=retries))

# plotting the area of interest
ax=aca.to_crs(epsg=3857).plot(figsize=(5, 6), alpha=0.3)
ctx.add_basemap(ax)
ax.set_title("Starting download...")
display.display(plt.gcf())
display.clear_output(wait=True)

c=0
for g_name in opendap_arr:
    c += 1
    # loop over all beams
    for beam in beams:       
        # 1. Retrieving lat, lon coordinates for the file
        # L4_QUALITY_FLAG
        hyrax_url = f"{g_name}.dap.nc4?dap4.ce=/{beam}/lon_lowestmode;/{beam}/lat_lowestmode;/{beam}/l4_quality_flag"
        r = session.get(hyrax_url)
        if (r.status_code != 400):
            ds = nc.Dataset('hyrax', memory=r.content)
            lat = ds[beam]['lat_lowestmode'][:]
            lon = ds[beam]['lon_lowestmode'][:]
            # L4_QUALITY_FLAG
            l4_quality_flag = ds[beam]['l4_quality_flag'][:]
            ds.close()
            # L4_QUALITY_FLAG
            df = pd.DataFrame({'lat_lowestmode': lat, 'lon_lowestmode': lon, 'l4_quality_flag': l4_quality_flag}) # creating pandas dataframe  

            # 2. Subsetting by bounds of the area of interest
            # converting to geopandas dataframe
            gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lon_lowestmode, df.lat_lowestmode)) 
            gdf_aca = gdf[gdf['geometry'].within(aca.geometry[0])]   
            # L4_QUALITY_FLAG
            gdf_aca = gdf_aca[gdf_aca['l4_quality_flag']==1] 
            if not gdf_aca.empty:
                # creating empty columns for variables
                for v in headers[2:]:
                    gdf_aca[v] = None
                # 3. retrieving variables of interest, agbd, agbd_t in this case.
                # We are only retriving the shots within subset area.
                for _, df_gr in gdf_aca.groupby((gdf_aca.index.to_series().diff() > 1).cumsum()):
                    i = df_gr.index.min()
                    j = df_gr.index.max()
                    for v in headers[2:]:
                        var_s = f"/{beam}/{v}%5B{i}:{j}%5D"
                        hyrax_url = f"{g_name}.dap.nc4?dap4.ce={var_s}"
                        r = session.get(hyrax_url)
                        if (r.status_code != 400):
                            ds = nc.Dataset('hyrax', memory=r.content)
                            gdf_aca.loc[i:j, (v)] = ds[beam][v][:]
                            ds.close()

                # saving the output file
                gdf_aca.to_csv(out_csv, mode='a', index=False, header=False, columns=headers)

                # plotting the shots
                gdf_aca.crs = "EPSG:4326"
                gdf_aca.to_crs(epsg=3857).plot(alpha=0.01, ax=ax, linewidth=0)
                ax.set_title(f"Downloading {c} of {total_granules}... {g_name.rsplit('/', 1)[-1]} / {beam}")
                display.display(plt.gcf())
                display.clear_output(wait=True)

btudor01 commented 2 years ago

Really appreciate the quick and helpful response! Unfortunately I'm getting this error:

rupesh2 commented 2 years ago

@btudor01 I am sorry that part should have been in the following line. I have now corrected the code above.

df = pd.DataFrame({'lat_lowestmode': lat, 'lon_lowestmode': lon, 'l4_quality_flag': l4_quality_flag}) # creating pandas dataframe

btudor01 commented 2 years ago

Sorry, now I'm getting this error:

rupesh2 commented 2 years ago

@btudor01 My best guess is the dataset was not closed (ds.close()) in the previous runs. You might want to try restarting your Jupyter kernel. Let us know if you still get the same error.

ornldaac / gedi_tutorials

Subsetting only quality shots in the OpenDAP tutorial #9