Closed btudor01 closed 2 years ago
Hi @btudor01 , You can modify Step 8 of the tutorial as follows. I have indicated the changed lines with the comment "# L4_QUALITY_FLAG".
# setting up maximum retries to get around Hyrax 500 error
retries = Retry(total=3, backoff_factor=0.1, status_forcelist=[ 500, 502, 503, 504 ])
session.mount('https://', HTTPAdapter(max_retries=retries))
# plotting the area of interest
ax=aca.to_crs(epsg=3857).plot(figsize=(5, 6), alpha=0.3)
ctx.add_basemap(ax)
ax.set_title("Starting download...")
display.display(plt.gcf())
display.clear_output(wait=True)
c=0
for g_name in opendap_arr:
c += 1
# loop over all beams
for beam in beams:
# 1. Retrieving lat, lon coordinates for the file
# L4_QUALITY_FLAG
hyrax_url = f"{g_name}.dap.nc4?dap4.ce=/{beam}/lon_lowestmode;/{beam}/lat_lowestmode;/{beam}/l4_quality_flag"
r = session.get(hyrax_url)
if (r.status_code != 400):
ds = nc.Dataset('hyrax', memory=r.content)
lat = ds[beam]['lat_lowestmode'][:]
lon = ds[beam]['lon_lowestmode'][:]
# L4_QUALITY_FLAG
l4_quality_flag = ds[beam]['l4_quality_flag'][:]
ds.close()
# L4_QUALITY_FLAG
df = pd.DataFrame({'lat_lowestmode': lat, 'lon_lowestmode': lon, 'l4_quality_flag': l4_quality_flag}) # creating pandas dataframe
# 2. Subsetting by bounds of the area of interest
# converting to geopandas dataframe
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lon_lowestmode, df.lat_lowestmode))
gdf_aca = gdf[gdf['geometry'].within(aca.geometry[0])]
# L4_QUALITY_FLAG
gdf_aca = gdf_aca[gdf_aca['l4_quality_flag']==1]
if not gdf_aca.empty:
# creating empty columns for variables
for v in headers[2:]:
gdf_aca[v] = None
# 3. retrieving variables of interest, agbd, agbd_t in this case.
# We are only retriving the shots within subset area.
for _, df_gr in gdf_aca.groupby((gdf_aca.index.to_series().diff() > 1).cumsum()):
i = df_gr.index.min()
j = df_gr.index.max()
for v in headers[2:]:
var_s = f"/{beam}/{v}%5B{i}:{j}%5D"
hyrax_url = f"{g_name}.dap.nc4?dap4.ce={var_s}"
r = session.get(hyrax_url)
if (r.status_code != 400):
ds = nc.Dataset('hyrax', memory=r.content)
gdf_aca.loc[i:j, (v)] = ds[beam][v][:]
ds.close()
# saving the output file
gdf_aca.to_csv(out_csv, mode='a', index=False, header=False, columns=headers)
# plotting the shots
gdf_aca.crs = "EPSG:4326"
gdf_aca.to_crs(epsg=3857).plot(alpha=0.01, ax=ax, linewidth=0)
ax.set_title(f"Downloading {c} of {total_granules}... {g_name.rsplit('/', 1)[-1]} / {beam}")
display.display(plt.gcf())
display.clear_output(wait=True)
Really appreciate the quick and helpful response! Unfortunately I'm getting this error:
@btudor01 I am sorry that part should have been in the following line. I have now corrected the code above.
df = pd.DataFrame({'lat_lowestmode': lat, 'lon_lowestmode': lon, 'l4_quality_flag': l4_quality_flag}) # creating pandas dataframe
Sorry, now I'm getting this error:
@btudor01 My best guess is the dataset was not closed (ds.close()) in the previous runs. You might want to try restarting your Jupyter kernel. Let us know if you still get the same error.
Hello— I'm doing some Amazon wide research with GEDI L4A data, and because the data volume is enormous I've been using the "Accessing GEDI L4A Dataset with NASA OPeNDAP in the Cloud" tutorial to subset to only the necessary variables. I'm wondering if there is a way to only retrieve the data that has a value of 1 for the quality flag when downloading the science variables into a CSV. I know that might be beyond the scope of this tutorial, but if there are any quick methods of doing this, I would really appreciate the help. Thanks!