snowex-hackweek / website-2022

Event webpage and Jupyterbook 2022
https://snowex-2022.hackweek.io
MIT License

UAVSAR Tutorial Improvements #82

Closed: jacktarricone closed this 2 years ago

jacktarricone commented 2 years ago

Added my Banner Summit example and did some grammatical editing. Zach still needs to do a few more edits before this should be merged into main.

Also @scottyhq, we're grabbing data through a git clone right now, do you think this will work when 70 people try to hit it at once? If not we can put it in the temp s3 bucket within a week of when we present.

review-notebook-app[bot] commented 2 years ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


github-actions[bot] commented 2 years ago

Binder :point_left: Launch a binder notebook on this branch

scottyhq commented 2 years ago

we're grabbing data through a git clone right now, do you think this will work when 70 people try to hit it at once? If not we can put it in the temp s3 bucket within a week of when we present.

I'm curious to find out how that goes! Putting a backup on S3 sounds good. I know for sure that 70 simultaneous reads from S3 work well if you format your data as COGs.

ZachHoppinen commented 2 years ago

@jacktarricone I think you should get your pull request merged first. I will add the few small changes I have in one final PR after this goes in; I'm worried it will get very messy if I start editing before your pieces are uploaded.

jacktarricone commented 2 years ago

Changed my /tmp paths. Having issues with the SQL code in notebook #3, but that could be my machine. Also took the git merge stuff out of _toc.yml.

jacktarricone commented 2 years ago

Changed the kernel names on my local notebooks, so we'll see if it passes.

jacktarricone commented 2 years ago
[Screenshot: Screen Shot 2022-07-07 at 10:35 AM]

@scottyhq not sure what's going wrong here

scottyhq commented 2 years ago

The logs can be a bit confusing with the cache output. I highly recommend running these notebooks on the JupyterHub (log in, then `gh pr checkout 82`). Note that each notebook should run top to bottom without intervention. Imagine somebody trying to run notebook 3 before notebook 2, so you need to make sure the data is there in the top cell of each notebook. See how the lidar tutorial does it by creating a release of the repository to get a zip file:

https://snowex.hackweek.io/tutorials/lidar/2_elevation_differencing.html#download-required-tutorial-data
https://snowex.hackweek.io/tutorials/lidar/3_common_pitfalls.html
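A minimal version of such a top-of-notebook download cell might look like the sketch below. The release URL and the /tmp destination are hypothetical placeholders, not the actual tutorial's assets; the point is that the cell is idempotent, so every notebook can call it safely:

```python
import os
import urllib.request
import zipfile

# Hypothetical release asset; a real tutorial would point at its own zip.
DATA_URL = ("https://github.com/snowex-hackweek/website-2022"
            "/releases/download/v1/uavsar-data.zip")
DATA_DIR = "/tmp/uavsar-data"

def fetch_tutorial_data(url=DATA_URL, dest=DATA_DIR):
    """Download and unzip the tutorial data once; safe to re-run."""
    if os.path.isdir(dest):
        # An earlier notebook (or a previous run) already fetched it.
        return dest
    zip_path, _ = urllib.request.urlretrieve(url)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest
```

Each notebook's first cell then calls `fetch_tutorial_data()` and gets the same directory back whether or not an earlier notebook already downloaded the data.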

scottyhq commented 2 years ago

I just tried running 2_elevation_differencing.ipynb but get `NameError: name 'data_path' is not defined`.

ZachHoppinen commented 2 years ago

@micah-prime and @micahjohnson150: it seems like the database is down right now? All the code we have that uses the database is no longer working, and when I run code from your tutorials it also fails.

[Screenshot: Screen Shot 2022-07-07 at 1:37 PM]

micahjohnson150 commented 2 years ago

@ZachKeskinen I don't think it is down. It's probably due to the snowexsql update we launched yesterday. To fix this you will have to merge in main, as my update went into main this morning.

scottyhq commented 2 years ago

PR #86 updated the environment to use snowexsql == 0.3.0. That is what gets installed if you recreate the environment locally now. Otherwise, on the JupyterHub it's there automatically (from a terminal you can confirm with `conda list | grep sql`, or `echo $JUPYTER_IMAGE` should show quay.io/uwhackweek/snowex:2022.07.07).

(Unless you were logged in earlier; then you'd have to `File -> Hub Control Panel -> Stop My Server` and then Start Server to get the latest image.)

ZachHoppinen commented 2 years ago

Gotcha, that makes sense. I will re-install the environment and make sure that fixes it. Thanks!

github-actions[bot] commented 2 years ago

🚀 Deployed on https://deploy-preview-82--snowex2022.netlify.app

ZachHoppinen commented 2 years ago

@micahjohnson150 Thanks! I updated the version and that seems to fix most things. A few things in the cells that use the database seem different from before the update:

  1. I now have CRS mismatches between snow-depth data sets from Grand Mesa where the only difference is the dates I filtered between. However, it seems sporadic: the CRS mismatch error shows up for a few runs and then no error.

  2. That same cell with the CRS mismatch has gone from running in under 30 seconds to taking over 2 minutes. Did you add more data to the database so that it is pulling more in now? When I run

    from datetime import date

    from snowexsql.data import PointData
    from snowexsql.conversions import query_to_geopandas

    qry = session.query(PointData)
    qry = qry.filter(PointData.type == 'depth')
    qry_feb1 = qry.filter(PointData.date >= date(2020, 1, 31))
    qry_feb1 = qry_feb1.filter(PointData.date <= date(2020, 2, 2))
    df_feb_1 = query_to_geopandas(qry_feb1, engine)

    it takes around 2 minutes, but before the update it was running quite quickly.

  3. There now seem to be Nones in data sets that previously didn't have any. No big deal because I can drop them, but it might be worth checking other tutorials that use the database.
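For the CRS mismatch in point 1, one defensive pattern (a sketch with toy data, assuming the query results come back as GeoDataFrames the way query_to_geopandas returns them) is to re-project everything to a single CRS before combining, instead of relying on the database returning one consistent projection:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Toy stand-ins for two query results that came back in different CRSs.
df_a = gpd.GeoDataFrame({"depth": [90.0]},
                        geometry=[Point(-108.2, 39.0)], crs="EPSG:4326")
df_b = gpd.GeoDataFrame({"depth": [110.0]},
                        geometry=[Point(743000, 4321000)], crs="EPSG:26912")

# Re-project both to UTM zone 12N (EPSG:26912 covers Grand Mesa)
# before concatenating, so the combined frame has one known CRS.
target = "EPSG:26912"
df_all = pd.concat([df_a.to_crs(target), df_b.to_crs(target)],
                   ignore_index=True)
print(df_all.crs.to_epsg())
```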

micahjohnson150 commented 2 years ago

Yeah, there is a lot of GPR data that got added yesterday. And the big thing that happened last week is that the database now has multiple projections in it. So the best fix is to filter by site_name to focus on a site like Grand Mesa. I would add the following to your query to clean it up.

qry = qry.filter(PointData.site_name == 'Grand Mesa')
qry = qry.filter(PointData.instrument == 'magnaprobe')

When forming queries, definitely try to employ some safeguard tactics to avoid waiting 5 minutes to find out a query is messed up. One strategy is to use .limit(1000) to test your queries. Another tactic I use, especially on the GPR data (~1M points!), is qry.filter(PointData.id % 1000 == 0), which keeps only every 1000th point, so I can still see whether the data has the expected pattern without waiting forever.
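Both safeguards can be sketched against a throwaway in-memory SQLite table. Note that PointData below is a local stand-in declared for the demo, not the real snowexsql model:

```python
from sqlalchemy import Column, Float, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class PointData(Base):
    """Local stand-in for the real snowexsql PointData table."""
    __tablename__ = "points"
    id = Column(Integer, primary_key=True)
    value = Column(Float)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([PointData(id=i, value=float(i)) for i in range(1, 5001)])
    session.commit()

    # Safeguard 1: cap the result size while developing a query.
    preview = session.query(PointData).limit(1000).all()

    # Safeguard 2: keep only every 1000th point to check the broad pattern.
    thinned = session.query(PointData).filter(PointData.id % 1000 == 0).all()
    thinned_ids = sorted(p.id for p in thinned)

print(len(preview), thinned_ids)  # 1000 [1000, 2000, 3000, 4000, 5000]
```

The same .limit() and modulo-filter calls work unchanged on the real database session, since they are plain SQLAlchemy query operations.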

micahjohnson150 commented 2 years ago

3. There now seem to be Nones in data sets that previously didn't have any. No big deal because I can drop them, but it might be worth checking other tutorials that use the database.

Could you send me a query that you are seeing this in?

ZachHoppinen commented 2 years ago

So the only piece that has this issue is checking for permittivities. There were no Nones before, but now we get a few if we don't check for them.

import numpy as np

from snowexsql.data import LayerData
from snowexsql.conversions import query_to_geopandas

qry = qry.filter(LayerData.type == 'permittivity')
df = query_to_geopandas(qry, engine)
es_values = []
# Loop through each snowpit (each unique site_id is a snowpit)
for id in np.unique(df.site_id):
    sub = df[df.site_id == id]
    # get the permittivity of the highest layer in the snowpack
    es_str = sub.sort_values(by='depth', ascending=False).iloc[0]['value']
    # added this check after the update since we now get Nones
    if es_str is not None:
        es_values.append(float(es_str))

Did you add more snowpits in addition to the GPR?
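For what it's worth, the per-pit loop plus None check can also be done vectorized, assuming the query result is a (Geo)DataFrame with site_id, depth, and value columns; the frame below is toy data standing in for the query result:

```python
import pandas as pd

# Toy stand-in for query_to_geopandas output: two pits, one None value.
df = pd.DataFrame({
    "site_id": ["A", "A", "B", "B"],
    "depth":   [90, 100, 80, 95],
    "value":   ["1.4", None, "1.6", "1.3"],
})

# Top layer per pit = the row with the max depth within each site_id.
top = df.sort_values("depth", ascending=False).groupby("site_id").head(1)

# Drop the Nones introduced by the database update, then convert to float.
es_values = top.dropna(subset=["value"])["value"].astype(float).tolist()
print(es_values)  # [1.3]
```

Here pit A's top layer has a None value and is dropped, so only pit B's 1.3 survives.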

ZachHoppinen commented 2 years ago

@micahjohnson150 That was exactly what I needed. Looks like I was pulling in a bunch of GPR data that I didn't want. Back down to 9 seconds. Thanks so much, Micah!

ZachHoppinen commented 2 years ago

@scottyhq Hey Scott, any thoughts on this error?

[Screenshot: Screen Shot 2022-07-08 at 1:16 PM]