microbiomedata / nmdc_notebooks

Jupyter Notebooks demonstrating R and Python-based access to NMDC metadata and data
Creative Commons Zero v1.0 Universal

Add check to prevent key errors for lacking metadata in neon_soil_example #35

Closed: kheal closed this 1 month ago

kheal commented 2 months ago

Closes #28.

The issue arose from bio samples that must have been ingested after we first created the notebooks. Some of the new bio samples have no geo_loc_name slot, so the notebook raises a KeyError. I've added a check that keeps only samples with a collection_date, geo_loc_name, and lat_lon before passing their data to downstream analyses.

kheal commented 2 months ago

Just as a reminder, I find it easiest to review by looking at the rendered notebook in the branch (https://nbviewer.org/github/microbiomedata/notebook_hackathons/blob/neon_metadata_keyerror/NEON_soil_metadata/python/neon_soil_metadata_visual_exploration.ipynb) and its associated Google Colab notebook (https://colab.research.google.com/github/microbiomedata/notebook_hackathons/blob/neon_metadata_keyerror/NEON_soil_metadata/python/neon_soil_metadata_visual_exploration.ipynb).

Once the PR is merged into main, the notebooks will be accessible via the links in this readme: https://github.com/microbiomedata/nmdc_notebooks/tree/main/NEON_soil_metadata#readme.

brynnz22 commented 1 month ago

@kheal are the changes in cell 4?:

    # Check if samp has keys that correspond to primary metadata
    if set(['lat_lon', 'geo_loc_name', 'collection_date']).issubset(samp):

kheal commented 1 month ago

Yep! And then a few spots down below that used to reference all_results now reference all_full_results.
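
For anyone skimming the thread, here is a minimal sketch of how that check can produce all_full_results from all_results. The sample records, their id values, and the has_primary_metadata helper are made up for illustration; the notebook itself pulls real records from NMDC and may structure the loop differently.

```python
# Keys the thread identifies as primary metadata.
REQUIRED_KEYS = {"lat_lon", "geo_loc_name", "collection_date"}

# Hypothetical records standing in for the API results (not real NMDC data).
all_results = [
    {
        "id": "example-sample-1",
        "lat_lon": {"latitude": 40.0, "longitude": -105.0},
        "geo_loc_name": {"has_raw_value": "USA: Colorado"},
        "collection_date": {"has_raw_value": "2020-06-01"},
    },
    # This record lacks geo_loc_name, like the newer samples described above.
    {
        "id": "example-sample-2",
        "lat_lon": {"latitude": 35.0, "longitude": -84.0},
        "collection_date": {"has_raw_value": "2021-07-15"},
    },
]


def has_primary_metadata(samp):
    """Return True if the sample dict has every required key."""
    # issubset() iterates over the dict, which yields its keys.
    return REQUIRED_KEYS.issubset(samp)


# Keep only complete samples so later lookups cannot raise KeyError.
all_full_results = [samp for samp in all_results if has_primary_metadata(samp)]

# Safe now: every remaining sample has geo_loc_name.
locations = [samp["geo_loc_name"]["has_raw_value"] for samp in all_full_results]
```

Filtering once up front, rather than guarding each lookup with .get(), keeps the downstream cells unchanged apart from the variable name.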

kheal commented 1 month ago

To review: https://app.reviewnb.com/microbiomedata/nmdc_notebooks/pull/35/