Open kheal opened 5 months ago
@brynnz22 - can you also add an example of the endpoint you used to download the TSVs of taxonomic information? I couldn't figure out an easy way to access the URL for this step. The URL I'm talking about is in code chunk 30 in this notebook.
@kheal that URL was just taken from the metadata retrieved using the *metadata collection* endpoint that you already mentioned.
Great - so the three endpoints I point to above will cover the API calls we've used in the notebook, correct? @brynnz22
Yep! That should be right.
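The three endpoints themselves aren't quoted in this excerpt, but a minimal smoke check against the metadata collection endpoint mentioned above might look like this sketch. The base URL and the `/nmdcschema/{collection}` path (and the `max_page_size` parameter) are assumptions about the public NMDC runtime API, not taken from the thread:

```python
# Hypothetical smoke check for an NMDC runtime API endpoint.
# BASE_URL and the /nmdcschema/{collection} path are assumptions.
import requests

BASE_URL = "https://api.microbiomedata.org"

def collection_url(collection: str, max_page_size: int = 10) -> str:
    """Build the metadata-collection endpoint URL (assumed path)."""
    return f"{BASE_URL}/nmdcschema/{collection}?max_page_size={max_page_size}"

def smoke_check(collection: str) -> bool:
    """Return True if the endpoint responds with HTTP 200."""
    resp = requests.get(collection_url(collection), timeout=30)
    return resp.status_code == 200
```

A call like `smoke_check("biosample_set")` would then verify the collection endpoint is still reachable without exercising any notebook logic.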
Related to #301
@kheal Is this the example notebook you're referring to in your first comment? https://github.com/microbiomedata/notebook_hackathons/tree/main/taxonomic_dist_by_soil_layer
Could you share the relevant code chunks in this notebook?
@PeopleMakeCulture - that is one of the notebooks.
These two also use the runtime API: https://github.com/microbiomedata/notebook_hackathons/tree/main/NEON_soil_metadata and https://github.com/microbiomedata/notebook_hackathons/tree/main/bioscales_biogeochemical_metadata (in both the R and python versions, for a total of 5 notebooks).
Do you want/need me to point to each chunk in each notebook (5 notebooks total) that pings the API?
@kheal Gotcha. The notebook links should be enough. Thanks!
@kheal @brynnz22 are the notebooks all "quick"? We could potentially just run them all to make sure they don't error, with e.g. papermill:
```python
import papermill as pm

for nb_filename in nb_filenames:
    try:
        pm.execute_notebook(
            nb_filename,
            "output_" + nb_filename,
            parameters=dict(parameter_name="value"),
        )
    except pm.exceptions.PapermillExecutionError as e:
        print("An error occurred during execution:", e)
        # Custom error handling or cleanup code here
        # re-raise here so pytest reports the failure
```
@dwinston
Unfortunately no. This notebook takes a couple of hours (in part because there is no easy API route to go from biosample IDs to data objects; see #355).
The GET requests I have at the top of this thread are representative examples and should be sufficient as tests to make sure the endpoints are still good.
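Those GET requests could be wired into pytest directly. A sketch, assuming the URLs below as placeholders (the actual endpoints from the top of the thread are not shown in this excerpt):

```python
# Sketch: endpoint GET requests as parametrized pytest smoke tests.
# The URL below is an assumed example, not one of the actual endpoints
# referenced earlier in the thread.
import pytest
import requests

ENDPOINT_URLS = [
    "https://api.microbiomedata.org/nmdcschema/biosample_set",  # assumed example
]

@pytest.mark.parametrize("url", ENDPOINT_URLS)
def test_endpoint_responds(url):
    resp = requests.get(url, timeout=30)
    assert resp.status_code == 200
```

Each URL gets its own test case, so CI reports exactly which endpoint broke.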
We should discuss potential notebook testing options as well, and what makes the most sense. I have some folks in my group who have experience with other Jupyter testing tools like nbmake (https://github.com/treebeardtech/nbmake)
Also fine with papermill, but the typical papermill use case I've seen centers on running a notebook job in parallel, parameterized across different inputs. Just want to make sure we are using the right tool for the job.
I've also seen this (from the same people at Netflix that made papermill) - https://github.com/nteract/testbook
I should have noted that the other four notebooks in these locations https://github.com/microbiomedata/notebook_hackathons/tree/main/NEON_soil_metadata and https://github.com/microbiomedata/notebook_hackathons/tree/main/bioscales_biogeochemical_metadata are pretty quick and it'd be great to have those tested in the CI/CD.
We want to make sure the example notebooks in this repo: https://github.com/microbiomedata/notebook_hackathons do not break with any changes or pushes to the NMDC-runtime API.
The following endpoints are used (with example tests).