Closed Azaya89 closed 1 week ago
This is still a WIP. Not ready for review yet.
These are the current issues preventing the complete modernization of this notebook:
1. With `intake` pinned to `<2`, only version 0.6.2 runs without errors. For example, executing `metadata = cat.fluxnet_metadata().read()` results in a traceback error with other versions. Pinning `intake=0.6.2` resolves this issue without any traceback errors.
2. Loading the `fluxnet` files shows inconsistent behavior:

    s3 = S3FileSystem(anon=True)
    s3_paths = s3.glob('earth-data/carbon_flux/nee_data_fusion/FLX*')
    datasets = []
    skipped = []
    used = []
    for i, s3_path in enumerate(s3_paths):
        sys.stdout.write(f'\r{i+1}/{len(s3_paths)}')
        try:
            dd = cat.fluxnet_daily(s3_path=s3_path).to_dask()
        except FileNotFoundError:
            try:
                dd = cat.fluxnet_daily(s3_path=s3_path.split('/')[-1]).to_dask()
            except FileNotFoundError:
                continue
        site = dd['site'].cat.categories.item()
        if not set(dd.columns) >= set(data_columns):
            skipped.append(site)
            continue
        datasets.append(clean_data(dd))
        used.append(site)
    print()
    print(f'Found {len(used)} fluxnet sites with enough data to use - skipped {len(skipped)}')

This cell sometimes generates a traceback, and the warning is repeated for every iteration up to 209/209.
The circumstances under which this error occurs are unclear. A temporary solution, discovered with the help of @hoxbro, involves removing the local version of `intake` and re-downloading it using `anaconda-project run`. This typically resolves the issue. However, restarting the kernel and running the notebook from the top down might bring back the traceback error.
3. The following cell fails with a `TypeError`:

    partial_soil_data = df[df[soil_data_columns].notnull().any(1)]
    partial_soil_data_sites = metadata[metadata.site.isin(partial_soil_data.site.unique())]

Traceback:

    TypeError                                 Traceback (most recent call last)
    Cell In[20], line 1
    ----> 1 partial_soil_data = df[df[soil_data_columns].notnull().any(1)]
          2 partial_soil_data_sites = metadata[metadata.site.isin(partial_soil_data.site.unique())]
    TypeError: DataFrame.any() takes 1 positional argument but 2 were given

Using `any(axis=1)` resolves this error. However, if problem 2 does not occur, this cell runs without the `TypeError`.
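A minimal sketch of the `any(axis=1)` fix on a toy frame. The column names and values are made up for illustration; only the `notnull().any(axis=1)` call mirrors the cell above. Passing the axis by keyword should behave the same on pandas versions that accept it positionally and on those where it is keyword-only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the notebook's `df`; column names are hypothetical.
soil_data_columns = ["soil_a", "soil_b"]
df = pd.DataFrame(
    {
        "site": ["A", "B", "C"],
        "soil_a": [1.0, np.nan, np.nan],
        "soil_b": [np.nan, 2.0, np.nan],
    }
)

# Keep rows where at least one soil column is non-null.
# `axis` is passed by keyword instead of positionally.
mask = df[soil_data_columns].notnull().any(axis=1)
partial_soil_data = df[mask]
print(partial_soil_data["site"].tolist())
```

Site `C` has no soil data in either column, so it is the only row dropped.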
@maximlt @droumis
I have completely rewritten the notebook to remove all usage of `intake`.
The `.csv` files are downloaded locally via `awscli` by running `anaconda-project run download_fluxnet_daily`. This takes about a minute to download all the files and saves them in the same folder as the `.txt` file.
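As a rough sketch of what loading the downloaded files without `intake` might look like: the snippet below globs for `FLX*.csv` files in a folder and reads each one with pandas. The helper name, file names, and columns are assumptions made up for this example (it writes two tiny files to a temporary folder so that it runs on its own); the notebook's actual paths and cleaning steps may differ.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_local_fluxnet(folder):
    """Read every FLX*.csv in `folder` into one DataFrame (hypothetical helper)."""
    frames = []
    for path in sorted(Path(folder).glob("FLX*.csv")):
        frame = pd.read_csv(path)
        # Record which file each row came from.
        frame["source_file"] = path.name
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demo with two made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name, value in [("FLX_site1.csv", 1.5), ("FLX_site2.csv", 2.5)]:
        pd.DataFrame({"TIMESTAMP": [20200101], "NEE": [value]}).to_csv(
            Path(tmp) / name, index=False
        )
    combined = load_local_fluxnet(tmp)

print(combined)
```

Sorting the paths keeps the row order stable between runs, which can make test failures easier to compare.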
Some of the cells are failing the test now and I don't know why. I will investigate that later.
Otherwise, I think this is ready for review now.
@hoxbro
I have pushed a fix that will make the test pass. I'm unsure why it doesn't work when you scatter the index.
The doc build is failing; @Azaya89, can you try and see if you can fix this?
Arf @Azaya89 I see we're still having some issues. The error we encounter looks very similar to the one reported here https://github.com/aws/aws-cli/issues/8988. Digging more into this direction should hopefully give us a solution. This for instance looks promising https://github.com/aws/aws-cli/issues/5623#issuecomment-801240811, this too https://stackoverflow.com/questions/64992288/s3-sync-issue-running-in-azure-devops-pipeline-on-linux.
Thank you. Let me try this out...
Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.
The doc build is failing; @Azaya89, can you try and see if you can fix this?
Fixed. I think it is ready for final review now @hoxbro
Another run has replaced the dev docs site. I want to make sure you checked if everything looked good before it was replaced.
LGTM!
Modernizing an example checklist
Preliminary checks
Change `anaconda-project.yml` to use the latest workable version of packages
- Remove the upper pin (e.g. `hvplot<0.9` to `hvplot`, `panel>=0.12,<1.0` to `panel>=0.12`) of all other dependencies. Removing the upper pins of dependencies could necessitate code revisions in the notebooks to address any errors encountered in the updated environment. Should complexities or extensive time requirements arise, document issues for team discussion on whether to re-pin specific packages or explore other solutions.
- Add or update the lower pin (e.g. `hvplot` to `hvplot>=0.9.2`, `hvplot>=0.8` to `hvplot>=0.9.2`). Usually, the new/updated lower pin of a dependency will be the version resolved after `anaconda-project prepare` has been run. Execute `!conda list` in a notebook, or `anaconda-project run conda list` in the terminal, to display the version of each dependency installed in the environment. Adjusting the lower pin helps ensure that the locks produced for each platform (linux-64, win-64, osx-64, osx-arm64) rely on the tested dependencies and not on some older versions.
Plot API updates (discussed on a per-example basis)
- Replace `datashade` with `rasterize` (read this page). Essentially, `rasterize` allows Bokeh to handle the colormapping instead of Datashader.
Interactivity API updates (discussed on a per-example basis)
- Remove `pn.interact` usage.
- Avoid `.param.watch()` usage. This is a pretty low-level and verbose approach and should not be used in Examples unless required, or unless an Example is specifically trying to demo its usage in an advanced workflow.
- Prefer `pn.bind()`. Read this page for explanation.
- For classes that define a `view()` method and call it directly, update the class by inheriting from `pn.viewable.Viewer` and replace `view()` by `__panel__()`. Here is an example.
Panel App updates (discussed on a per-example basis)
- Create an app, as simple as a `pn.Column` or more complicated to incorporate widgets, etc. Make the final app `.servable()`.
- If the app is not served (no `command: dashboard` declaration in the `anaconda-project.yml` file), try adding it.
- Use a template, e.g. `template = pn.template.BootstrapTemplate`, but if building up an app across multiple cells, it is probably cleaner to declare the template at the top with `pn.extension(template='bootstrap')`. See the how-to guide on setting a template.
General code quality updates
- If `warnings.simplefilter('ignore')` appears somewhere at the start of the notebook, remove this line. Try to update the code to remove the warnings, if any. If updating the code to remove the warnings is taking a significant amount of time and effort, bring it up for discussion and we may decide to disable warnings again.
Text content
Visual appearance - Example
Visual appearance - Gallery
- Check that the title is well formatted (e.g. `Ml Annotators` to `ML Annotators`), if not, add/update the `examples_config.title` field in `anaconda-project.yml`
- `description` field in `anaconda-project.yml`
Workflow (after you have made the changes above)
- `doit validate:<projectname>`
- `doit test:<projectname>`
- `doit doc_one --name <projectname>`. It's better if the project notebook(s) is saved with its outputs (but be sure to clear outputs before committing to the examples repo!) when building the docs. Then open `./builtdocs/index.html` in your browser and check how the site looks.
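For the checklist item on `warnings.simplefilter('ignore')`, one way to locate the code that emits a warning (so it can be fixed rather than silenced) is to temporarily escalate warnings to errors. This is a sketch using only the standard library; `legacy_call` is a hypothetical stand-in for whatever notebook code emits the warning.

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for notebook code that emits a warning."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


# Inside this block every warning is raised as an exception, so the
# traceback points at the line that triggered it.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        legacy_call()
        caught = None
    except DeprecationWarning as exc:
        caught = str(exc)

print(caught)
```

Because the filter change is scoped to the `catch_warnings()` block, the previous warning filters are restored afterwards and the rest of the notebook is unaffected.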