holoviz-topics / examples

Visualization-focused examples using HoloViz for specific topics
https://examples.holoviz.org
Creative Commons Attribution 4.0 International

Modernize carbon flux #411

Closed. Azaya89 closed this 1 week ago.

Azaya89 commented 3 months ago

Modernizing an example checklist

Preliminary checks

- Change `anaconda-project.yml` to use the latest workable versions of packages
- Plot API updates (discussed on a per-example basis)
- Interactivity API updates (discussed on a per-example basis)
- Panel App updates (discussed on a per-example basis)
- General code quality updates
- Text content
- Visual appearance - Example
- Visual appearance - Gallery

Workflow (after you have made the changes above)

Azaya89 commented 3 months ago

This is still a WIP. Not ready for review yet.

Azaya89 commented 3 months ago

Bug report on this example notebook: inconsistent usage of `intake`

These are the current issues preventing the complete modernization of this notebook:

  1. Version Compatibility: Although it is recommended to pin `intake` to `<2`, only version 0.6.2 runs without errors. For example, executing `metadata = cat.fluxnet_metadata().read()` results in the following traceback with other versions:
Traceback:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 metadata = cat.fluxnet_metadata().read()
      2 metadata.sample(5)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:190, in CSVSource.read(self)
    186     return self._dask_df.compute()
    188 import pandas as pd
--> 190 self._get_schema()
    191 return pd.concat([self._get_partition(i) for i in range(len(self.files()))])

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:142, in CSVSource._get_schema(self)
    140 nrows = self._csv_kwargs.get("nrows")
    141 self._csv_kwargs["nrows"] = 10
--> 142 df = self._get_partition(0)
    143 if nrows is None:
    144     del self._csv_kwargs["nrows"]

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:160, in CSVSource._get_partition(self, i)
    157     return self._dask_df.get_partition(i).compute()
    159 url_part = self.files()[i]
--> 160 return self._read_pandas(url_part, i)

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/intake/source/csv.py:166, in CSVSource._read_pandas(self, url_part, i)
    163 import pandas as pd
    165 if self.pattern is None:
--> 166     return pd.read_csv(url_part, storage_options=self._storage_options, **self._csv_kwargs)
    168 drop_path_column = "include_path_column" not in self._csv_kwargs
    169 path_column = self._path_column()

[... intermediate pandas frames (read_csv -> _read -> TextFileReader -> get_handle) elided ...]

File ~/Documents/development/holoviz-topics-examples/carbon_flux/envs/default/lib/python3.11/site-packages/pandas/io/common.py:453, in _get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    452 elif storage_options:
--> 453     raise ValueError(
    454         "storage_options passed with file object or non-fsspec file path"
    455     )

ValueError: storage_options passed with file object or non-fsspec file path
```

Pinning `intake=0.6.2` resolves this issue without any traceback errors.
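For context, the final frame shows pandas rejecting `storage_options` for a path that fsspec does not handle, which newer `intake` versions apparently trigger. A minimal sketch of that underlying pandas behavior (the file name here is hypothetical, and the file does not even need to exist, since the check happens before it is opened):

```python
import pandas as pd

# Passing storage_options together with a plain local path (rather than an
# fsspec URL such as "s3://...") raises the same error seen above.
pd.read_csv("local.csv", storage_options={"anon": True})
# ValueError: storage_options passed with file object or non-fsspec file path
```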

  2. Inconsistency in File Downloads: The cell responsible for downloading the full fluxnet files shows inconsistent behavior:
```python
import sys

from s3fs import S3FileSystem

# `cat`, `data_columns`, and `clean_data` are defined earlier in the notebook.
s3 = S3FileSystem(anon=True)
s3_paths = s3.glob('earth-data/carbon_flux/nee_data_fusion/FLX*')

datasets = []
skipped = []
used = []

for i, s3_path in enumerate(s3_paths):
    sys.stdout.write(f'\r{i+1}/{len(s3_paths)}')

    try:
        dd = cat.fluxnet_daily(s3_path=s3_path).to_dask()
    except FileNotFoundError:
        # Fall back to the bare file name if the full S3 path is not found.
        try:
            dd = cat.fluxnet_daily(s3_path=s3_path.split('/')[-1]).to_dask()
        except FileNotFoundError:
            continue
    site = dd['site'].cat.categories.item()

    # Skip sites that are missing any of the required data columns.
    if not set(dd.columns) >= set(data_columns):
        skipped.append(site)
        continue

    datasets.append(clean_data(dd))
    used.append(site)

print()
print(f'Found {len(used)} fluxnet sites with enough data to use - skipped {len(skipped)}')
```

This cell sometimes emits the following warnings:

```
1/209
/Users/mac/Documents/development/examples/carbon_flux/envs/default/lib/python3.11/site-packages/dask_expr/_collection.py:4160: UserWarning: You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
  Before: .apply(func)
  After:  .apply(func, meta=(None, 'object'))

  warnings.warn(meta_warning(meta))
/Users/mac/Documents/development/examples/carbon_flux/envs/default/lib/python3.11/site-packages/dask_expr/_collection.py:4160: UserWarning: You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
  Before: .apply(func)
  After:  .apply(func, meta=('TIMESTAMP', 'object'))

  warnings.warn(meta_warning(meta))
```

This pair of warnings is repeated for every file, all the way up to 209/209.

The circumstances under which this occurs are unclear. A temporary workaround, discovered with the help of @hoxbro, is to remove the local version of `intake` and re-download it with `anaconda-project run`; this typically resolves the issue. However, restarting the kernel and re-running the notebook from the top can bring the warnings back.
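For reference, the warning itself points at the fix: supplying `meta=` to `.apply()` so Dask knows the output type without sampling the data. A minimal sketch (the column name and dtype here are illustrative, not taken from the notebook):

```python
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"TIMESTAMP": ["20200101", "20200102"]}), npartitions=1)

# Without `meta=`, Dask samples the data to guess the output dtype and emits
# the UserWarning above; declaring the dtype explicitly silences the warning.
parsed = ddf["TIMESTAMP"].apply(pd.to_datetime, meta=("TIMESTAMP", "datetime64[ns]"))
print(parsed.compute())
```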

  3. Cell [20] Error: The following code in Cell [20] generates a traceback when the full data is not downloaded properly (as in problem 2):
```python
partial_soil_data = df[df[soil_data_columns].notnull().any(1)]
partial_soil_data_sites = metadata[metadata.site.isin(partial_soil_data.site.unique())]
```

Traceback:

```
TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 partial_soil_data = df[df[soil_data_columns].notnull().any(1)]
      2 partial_soil_data_sites = metadata[metadata.site.isin(partial_soil_data.site.unique())]

TypeError: DataFrame.any() takes 1 positional argument but 2 were given
```

Using `any(axis=1)` resolves this error, as shown in the corrected cell below. However, when problem 2 does not occur, this cell runs without the `TypeError`.
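For clarity, the corrected version of the cell:

```python
# Same as Cell [20], but with the axis passed by keyword, which works on
# pandas >= 2 where DataFrame.any() no longer accepts a positional axis.
partial_soil_data = df[df[soil_data_columns].notnull().any(axis=1)]
partial_soil_data_sites = metadata[metadata.site.isin(partial_soil_data.site.unique())]
```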

@maximlt @droumis

Azaya89 commented 1 month ago
  1. I have completely rewritten the notebook to remove all usage of `intake`.

  2. The .csv files are now downloaded locally via `awscli` by running `anaconda-project run download_fluxnet_daily` (see the sketch after this list). This takes about a minute to download all the files and saves them in the same folder as the .txt file.

  3. Some of the cells are failing the test now and I don't know why. I will investigate that later.
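The actual download task is defined in `anaconda-project.yml` and uses `awscli`; as a rough Python equivalent of what it does (the local `data/` target directory is an assumption, and `s3fs` is swapped in here only because the notebook already uses it):

```python
from s3fs import S3FileSystem

# Anonymous access to the same public bucket used earlier in the notebook;
# downloads every FLX* file into a local data/ directory.
s3 = S3FileSystem(anon=True)
s3.get('earth-data/carbon_flux/nee_data_fusion/FLX*', 'data/')
```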

Otherwise, I think this is ready for review now.

@hoxbro

hoxbro commented 3 weeks ago

I have pushed a fix that will make the test pass. I'm unsure why it doesn't work when you scatter the index.

The doc build is failing; @Azaya89, can you try and see if you can fix this?

maximlt commented 2 weeks ago

Arf @Azaya89, I see we're still having some issues. The error we encounter looks very similar to the one reported in https://github.com/aws/aws-cli/issues/8988. Digging further in this direction should hopefully give us a solution. This, for instance, looks promising: https://github.com/aws/aws-cli/issues/5623#issuecomment-801240811, and this too: https://stackoverflow.com/questions/64992288/s3-sync-issue-running-in-azure-devops-pipeline-on-linux.

Azaya89 commented 2 weeks ago

> Arf @Azaya89 I see we're still having some issues. The error we encounter looks very similar to the one reported here aws/aws-cli#8988. Digging more into this direction should hopefully give us a solution. This for instance looks promising aws/aws-cli#5623 (comment), this too https://stackoverflow.com/questions/64992288/s3-sync-issue-running-in-azure-devops-pipeline-on-linux.

Thank you. Let me try this out...

github-actions[bot] commented 2 weeks ago

Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.

Azaya89 commented 2 weeks ago

> The doc build is failing; @Azaya89, can you try and see if you can fix this?

Fixed. I think it is ready for final review now, @hoxbro.

github-actions[bot] commented 2 weeks ago

Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.

github-actions[bot] commented 1 week ago

Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.

Azaya89 commented 1 week ago

> Another run has replaced the dev docs site. I want to make sure you checked if everything looked good before it was replaced.

LGTM!

github-actions[bot] commented 1 week ago

Your changes were successfully integrated in the dev site, make sure to review the pages of the projects you touched before merging this PR.