pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/
94 stars 36 forks source link

Xclim : Xarray-based climate data analytics #73

Closed Zeitsperre closed 1 year ago

Zeitsperre commented 1 year ago

Submitting Author: Trevor James Smith (@Zeitsperre) All current maintainers: (@Zeitsperre, @tlogan2000, @aulemahal) Package Name: xclim One-Line Description of Package: Climate indices computation package based on Xarray Repository Link:  https://github.com/Ouranosinc/xclim Version submitted: v0.40.0 Editor: @Batalex Reviewer 1: @jmunroe Reviewer 2: @aguspesce Archive: DOI JOSS DOI: DOI Version accepted: v0.42.0 Date accepted (month/day/year): 04/11/2023


Description

xclim is an operational Python library for climate services, providing numerous climate-related indicator tools with an extensible framework for constructing custom climate indicators, statistical downscaling and bias adjustment of climate model simulations, as well as climate model ensemble analysis tools.

xclim is built using xarray_ and can seamlessly benefit from the parallelization handling provided by dask. Its objective is to make it as simple as possible for users to perform typical climate services data treatment workflows. Leveraging xarray and dask, users can easily bias-adjust climate simulations over large spatial domains or compute indices from large climate datasets.

Scope 

Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

  - Who is the target audience, and what are scientific applications of this package?

xclim aims to position itself as a climate services tool for any researchers interested in using Climate and Forecast Conventions compliant datasets to perform climate analyses. This tool is optimized for working with Big Data in the climate science domain and can function as an independent library for one-off analyses in Jupyter notebooks or as a backend engine for performing climate data analyses over PyWPS (e.g. Finch). It was primarily developed targeting earth and environmental science audiences and researchers, originally for calculating climate indicators for the Canadian government web service ClimateData.ca.

The primary domains that xclim is built for are in calculating climate indicators, performing statistical correction / bias adjustment of climate model output variables/simulations, and in performing climate model simulation ensemble statistics.

  - Are there other Python packages that accomplish the same thing? If so, how does yours differ?

icclim is another library for the computation of climate indices. Starting with version 5.0 of icclim, some of the core computations rely on xclim. See explanations about differences between xclim and icclim.

scikit-downscale is a library offering algorithms for statistical downscaling. xclim drew inspiration for its fit-predict architecture. The suite of downscaling algorithms offered differ.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box.  This package:

Publication options

 JOSS Checks   - [x] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [x] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor 'utility' packages, including 'thin' API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [x] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [x] The package is deposited in a long-term repository with the DOI: https://doi.org/10.5281/zenodo.2795043 *Note: Do not submit your package separately to JOSS*   

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

Please fill out our survey

P.S. *Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

Zeitsperre commented 1 year ago

To whom it may concern, our paper is available on a separate branch within xclim. The Pull Request containing the paper can be found here: https://github.com/Ouranosinc/xclim/pull/250

lwasser commented 1 year ago

hi @Zeitsperre welcome to pyOpenSci!! are you a part of the pangeo community by chance? i think i saw xclim discussed in a few pangeo threads.

Zeitsperre commented 1 year ago

Hi @lwasser, nice to meet you!

We are not formally part of the Pangeo community (we do not receive compensation from them), however xclim is mentioned in their list of xarray-based projects (https://docs.xarray.dev/en/latest/ecosystem.html#ecosystem). A handful of xclim's developers have also made significant contributions to packages in the Pangeo ecosystem as well (xarray, xesmf, flox, cf-xarray, etc.) and some key developers of xarray have contributed to xclim, through issue raising or code contributions. I think it's safe to say that the objectives of xclim are in line with those of Pangeo.

lwasser commented 1 year ago

awesome! nice to meet you as well @Zeitsperre ! I asked because we are working on a partnership with pangeo to curate that list via peer review. as a part of that curation, we have a small set of checks that are specific to the pangeo community including:

So we'd want to include you in that given your are in the list of tools. i haven't done a deep dive into your package but these are universal pangeo standards.If the review goes smoothly, we'd then tag your package as being pangeo community related / vetted.

do this package adhere to the above pangeo standards? Many thanks - i'm just trying to figure out the best path here to support our partnership :) i just started working on this and have an open pr here to update our review submission template!

Zeitsperre commented 1 year ago

I believe we tick all the boxes you listed:

I know that the core developers of xclim would gladly welcome feedback on how to better organize/structure our code base, and xclim being considered high enough in quality to be formally Pangeo-endorsed would be great!

Zeitsperre commented 1 year ago

Just to add some clarification, I just remembered that we have some functionality to build indicator catalogues using YAML configurations (https://xclim.readthedocs.io/en/latest/notebooks/extendxclim.html#YAML-file).

We also built a translation engine for providing multi-language climate indicator metadata based on JSON descriptions (currently, we have translations to French, but any other languages are supported; https://xclim.readthedocs.io/en/latest/internationalization.html).

These files are loaded on import or can be supplied by the user explicitly and are solely used for configuration. Hope this helps!

lwasser commented 1 year ago

perfect! Thank you @Zeitsperre for the speedy replies! I am going to followup and also get our editor in chief involved @NickleDave I stepped in only because i'm actively working with pangeo now so i just wanted to check in to see if this review could be supported via that collaboration as well. And it does sound like it could. more soon!

NickleDave commented 1 year ago

Welcome @Zeitsperre!

At first glance looks like you have got most everything in line with our initial checks.

I will add the completed checks by end of day tomorrow.

NickleDave commented 1 year ago

Hi again @Zeitsperre -- Happy to report that xclim passed initial editor-in-chief checks (below) with flying colors.

I have one suggestion, which we strongly recommend, but do not require:

We expect to know how we will go about selecting editors by the end of day Monday (Jan. 23rd). @lwasser needs to talk a little more with people about the Pangeo collaboration.
We will report back to you then with a timeline for the review.

Editor in Chief Checks

Please check our Python packaging guide for more information on the elements below.



Editor comments

Zeitsperre commented 1 year ago

Thanks, @NickleDave! I'm glad to see that all the hours spent reading PEPs and adopting good practices has been helpful here!

I'll be opening a PR later today to address the PEP 517 and 621 compliance comment. We don't do anything too fancy in our setup.py, so I don't imagine it should be difficult to adopt pyproject.toml. I'll link that here once it's underway.

Zeitsperre commented 1 year ago

@NickleDave

I just merged changes to address some of your comments, primarily for pyproject.toml and the ReadMe suggestion. I opted to adopt flit as it seemed to be a very stable and no-nonsense setup engine. It took a bit of time to migrate some configurations around, but it looks/works much better.

Agreed on your point about Basic Usage. I'll be opening a ticket to address that down-the-line. There are a fair amount of Pull Requests currently open, but we tend to address things in a reasonable amount of time.

Let me know if you have any other suggestions, thanks!

NickleDave commented 1 year ago

Hi @Zeitsperre!

I just merged changes to address some of your comments

Awesome! Glad to hear it.

Agreed on your point about Basic Usage.

Thank you for even making note of that. It was just a nitpick and really more of a list to give an editor things they might want to revisit. Your docs look really great over all and I'm sure they'll look better after the review.

I'm sorry for not getting back to you by the end of the day yesterday--totally my fault.

I want to let you know we do have an editor, @Batalex! 🎉🎉🎉 who will introduce themselves and officially start the review in the next couple of days. We are actively recruiting reviewers right now as well.

Batalex commented 1 year ago

Hi @Zeitsperre, I am pleased to meet you! I will be the editor for this review, and I will get up to speed during the next few days with the pre-submission discussions, the editorial process and the package itself before recruiting reviewers. I appreciate your patience; I am looking forward to working with you 🫡

Zeitsperre commented 1 year ago

Hello @Batalex, nice to meet you as well!

That all sounds great! I'm looking forward to hearing how we can improve xclim. Feel free to ping me if you have any questions or see anything obvious that could be improved in advance of the formal peer-review process.

Thanks for your time and effort towards this!

Batalex commented 1 year ago

:wave: Hi @jmunroe and @aguspesce! Thank you for volunteering to review Xclim for pyOpenSci!

Meet @Zeitsperre, our lovely submission author, and feel free to introduce yourselves 🥰

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all the steps and information needed to complete your review.
  2. You will need the review template to fill out and submit in this issue as a comment once your review is complete.

Please get in touch with any questions or concerns! I just wanted to let you know that your review is due: Feb 26th.

Also, I'd like to point out that the next PR should be part of your review: https://github.com/Ouranosinc/xclim/pull/250.

PS: I'm sorry for the information dump. Please make sure that you review the submitted version. However, some issues you could point out might have already been solved in the meantime, but that's ok.

Zeitsperre commented 1 year ago

Hey all,

I just wanted to keep people in the loop, but our next version (v0.41) is slated for release on February 24, 2023. Assuming that the degree of changes suggested/recommended is reasonable to address, I feel that the xclim devs can likely prioritize adjustments for the subsequent release (v0.42).

One question that we had concerning the JOSS paper was if it is required that the paper branch be pushed to the main development branch, or if it can be hosted solely in its own branch? What is typically performed concerning the JOSS requirements?

NickleDave commented 1 year ago

https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements

Your paper (paper.md and BibTeX files, plus any figures) must be hosted in a Git-based repository together with your software (although they may be in a short-lived branch which is never merged with the default).

aguspesce commented 1 year ago

Hello @Zeitsperre, You need to put the paper in the main branch, as @NickleDave told you.

NickleDave commented 1 year ago

I might be reading their docs wrong, but I understood it to say that the paper files can be on some other branch that never gets merged with main.

See for example this published paper that has the files in a joss-rev branch: https://github.com/parmoo/parmoo/commits/joss-rev

Zeitsperre commented 1 year ago

I might be reading their docs wrong, but I understood it to say that the paper files can be on some other branch that never gets merged with main.

That's what I read as well, but I wanted to confirm. In any case, we can cross that bridge when we get there. The paper is still up-to-date with the current aims and structure of the package, but I would like to add a few comments and clarifications here and there; I anticipate that we'll be modifying it some more in the coming weeks.

aguspesce commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 6


Review Comments

Thank you for the opportunity to review this code. The package looks very good and a lot of effort has been put into its development. I only have minor suggestions and observations, so I will enumerate them in order of priority.

About the documentation

I found some issues on the documentation website that don't look well.

About the repository

The JOSS paper

The code

About the code, I don't have any objection. I have only noted that the code is sometimes difficult to follow. So, I just have a few observations that might help to improve this in future developments.

Zeitsperre commented 1 year ago

Hi @jmunroe

We're planning to address all comments from the pyOpenSci review process for the next release (around 1~1.5 months away). There's no rush, but I just want to confirm that nothing in @aguspesce's comments runs counter to your preliminary evaluation/thoughts. Thanks!

jmunroe commented 1 year ago

I don't see anything in that review that I would disagree with. I'm trying to get my own review will be submitted within the week.

Batalex commented 1 year ago

Thanks, @aguspesce, for your effort!

@Zeitsperre I was thinking of reviewing Xclim a bit after our reviewers. The more, the merrier 😄

Zeitsperre commented 1 year ago

Much appreciated!

Except for one or two more challenging fixes, we should be done addressing @aguspesce's comments in the next few days.

Zeitsperre commented 1 year ago

Hey there,

Just letting people know that the comments to date have all been addressed. The main development branch (and latest documentation) reflect these changes. Our next release is slated for the end of March, so if more comments arrive this week, we should be able to address them next week, in time for the release.

Thanks again!

Batalex commented 1 year ago

Hello @Zeitsperre, Thanks your responsiveness and communication on Xclim release schedule. I am following up with @jmunroe to give you the best estimate possible as to when you will have the second review.

I really appreciate your patience on this submission.

Batalex commented 1 year ago

Hey @Zeitsperre, Thanks again for your patience. Due to unforeseen personal issues, the second review was delayed but is still on the way. Cheers!

jmunroe commented 1 year ago

Thank so much for your patience with me. I have now gone through the package carefully following the PyOpenSci review instructions. This package is an incredibly thorough treatment of climate based metrics and I found it a real pleasure to review. A few issue I identified are presented below but none of the them are serious and I recommend this package be accepted by PyOpenSci.

I hope that in acceptance by PyOpenSci (and hopefully by JOSS as well), xclim will continue to be extended and adopted by a wide international community of climate scientists and professionals. This is an important body of work that should empower those trying to understand, mitigate, and adapt to our changing climate.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

For the JOSS paper information, I am using the joss_submission tag from GitHub.

Final approval (post-review)

Estimated hours spent reviewing: 8 hours


Review Comments

Installation notes

The reviewer instructions appears leave it up to the reviewer on how to install the package. I cloned the repo:

git clone https://github.com/Ouranosinc/xclim.git

And then, to ensure I was looking at the same version as was submitted for review, I switched to the v0.40.0 tag:

git checkout tags/v0.40.0

Following the instructions at the xclim documentation under Installation, I created a separate conda environment to install the required dependencies:

conda create -n my_xclim_env python=3.8 --file=environment.yml
conda activate my_xclim_env
pip install ".[dev]"

And there I hit my first issue:

CondaValueError: could not parse 'name: xclim' in: environment.yml

The fix (at least for conda 22.11.1) is that --file is an option to pass to conda env create and not conda create. This needs to be fixed in the install instructions.

The instructions refer to 'Extra dependencies' such as flox, SBCK, eigen, eofs, pybind. Since I used conda's environment.yml the extras eofs, eigen, pybind11 were already included.

I confess I tend to get confused when there is the option of using either environment.yml and requirements_*.txt files. So, I skipped the instructions following 'Extra Dependencies' in the documentation. I assume there must be situtations when I should and should not install these extra dependencies but as a new user of the package, I don't what those situations are yet. Since theses installation instructions are right near the top of the documenation, perhaps it would be better for the maintainers to make those choices for me? For example, I am now wondering "should I be installing flox?". Since it is 'highly recommended', would it not make more sense to have it as part of the default instructions?

Basic Usage

To evaluate

I installed Jupyter lab and created a notebook to test the instructions given under the Basic Usage section of the documentation. The very first example

# ds = xr.open_dataset("your_file.nc")
ds = open_dataset("ERA5/daily_surface_cancities_1990-1993.nc")
ds.tas

My initial reading of this code made me think that this ERA5 dataset was something I need to first download locally (I did not distinguish between xr.open_dataset and open_dataset in my very first glance at the code). After some review, I see now that there companion GitHub repo that was available that had testing data and the xclim.testing API automatically makes a locally cached copy of this file. I think it would be clearer if this very first example was written out as

# ds = xr.open_dataset("your_file.nc")
ds = xclim.testing.open_dataset("ERA5/daily_surface_cancities_1990-1993.nc")
ds.tas

so that it was clear that the open_dataset was utility method of the testing framework for xclim.

I liked how this initital documentation oriented the user quickly to the differences between 'indicators' and 'indices' which, especially for the new-user, may be something that could be confusing.

In the example of Health checks and metadata attributes there is a typo:

gdd = xclim.atmos.growing_degree_days(tas=ds6h.tas, thresh="10.0 degC", freq="MS")

should be

gdd = xclim.atmos.growing_degree_days(tas=ds6h.air, thresh="10.0 degC", freq="MS")

The final basic usage examples on Graphics are could be improved by adding some descriptions in the text of what those three visualizations are actually showing. This could be as brief as taking the comment from each code snippet and using that as a text description.
While in-code comments are generally fine, these last few examples on graphics feel tacked on given the strong narrative text established in the beginning of the Basic Usage section of the documentation.

Examples
Workflow Examples

Minor spelling error in the docs:

"Finally, xarray is tightly in(t)egrated with dask, a package that can automatically parallelize operations.

Under Subsetting and selecting data with xarray, the documentation reads:

Usually, xclim users are encouraged to use the subsetting utilities of the clisops package. Here, we will reduce the size of our data using the methods implemented in xarray

This is confusing because, as the first example workflow, the user has not yet been shown to use the clisops package. Should there be a subsub-section immediately before such as Subsetting and selecting data with cliops to demonstrate that recommended workflow?

Under Different ways of resampling there is setting of a matplotlib style

# import plotting stuff
import matplotlib.pyplot as plt

%matplotlib inline
plt.style.use("seaborn")
plt.rcParams["figure.figsize"] = (11, 5)

that leads to the warning

/tmp/ipykernel_7039/887583071.py:5: MatplotlibDeprecationWarning: The seaborn styles shipped by Matplotlib are deprecated since 3.6, as they no longer correspond to the styles shipped by seaborn. However, they will remain available as 'seaborn-v0_8-<style>'. Alternatively, directly use the seaborn API instead.
  plt.style.use("seaborn")

I think the offending line should be changed to

plt.sytle.use("seaborn-v0_8")

(and elsewhere in the documenation where seaborn styles are used)

Since the example is supposed to contrast the differences between the two sampling methods, I think they should be on the same colorbar scale. There may be a more general solution for this with matplotlib, but explicitly setting the range is a quick fix:

hw_before.sel(time="2010-07-01").plot(vmin=0, vmax=7)
plt.title("Resample, then run length")
plt.figure()
hw_after.sel(time="2010-07-01").plot(vmin=0, vmax=7)
plt.title("Run length, then resample")

Under Spatially varying thresholds, the code comments showing the thresholds do not

# The tasmin threshold is 15°C for the northern half of the domain and 20°C for the southern half.
# (notice that the lat coordinate is in decreasing order : from north to south)
thresh_tasmin = xr.DataArray(
    [7] * 24 + [11] * 24, dims=("lat",), coords={"lat": ds5.lat}, attrs={"units": "°C"}
)
# The tasmax threshold is 16°C for the western half of the domain and 19°C for the eastern half.
thresh_tasmax = xr.DataArray(
    [17] * 24 + [21] * 24, dims=("lon",), coords={"lon": ds5.lon}, attrs={"units": "°C"}
)

don't appear to match the values used in the code. I assume the code comments just need to be updated.

Ensemble-Reductinon Techniques

Only because the utility of the rest of the package is so high, I was a bit confused with the need to manually create a 2D array of criteria (values) and realizations (runs/simulations). While I think I understand why all dimensions/variables, other the realization id, need to be flattened into a 2D array to apply an ensemble reduction technique, would it be possible to create a helper function that interates through all non-realization 'dimensions', and all variables to generalize the code snippet given below?

# Create 2d xr.DataArray containing criteria values
crit = None
for h in ds_crit.horizon:
    for v in ds_crit.data_vars:
        if crit is None:
            crit = ds_crit[v].sel(horizon=h)
        else:
            crit = xr.concat((crit, ds_crit[v].sel(horizon=h)), dim="criteria")
crit.name = "criteria"

Is this "criteria" array effectively the equivalent of creating a feature matrix used in data science?

Ensemble-Reductinon Techniques

Since, at the moment, "all frequency analysis functions are hard-coded to operate along the time dimension", perhaps a brief reference to xr.DataArray.rename on how to rename a dimension in xarray would be useful?

I really liked how the worked example for xclim.indices.stats.frequency_analysis show both how to use it and fully explains the steps that are bundled together in that convenience function.

Extending xclim

This level of "developer documentation" is well explained in the "user manual" section of the examples documentation clearly shows that xclim has been written in a away that is extensible. My previous experience with some other packages that are designed to be extended or have some sort of "plug-in" architecture, the documentation of how to actually use that extensibility is buried in the internal developer-focused documentation (if it exists at all). The authors of xclim should be commended for their clarity of presentation here that will allow others to extend the indicator/indice framework introduced by xclim.

Statistical Downscaling and Bias-Adjustment

The content is clear and well explained. Just a couple of very small typos that could be fixed:

A more complex example could have bias distribution varying strongly across months. To perform the adjustment with different factors for each months, one can pass group='time.month'. Moreover, to reduce the risk of sharp change in the adjustment at the interface of the months, interp='linear' can be passed to adjust and the adjustment factors will be interpolated linearly. Ex: the factors for the 1st of May will be the average of those for April and those for May.

and in the following notebook on advanced SDBA:

The previous notebook covered the most common utilities of xclim.sdba for conventional cases

Again, kudos for the depth and richness of examples presented here.

Spatial Analogues examples

This is really neat to see this functionality so accessible.Soon after completing this review, I definitely have to run this kind of computation on places that I have personal connections with. And with xclim, I think that will be straightforward to do!

Zeitsperre commented 1 year ago

Hello @Batalex and @jmunroe

I hope you both are well! We managed to hammer out as many of the smaller review comments as we could, and have marked the comments concerning the behaviour of frequency_analysis as something that necessitates some deliberation in the coming months.

This has all been merged to our main development branch in time for our latest release (version 0.42.0)

If you could please confirm that these changes meet your expectations, we would greatly appreciate it. Please let us know if there are any other modifications required.

Best,

Batalex commented 1 year ago

Hello @Zeitsperre, I am impressed by Xclim's reactivity!

In addition to Agustina and James' remarks, I'd like to add a little comment. We recommend separating unit tests from source code or, at least, excluding them from the final build.

As for the review, @aguspesce, @jmunroe, could you please make a final check as to whether you are satisfied with how your comments were dealt with? You can go through the changelog, the Xclim's team was nice enough to highlight the changes brought by this submission.

Once you are done, you may check the The author has responded to my review and made changes to my satisfaction. I recommend approving this package. item in your review.

Zeitsperre commented 1 year ago

In addition to Agustina and James' remarks, I'd like to add a little comment. We recommend separating unit tests from source code or, at least, excluding them from the final build.

Addressing this comment would be a bit tricky; As a second layer of caution, we currently run tests on the sdist when building our binary packages on conda-forge. Previously, using setuptools, we could more easily exclude the xclim/testing/tests folder when building the wheel, but it seems like the new backend we are using (flit) doesn't easily allow that level of control.

I'll open an issue at https://github.com/pypa/flit to see if that feature is maybe just not documented (i.e. Optional exclusion of files/folders depending on build distribution (sdist |bdist_wheel)). If this is blocking acceptance to pyOpenSci, I can try getting this done for the next release.

Thanks again!

lwasser commented 1 year ago

Hi everyone! 👋 this has been a wonderful review to watch unfold!

A note about tests - for reference, here is our language around testing in our packaging guide.

For many scientific packages, it is important to have tests included in distribution archives. This is particularly important if you have complex wheel builds supporting multiple architectures, etc.. In this case it sounds like it's important to the conda-forge build. While we suggest that unit tests live outside of the package directory, we do not require it.

We suggest tests live outside of the package directory to prevent users from running tests on the package file tree rather than in the installed version. And also it makes it easy to exclude tests from distributions that do not need to have tests included.

i wanted to add this note only because it is NOT a hard requirement for pyOpenSci acceptance. We do understand that in some cases tests may need to be a part of the distribution and moving them might be significant burden to the maintainers.

it would be interesting to see if there is a bit more flexibility with hatchling or pdm-build as a backend. i'm not 100% sure as i haven't tried to exclude specific parts of the package dir with them. And, I don't want to cause anyone here more / unnecessary work! So if this request is a pain point for xclim, then it is ok to move forward without this specific change.

Just for information gathering purposes, i am going to ask the hatchling dev about that specific feature as well. :)

lwasser commented 1 year ago

in case this is helpful - i asked the hatchling dev about excluding a directory here feel free to chime in IF you have questions or need additional support in making this switch (if you decide to make it).

Batalex commented 1 year ago

Do not worry, this is but a minor comment. This is definitely not blocking the submission, especially if it means reworking the build system to remove 500kB of the sdist

Zeitsperre commented 1 year ago

@lwasser

Thanks for looking into things with the hatch backend. Ultimately, I really like the simplicity of flit, so I've gone with the approach of splitting the tests folder out of the package. After factoring the functions that were mixed into the tests themselves back into the package, the wheel is significantly smaller (~140 KB!).

Assuming that changes are accepted, this will be part of the next scheduled release (v0.43.0) in around a month.

Thanks again!

lwasser commented 1 year ago

ok thank you for making that structure change @Zeitsperre !! i totally get it. a lot of folks love the simplicity of flit. it's good for us to know what each tool can and can't do so that is why i thought i'd dig just a bit to understand more :)

and yay for a smaller wheel! PyPI / warehouse maintainers will appreciate that! ✨

Batalex commented 1 year ago

Many thanks, @Zeitsperre, I am once more impressed with the team 😍

jmunroe commented 1 year ago

I've reviewed the changes made by the authors.

Final approval (post-review)

Batalex commented 1 year ago

Thanks James, how about you @aguspesce? Are you satisfied with the changes the team made?

aguspesce commented 1 year ago

Sorry @Batalex for the delay! I am happy with the changes and have added the tick to my review.

Batalex commented 1 year ago

🎉 Xclim has been approved by pyOpenSci! Thank you @Zeitsperre for submitting Xclim and many thanks to @aguspesce and @jmunroe for reviewing this package! 😸

There are a few things left to do to wrap up this submission:

It looks like you would like to submit this package to JOSS. Here are the next steps:

🎉 Congratulations! You are now published with both JOSS and pyOpenSci! 🎉

All -- if you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the contributing-guide. We have also been updating our documentation to improve the process, so all feedback is appreciated!

Batalex commented 1 year ago

On a more personal note, I was delighted to be part of this process and learned a lot as an editor. I wish I had more time to give a more in-depth review, but I could not be happier with the quality of Agustina and James' work.

I have one final task for you @aguspesce, @jmunroe. We would appreciate it if you could take the post-review survey to improve our process. I will send you the link by DM. Thank you all 🤗

tlogan2000 commented 1 year ago

This is excellent news! Thanks to all the reviewers and to @Zeitsperre for preparing and coordinating the xclim submission. Great job.

lwasser commented 1 year ago

wow!! so awesome!! congrats @Zeitsperre, @tlogan2000, @aulemahal !! we also welcome any feedback from anyone here on this process!

@Batalex incredible work leading this review!! 🙌 and many thanks @aguspesce @jmunroe for the excellent reviews!

NOTE: i also posted about this on the pangeo discourse!

Zeitsperre commented 1 year ago

This is such fantastic news! Thanks so much to @lwasser, @NickleDave, @Batalex, @aguspesce, and @jmunroe for all of your effort and time guiding us to build a better package!

I was scheduled to deliver a brief internal presentation on news concerning xclim today, so I'll be sharing this news as well :)

We should be opening pull requests soon and adding all associated information and badges to xclim's README and documentation either today or tomorrow, and following up with JOSS by the end of the week. Again, thanks so much for all your work!

Batalex commented 1 year ago

@all-contributors please add @Zeitsperre for code please add @tlogan2000 for code please add @aulemahal for code please add @aguspesce for review please add @jmunroe for review

Batalex commented 1 year ago

@Zeitsperre, @tlogan2000, @aulemahal We would love to have you on our Slack if you'd like!
Give me a shout, and we will do so

Zeitsperre commented 1 year ago

@Batalex Sure! Feel free to add me! (email(s) are in https://github.com/Ouranosinc/xclim/blob/master/AUTHORS.rst)

Batalex commented 1 year ago

@all-contributors please add @aulemahal for code please add @aguspesce for review please add @jmunroe for review

Plz, be a good bot so I can stop spamming people