iri-pycpt / pycpt

PyCPT

PyCPT is a Python interface to CPT, the IRI Climate Predictability Tool.

This README is directed at PyCPT developers. Users of PyCPT should see the homepage.

Package structure

PyCPT is made up of six conda packages: cpt-bin, cpt-core, cpt-dl, cpt-extras, cpt-io, and pycpt, plus a Jupyter Notebook that imports and uses these libraries.

cpt-bin contains the CPT executable. This is a platform-specific package that we build separately for Windows, Linux, and macOS, all on x86-64 processors. It works on Apple silicon via Rosetta. The other packages are pure Python and thus platform-independent; we build each of them once and the result works on all three platforms. (Earlier in the lifetime of this project, precompiled Linux binaries were not provided, and installing PyCPT on Linux involved a time-consuming and unreliable compilation step. That decision was based on a misunderstanding; conda does in fact support building a single executable that runs on any Linux system where conda is installed.)

Having cpt-bin as a separate package is useful because the Python code is being developed more actively than CPT, and it is convenient to be able to publish changes to the Python code without rebuilding three platform-specific packages. The original motivation for splitting the Python code into five different packages was that we anticipated using some of the support libraries in other applications, e.g. in Python maprooms, and we wanted to be able to do that without carrying over all of PyCPT's dependencies, including CPT. This motivation is no longer as strong as it was. First, installing cpt-bin on Linux is no longer as difficult as it was, as explained in the previous paragraph. Second, in practice code reuse has been rare so far. The administrative burden of coordinating changes between different packages arguably outweighs what little benefit we get from the split. We should consider merging some or all of the Python packages in the future.

Frozen conda environments

The original installation instructions for PyCPT 2 said simply to install the pycpt package, relying on conda to pull in all of pycpt's dependencies (packages that pycpt imports, and other packages that they import, etc.). This method is unreliable. Conda generally (subject to certain constraints) installs the version of each library that is most recent at the moment of installation. PyCPT currently depends on more than 300 libraries, all maintained on different schedules by different open source developers, so new versions of various dependencies are constantly appearing. Consequently, if you install by that method, you are likely to get a different set of packages next week than you would get today. Most of the time these updates are innocuous or beneficial, but occasionally one breaks PyCPT. To ensure that users get versions of all the packages that actually work together, instead of instructing them to install the latest version of everything, we specify exact version numbers for all 300+ libraries. This specification can be found in the .lock files that are included in the pycpt release, one per platform (linux, osx, windows). The versions listed in the lock file are by no means the only versions that will work; they are simply one combination that has been tested and is known to work.
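To make the idea of a frozen environment concrete, here is a minimal sketch of how the pins in a lock file could be read programmatically. It assumes the lock files use conda's explicit format (an `@EXPLICIT` marker followed by one package URL per line, with filenames of the form `name-version-build`), which is what the URLs quoted later in this README suggest. The function name `parse_lock` and the sample contents are hypothetical and are not taken from the PyCPT release.

```python
from urllib.parse import urlparse


def parse_lock(text):
    """Map package name -> pinned version for each URL line of an explicit lock file."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and the @EXPLICIT marker.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        # urlparse drops any trailing "#hash" fragment from the path.
        filename = urlparse(line).path.rsplit("/", 1)[-1]
        for ext in (".tar.bz2", ".conda"):
            if filename.endswith(ext):
                filename = filename[: -len(ext)]
                break
        # Conda filenames are name-version-build; the name itself may contain hyphens.
        name, version, _build = filename.rsplit("-", 2)
        pins[name] = version
    return pins


# Hypothetical lock file contents, for illustration only.
example = """\
# hypothetical lock file contents
@EXPLICIT
https://conda.anaconda.org/conda-forge/noarch/example-pkg-1.2.3-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/other-0.9-h1234_1.conda#0123abcd
"""

print(parse_lock(example))
```

Comparing the dictionaries produced from two lock files is one way to see exactly which pins changed between releases.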

The notebooks repository

In addition to the pycpt repository, there is a second repository called notebooks that contains example Jupyter Notebooks that demonstrate the functionality provided by the pycpt packages. Of particular note is the Operations directory, which contains

A "release" of PyCPT consists of a compatible set of the above files. Releases are published through the notebooks repository on GitHub, at https://github.com/iri-pycpt/notebooks/releases. Releases are currently identified by the version of the pycpt package they include. This numbering system can be inconvenient: in order to publish a change to a package like cptdl, we need to increment the version number of pycpt and publish a new pycpt package, even though the contents of that package are identical to those of the previous version. Merging packages, as suggested in the Package structure section, may resolve this.

I (Aaron) find the separation between the pycpt and notebooks repositories confusing. I think we should move the Operations subdirectory of notebooks into pycpt, and then publish subsequent releases from that repository instead.

Instructions for creating a release are given in the Publishing changes section below.

Development setup

All of the PyCPT python packages support pip's development mode, which allows you to edit the python code and test it in place, without building and installing a new package each time. To create a conda environment for testing changes, start with the environment from the latest release, and then replace each of the pycpt packages with your editable copy. That is,

Then you can activate the development environment (conda activate pycpt-dev) and run PyCPT in it. When you make changes to the PyCPT code, restart the Jupyter kernel to load the modified version.
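After restarting the kernel, it can be useful to confirm that Python is importing your editable checkout rather than a copy installed from a conda package. The following is a minimal sketch of such a check using only the standard library; the helper name `module_origin` is hypothetical, and the import names `pycpt` and `cptdl` are assumptions that should be replaced with the modules you are actually editing.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Assumed import names; adjust to the packages you installed in development mode.
for name in ("pycpt", "cptdl"):
    print(name, "->", module_origin(name))
```

If the printed path points into your working copy of the source tree, the development install is the one in use; a path inside the environment's site-packages directory suggests the packaged version is still being picked up.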

Publishing changes

The steps for creating a new version of PyCPT can be summarized as follows:

We will now go into more detail on some of these steps.

Building a pure python package

After modifying any package other than cpt-bin, follow these instructions. (The process for cpt-bin is more complicated because that package contains Fortran code that must be compiled for each platform. Instructions for that are in the next section.)

Building cpt-bin

(To be written)

Updating environment lock files

To update the environment specifications to use newly published PyCPT packages, it usually suffices to edit the lock files by hand and update the version numbers for those packages, e.g. change

https://conda.anaconda.org/t/ir-777bcf3a-3147-44d2-9fa2-dccca9b8d3ed/iri-nextgen/noarch/cptdl-1.1.2-py_0.tar.bz2

to

https://conda.anaconda.org/t/ir-777bcf3a-3147-44d2-9fa2-dccca9b8d3ed/iri-nextgen/noarch/cptdl-1.1.3-py_0.tar.bz2
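The hand edit described above can also be scripted, which helps when the same change has to be made in several lock files. This is a minimal sketch, not part of the PyCPT tooling: the function name `bump_pin` is hypothetical, the channel URL in the example is a placeholder, and the code assumes each package appears on its own line with a filename of the form `name-version-build.tar.bz2` or `name-version-build.conda`.

```python
import re


def bump_pin(lock_text, package, new_version):
    """Return lock_text with the pinned version of `package` replaced by `new_version`."""
    pattern = re.compile(
        r"^(?P<prefix>\S*/" + re.escape(package) + r"-)"   # URL up to "name-"
        r"(?P<version>[^-/\s]+)"                            # current version
        r"(?P<rest>-[^-/\s]+\.(?:tar\.bz2|conda)\S*)$",     # "-build" plus extension
        re.MULTILINE,
    )
    updated, count = pattern.subn(
        lambda m: m.group("prefix") + new_version + m.group("rest"), lock_text
    )
    # Refuse to guess if the package is missing or listed more than once.
    if count != 1:
        raise ValueError(f"expected exactly one line for {package!r}, found {count}")
    return updated


# Placeholder lock file contents; the channel name is not a real PyCPT channel.
sample = (
    "@EXPLICIT\n"
    "https://conda.anaconda.org/example-channel/noarch/cptdl-1.1.2-py_0.tar.bz2\n"
    "https://conda.anaconda.org/example-channel/noarch/other-2.0-py_0.tar.bz2\n"
)

print(bump_pin(sample, "cptdl", "1.1.3"))
```

The sketch changes only the version field. If a line carries a trailing hash fragment, that hash would presumably need to be updated as well, which this function does not attempt.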

If we need to update not only PyCPT packages but also one or more third-party dependencies, it is not a good idea to edit the lock files by hand, as the result may violate compatibility constraints between different packages. The simplest thing to do in this case is usually to

If this process results in the wrong versions of some packages being installed, or in an environment where PyCPT doesn't work, then we need to be more explicit about versions. (TODO go into more detail.)

When recreating the environment from scratch, this process must be repeated on each platform (Windows, macOS, Linux).

Creating a GitHub release

We used to instruct users to download files from the GitHub Code tab, but this had the following disadvantages:

To solve these problems, we switched to publishing via GitHub's "Releases" mechanism. To create a new release,