payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0
18 stars 25 forks source link

Update packaging tools and logic #398

Closed aidanheerdegen closed 5 months ago

aidanheerdegen commented 6 months ago

payu deploys to pypi and conda, but the configuration metadata is stored in separate files:

https://github.com/payu-org/payu/blob/master/setup.py

https://github.com/payu-org/payu/blob/master/conda/meta.yaml

It seems this is a well known problem. Some solutions include:

https://github.com/basnijholt/unidep

https://stackoverflow.com/a/76722681

@dougiesquire suggested we could even utilise the PyPI build in conda packaging https://github.com/payu-org/payu/pull/362 but this left for later. Later has arrived.

The "modern" approach is to use a pyproject.toml file to specify dependencies

Related issue, the version is hard-coded:

https://github.com/payu-org/payu/blob/master/payu/__init__.py#L6

and used in a few locations

https://github.com/search?q=repo%3Apayu-org%2Fpayu%20__version__&type=code

so the pypi distribution only installs the version in that file. Hard-coding the version string in a file is a bad approach when using git tags for versioning, as it requires putting the version in the file, then tagging with the version in the file as a subsequent step. Nasty.

There are a number of options for "single-sourcing" the package version:

https://packaging.python.org/en/latest/guides/single-sourcing-package-version/#single-sourcing-the-version

But option 7 is the one that is most compatible with using git tags for versioning, and it uses setuptools_scm.

NB: A couple of the places the hard-coded version string is used are in some old install scripts that should be removed (see https://github.com/payu-org/payu/issues/396).

aidanheerdegen commented 6 months ago

Seems benchcab is having similar issues

https://github.com/CABLE-LSM/benchcab/issues/129

Would be good to have a consistent approach. Any opinions @SeanBryan51?

aidanheerdegen commented 6 months ago

Another option for the versioning is to use the hatchling build system and hatch-vcs, which conda itself has adopted to build conda

https://github.com/conda/conda/issues/12508

aidanheerdegen commented 6 months ago

This is an interesting read that explains some of the differences between packaging for PyPI and conda

https://pyopensci.discourse.group/t/suggestions-for-a-conda-user-preparing-a-package-for-pypi/324/2

dougiesquire commented 5 months ago

If changing packaging tools is in scope, this is a helpful resource comparing a number of modern ones (including Hatch).

Another option for the versioning is to use the hatchling build system and hatch-vcs, which conda itself has adopted to build conda

Maybe also worth mentioning versioneer which I've used in the past for this.

CodeGat commented 5 months ago

Unassigned myself from this for now, but will come back to it later!

dougiesquire commented 5 months ago

Sorry for being dense, but what is it exactly that you want to achieve here? Consolidate the setup.py and meta.yaml into a single file? I'm probably misunderstanding but I'm not sure either of your solutions help with this. As far as I can tell, a meta.yaml file will still be required to do the conda build.

@dougiesquire suggested we could even utilise the PyPI build in conda packaging https://github.com/payu-org/payu/pull/362 but this left for later. Later has arrived.

When I've done this for other projects, I've still needed both a meta.yaml and pyproject.toml (or equivalent) - e.g. see here.

Or is this idea to have the dependencies defined in one place and have themeta.yaml and pyproject.toml both populate from there? E.g. https://stackoverflow.com/questions/61485382/add-requirements-from-requirements-txt-to-conda-meta-yaml

dougiesquire commented 5 months ago

Regarding versioning, the obvious thing to use is setuptools_scm, as you mention. But I think there are issues dynamically setting the version in meta.yaml when using setuptools_scm (https://github.com/conda/conda-build/issues/4774). I've done this before with versioneer (e.g. see here), so maybe simplest to go with that if there's time pressure?

aidanheerdegen commented 5 months ago

Sorry for being dense, but what is it exactly that you want to achieve here?

No the fault is mine. The issue is not well formulated and adds in additional problems/information in an unhelpful way.

It became apparent that the conda and PyPI builds had diverged. The hard-coded version string was not being updated, which PyPI relied upon to determine a version to update, so it was stuck on the same version. Something similar is happening for the readthedocs deployment. Meanwhile the conda build was building and deploying fine.

So the initial motivation was to fix this issue, which seemed to imply some coalescing of the PyPI and conda builds/deployment.

Consolidate the setup.py and meta.yaml into a single file? I'm probably misunderstanding but I'm not sure either of your solutions help with this. As far as I can tell, a meta.yaml file will still be required to do the conda build.

Yes I was a bit naive and just assumed this was possible. So good to know.

So the goal should be to make as much as possible shared by PyPI (effectively pyproject.toml?) and conda (meta.yaml). At the very minimum this should be the versioning information.

@dougiesquire suggested we could even utilise the PyPI build in conda packaging #362 but this left for later. Later has arrived.

When I've done this for other projects, I've still needed both a meta.yaml and pyproject.toml (or equivalent) - e.g. see here.

Or is this idea to have the dependencies defined in one place and have themeta.yaml and pyproject.toml both populate from there? E.g. https://stackoverflow.com/questions/61485382/add-requirements-from-requirements-txt-to-conda-meta-yaml

It seems like a decent idea, but I'd prefer something standard and well understood, over something bespoke that might be more fragile or more difficult to maintain.

So number 1 problem to solve is consistent versioning between PyPI and conda builds based on repo tags, and must support dynamic updating of a version variable that can be queried by payu at runtime.

It seems @SeanBryan51 has opted for versioneer, so should we just do the same?

https://github.com/CABLE-LSM/benchcab/pull/229/files

Does this imply moving to pyproject.toml? Do we need to specify things like entry points twice in that case as we seem to be doing currently?

dougiesquire commented 5 months ago

It seems @SeanBryan51 has opted for versioneer, so should we just do the same?

That will be simplest/fastest if I'm the one doing it - see my comment above.

Does this imply moving to pyproject.toml? Do we need to specify things like entry points twice in that case as we seem to be doing currently?

It's not needed, but now seems like a good time to do it. I'm not sure what you mean by "specify things like entry points twice"?

So, to do:

Anything else?

aidanheerdegen commented 5 months ago

Re: Versioneer

That will be simplest/fastest if I'm the one doing it

If you have time to do it that would be fantastic. Thanks.

@jo-basevi is interested to learn more about python packaging so would be happy to assist, review code etc.

I'm not sure what you mean by "specify things like entry points twice"?

Entry points are currently defined in meta.yaml:

https://github.com/payu-org/payu/blob/master/conda/meta.yaml#L9-L17

and setup.py:

https://github.com/payu-org/payu/blob/master/setup.py#L59-L70C7

Which seems like useless (and potentially harmful) duplication.

Anything else?

See above. Is it possible for conda to pick up the entry point stuff from pyproject.toml? It didn't seem to from what I'd read, but happy to be corrected.

Also if it isn't possible I don't think it is worth delaying for. Entry points are changed relatively infrequently, so we can just live with it.

Specifying build dependencies in a single location is a "nice to have", but there are arguments that this is necessary, and it may indeed be advantageous to be able to separate them. Not sure it is applicable in this case. If we can't manage it, I'd settle for some CI that checks they're consistent.

dougiesquire commented 5 months ago

If you have time to do it that would be fantastic. Thanks.

Maybe not today, but probably tomorrow and definitely some time this week.

@jo-basevi is interested to learn more about python packaging so would be happy to assist, review code etc.

Great!

Entry points are currently defined in meta.yaml:

https://github.com/payu-org/payu/blob/master/conda/meta.yaml#L9-L17

and setup.py:

https://github.com/payu-org/payu/blob/master/setup.py#L59-L70C7

Ah right, of course. I think we can avoid this if we conda build from PyPI dist.

aidanheerdegen commented 5 months ago

Maybe not today, but probably tomorrow and definitely some time this week.

Awesome! Someone give this man a pay-rise.

aidanheerdegen commented 5 months ago

Damn. PyPI publishing failed:

https://github.com/payu-org/payu/actions/runs/7606639931/job/20712742002#step:4:15

Notice: Attempting to perform trusted publishing exchange to retrieve a temporary short-lived API token for authentication against https://upload.pypi.org/legacy/ due to __token__ username with no supplied password field
Warning:  It looks like there are no Python distribution packages to publish in the directory 'artifact/'. Please verify that they are in place should you face this problem.
Checking artifact/artifact: ERROR    InvalidDistribution: Unknown distribution format: 'artifact' 
aidanheerdegen commented 5 months ago

We're getting weird "dirty" tags in readthedocs even though "proper" tags exist:

Screenshot 2024-01-23 at 12 40 53 pm

From this issue I think the problem is that even though the

tag_prefix = `v`

was removed from pyproject.toml versioneer wasn't re-run to pick up that change:

https://github.com/payu-org/payu/blob/master/payu/_version.py#L54

I really should do a better job reviewing ...

dougiesquire commented 5 months ago

Rats. This issue just won't stay closed.

I think you're right. PR inbound.

dougiesquire commented 5 months ago

And stay closed!

aidanheerdegen commented 5 months ago

Can confirm, still closed.