mllam / neural-lam

Neural Weather Prediction for Limited Area Modeling
MIT License

Keeping a changelog and versioning #7

Closed leifdenby closed 1 month ago

leifdenby commented 5 months ago

I would like to propose that we start keeping a changelog. This would mean that from now on changes are only added through pull-requests, and every pull-request should add an entry to CHANGELOG.md (typically this addition will be done as a final commit once a pull-request has passed tests and review).
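A minimal sketch of how the "every pull-request adds an entry" rule could be checked in CI. It assumes the base branch is `origin/main` and that the changelog lives at the repository root as `CHANGELOG.md`; the function names are made up for illustration and nothing like this exists in the repository yet.

```python
import subprocess
from typing import Iterable, List

CHANGELOG_PATH = "CHANGELOG.md"  # assumed location: repository root


def changed_files(base_ref: str = "origin/main") -> List[str]:
    """List files that differ between the merge base with base_ref and HEAD.

    The default base_ref is an assumption; adjust it to the real base branch.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base_ref}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def has_changelog_entry(paths: Iterable[str]) -> bool:
    """Return True if the changelog is among the changed paths."""
    return CHANGELOG_PATH in set(paths)


if __name__ == "__main__":
    # Exit non-zero so a CI job fails when the changelog was not touched.
    if not has_changelog_entry(changed_files()):
        raise SystemExit(f"Please add an entry to {CHANGELOG_PATH}")
```

This only checks that the file was touched, not that the entry is meaningful; that part is still up to review.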

My notes below are based on https://keepachangelog.com/en/1.0.0/. Please read it; it is very succinct.

Why keep a changelog?

As the code keeps evolving it will be important to be able to communicate with each other how the codebase is changing. Having a single point of reference, where changes are grouped by feature additions, bugfixes, breaking changes and maintenance will make it a lot easier (and a lot more fun!) to work with the codebase.

For reference, here is xarray's changelog: https://github.com/pydata/xarray/blob/main/doc/whats-new.rst, and here is one from a different project I've worked on: https://github.com/cloudsci/cloudmetrics/blob/master/CHANGELOG.md

Connected to this, I would also like to introduce versioning, using semantic versioning. @joeloskarsson, I would like to tag the commit https://github.com/joeloskarsson/neural-lam/commit/2378ed7eddf8da5bfec6f57c41cadf310d191dee as v0.1.0 and create a changelog relative to it. My reasoning is that I think this was the most recent commit when you shared this repository publicly. We can then start the changelog by including the commit that you made for the recent loss-function additions (new feature) and the maintenance @sadamov is doing in #6 by setting up linting.

In that case the CHANGELOG would look something like:

# Changelog

## [Unreleased](https://github.com/joeloskarsson/neural-lam/HEAD)

[Full Changelog](https://github.com/joeloskarsson/neural-lam/compare/v0.1.0...HEAD)

*new features*

- additional loss-functions 
  [c14b6b4](https://github.com/joeloskarsson/neural-lam/commit/c14b6b4323e6b56f1f18632b6ca8b0d65c3ce36a) by Joel Oskarsson
  (@joeloskarsson)

*maintenance*

- set up linting with pyflake8 and black via pre-commit, with a GitHub Action to run it in CI/CD [\#6](https://github.com/joeloskarsson/neural-lam/pull/6/) by Simon Adamov (@sadamov)

## [v0.1.0](https://github.com/joeloskarsson/neural-lam/tree/v0.1.0)

First public release of Neural-LAM, including functionality to train hierarchical and ...

Let me know your thoughts. I would be happy to set this up if you are happy with it, but I wanted to get the discussion going before the codebase starts changing a lot.

joeloskarsson commented 5 months ago

I think this all sounds good. My initial reaction was "Why do we need a changelog if we have a neat git history", but after reading through the links I agree that it would be useful. Perhaps one should also use github releases and keep those synced to the changelog? (Can that be done automatically?)

I think I'd prefer to not have names associated with the items in the changelog. You can always see who is responsible in the git links, if you want to contact someone responsible. So writing the names seems like just extra admin, and it gives the idea that a contribution is the result of only one person, which is often not the case. Let me know if there is some good reason to have this that I haven't thought of.

Semver is good. I agree that starting from https://github.com/joeloskarsson/neural-lam/commit/2378ed7eddf8da5bfec6f57c41cadf310d191dee is best, that is also the commit that I tagged for reproducing our workshop paper (ccai_paper_2023-branch, but that one is probably not needed with proper versioning). A question regarding versioning: How often should one release a new version then? Almost every commit could be a new minor- or patch-version, but that is not very useful. I guess you just collect a reasonable amount of new things, but are there some guidelines around this that we want to follow?

leifdenby commented 4 months ago

> Perhaps one should also use github releases and keep those synced to the changelog? (Can that be done automatically?)

Yes, GitHub releases are a good idea I think. In a way a release is a second step after tagging, which can be used to indicate that this tagged version is the one to use in downstream applications going forward. I have often set it up so that a GitHub release is automatically uploaded to pypi.org. I could do that for neural-lam too. That would require the code to be refactored into a package though, so maybe we should hold off on that until we've discussed refactoring.
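As a rough sketch of how release notes could be kept in sync with a hand-written changelog: the function below pulls the body of one `## [version]` section out of a file laid out like the example CHANGELOG above, so that it could be pasted (or passed by a script) into a GitHub release. The function name and the heading pattern are assumptions based on that example layout, not something the project already has.

```python
import re

# Matches version headings such as "## [v0.1.0](https://...)" or "## [Unreleased](...)".
VERSION_HEADING = re.compile(r"^## \[([^\]]+)\]")


def extract_release_notes(changelog_text: str, version: str) -> str:
    """Return the text under the `## [version]` heading, up to the next version heading."""
    collected = []
    found = False
    for line in changelog_text.splitlines():
        match = VERSION_HEADING.match(line)
        if match:
            if found:
                break  # the next version heading ends the requested section
            found = match.group(1) == version
            continue
        if found:
            collected.append(line)
    if not found:
        raise KeyError(f"no changelog section for version {version!r}")
    return "\n".join(collected).strip()
```

For example, `extract_release_notes(open("CHANGELOG.md").read(), "v0.1.0")` would give the text to use as the description of the v0.1.0 release.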

Regarding automatically creating changelogs: In principle that is possible, but I think it defeats the point of a changelog. The point of a changelog is that it is "for humans". As in, the purpose is for the person who made a code modification to reflect on how this might affect other people working with the code. And through this formulate a single sentence on what the change is for.

> I think I'd prefer to not have names associated with the items in the changelog. You can always see who is responsible in the git links, if you want to contact someone responsible. So writing the names seems like just extra admin, and it gives the idea that a contribution is the result of only one person, which is often not the case. Let me know if there is some good reason to have this that I haven't thought of.

I don't feel strongly about this, but I wouldn't underestimate the sense of achievement it gives to be able to add yourself to a list of contributors to a project. I think that is the reason why xarray do it (https://github.com/pydata/xarray/blob/main/doc/whats-new.rst). We can make it optional if you like, but I think it is nice for people to feel like they have joined something. Yes, true, this is visible in the git log too, but so are the code changes. Again, I think this is about the human element of code development. I don't think optionally adding a few tens of characters is that much work compared to actually working through a pull-request :)

> I agree that starting from 2378ed7 is best, that is also the commit that I tagged for reproducing our workshop paper

Ok, great! Will you tag that one as v0.1.0 in that case?

> A question regarding versioning: How often should one release a new version then? Almost every commit could be a new minor- or patch-version, but that is not very useful. I guess you just collect a reasonable amount of new things, but are there some guidelines around this that we want to follow?

I think the point here is not that semver defines when we should make versions, but rather that it communicates what changes between versions. So we should decide ourselves how often we'd like to make a new version. I think a nice model could be to briefly discuss code changes after each monthly meetup. If we'd like everyone in the community, and newcomers who stumble across the code, to update to the most recent version of the code, then we tag a new version. I would prefer to version often, based on when we as a group feel like we've made progress.
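To illustrate what semver communicates, here is a small sketch that derives the next version number from the kinds of changes collected in the changelog since the last tag. The category names follow the groupings mentioned earlier in this thread; the function itself is hypothetical. It applies the plain semver rules and ignores the special status of 0.y.z versions, where the spec allows anything to change at any time.

```python
from typing import Iterable


def next_version(current: str, change_types: Iterable[str]) -> str:
    """Suggest the next semantic version given the categories of unreleased changes.

    Category names ("breaking changes", "new features", "bugfixes", "maintenance")
    are assumptions taken from the groupings discussed in this thread.
    """
    kinds = set(change_types)
    if not kinds:
        raise ValueError("no unreleased changes, nothing to version")

    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))

    if "breaking changes" in kinds:
        return f"v{major + 1}.0.0"  # incompatible changes bump the major version
    if "new features" in kinds:
        return f"v{major}.{minor + 1}.0"  # backwards-compatible additions bump minor
    return f"v{major}.{minor}.{patch + 1}"  # only fixes or maintenance bump patch
```

With the example changelog above (new features plus maintenance since v0.1.0), `next_version("v0.1.0", {"new features", "maintenance"})` would suggest `v0.2.0`. When to actually cut that version remains a decision for the group.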

That was quite a lot :)

joeloskarsson commented 4 months ago

Great, can you set up a PR with the changelog file then, adding in an entry about the pre-commit hooks and the resulting reformatting?

joeloskarsson commented 4 months ago

I added the tag: https://github.com/joeloskarsson/neural-lam/tree/v0.1.0

joeloskarsson commented 1 month ago

Merged in #28, closing.