What constitutes a "breaking change" in the ISIS codebase?

planetarysoftware / ISIS_TC

ISIS Technical Committee

https://github.com/USGS-Astrogeology/ISIS3

6 stars 11 forks source link

What constitutes a "breaking change" in the ISIS codebase? #140

Open rbeyer opened 3 years ago

rbeyer commented 3 years ago

There was recently some discussion in ISIS Issue 3882 about what constituted a "breaking change" that should trigger a major version tick.

The last record of discussion on this topic that the ISIS TC has is a statement of fact about the "current situation" in our meeting notes from 2020-02-11 in which was written: "Currently the API is both the library API and the application API."

In order to provide good guidance to ISIS developers and release managers, we should discuss what constitutes a breaking change and come to consensus.

rbeyer commented 3 years ago

I support the statement in the meeting notes. "The ISIS API" that should be considered when deciding what commits constitute a "breaking change" which would trigger a major version tick, are both the library API (the traditional definition), but also the application API. By the "application API" I consider this kind of definition: the best analogy to a library API is to consider each userland application as a "function." The command line arguments are analogous to function arguments, and the program output is analogous to a function's return signature. With those analogies, we can apply the traditional thought process that we apply to library APIs to determine what is and isn't a breaking change.

By that definition, the change to the output of hist in ISIS 4.3.0 (motived by ISIS Issue 3882) should have triggered a major version tick because the output of hist is a text file which contains a CSV component, and the former single DN column was replaced by two columns.

Similarly, the change to the output of stats in ISIS 4.1.1 (motived by ISIS Issue 3870)should have also triggered a major version tick because the output text file had a different structure (rather than omitting missing elements, the elements were provided with an "N/A" value).

These were both good changes, but they were not just features, they were breaking changes. Because if a user was then parsing the output of those text files, and they changed, then that would break user code. We know for a fact that both of these changes impact the HiRISE processing system, for example.

We must consider the "application API" when considering what is a "breaking change" because that is how the majority of our users interface and interact with ISIS.

Especially if we are trying to promote the idea that users should always prefer to upgrade to minor version upgrades, because we won't break their own workflows and potentially dependent code.

scsides commented 3 years ago

Food for thought. With the current sprint development cycle of the Astro devs, a new feature (app, parameter, overhaul) can take multiple years to mature. As a feature matures, breaking changes will likely be necessary. No amount of planning can account for all possibilities.

Scenario : A new app has reasonable functionality, it is known to NOT be complete, and it is added to the dev branch (the alternative would be to leave it in a feature branch for 1-3 years). As the app is used and reviewed it matures over the course of 1 - 3 years, the app name changes and/or parameter names change, and/or parameter defaults change.

Should there be a new major tick for each release that contains one or more of these changes? By current definition, YES.
Can features be included in releases, but flagged in some way that does not require them to be stable without a major tick?

rbeyer commented 3 years ago

Should there be a new major tick for each release that contains one or more of these changes? By current definition, YES.

Can features be included in releases, but flagged in some way that does not require them to be stable without a major tick?

Firstly, I want to be explicit and precise about the language: "features" can be breaking or non-breaking. Under SemVer, the introduction of non-breaking changes just requires a minor tick, but breaking changes require a major tick.

In the proposed scenario, the app and its growing feature set could be introduced and developed under minor ticks, but if breaking changes are introduced, those would either trigger a major tick or would need to be delayed until a major tick was decided upon.

To the second proposed question: I don't think so. At least not under the way that we are currently defining everything. If there is only one holy trunk line of development, then, when we "make a release" it has been blessed, and that literally defines stability, under the constraints of SemVer and our process.

We could certainly embrace a "stable" line of development and an "unstable" one, this may require releases on both lines, and we'd have to figure out a bug-fix policy, etc. And then we'd migrate things from "unstable" into "stable" when we're willing to stand behind something. Then the "stable" line satisfies what most (mission) users want/need, and the "unstable" allows for play amongst the userbase. However, that's a different model than the single "dev" branch that everything merges into right now.

If we didn't split into two lines, then we could try and tell our users that some things in a release are stable and some things aren't, which honestly has been the steady-state of ISIS for a long time, but then I'd argue that nothing has changed, and users are right back where they've always been, and can't easily determine what is and is not stable without a complete, encyclopedic understanding of the changelog (which is really an unreasonable burden on our users).

scsides commented 3 years ago

One could argue the CSV changes mentioned above where not breaking changes in the output if the column titles only had additions. A well written CSV exploitation tool should look for the titles to find needed columns and not count columns.

Ridiculous over the top case: There exists a web page with a table of information that is updated hourly. The exploitation tool dumps the web page into a text file and starts at byte 137 to read the table data. The web manager fixes a misspelling at character 62 (Th became The). The tool continues to use 137 instead of 138 thus reading the data incorrectly.

rbeyer commented 3 years ago

One could argue the CSV changes mentioned above where not breaking changes in the output if the column titles only had additions.

Agreed, that would be a non-breaking feature (a minor tick), but that's not how hist changed. The problem here is that the DN column was removed and replaced by two columns with completely different names (MinExclusive and MaxExclusive), which I consider to be a breaking change. Its not a bad change, but I think it is a breaking one.

A well written CSV exploitation tool should look for the titles to find needed columns and not count columns.

Completely agree about smart readers, and while a smart CSV reader would be able to furnish the new column names, if code was expecting to operate on a "DN" column, and it is now missing, that's a problem, no matter how well you read the CSV.

scsides commented 3 years ago

I disagree with the "nothing has changed", and am looking for a good solution, not necessarily the ones put forward, that doesn't cost unreasonable amounts and that doesn't frustrate all devs. If we explicitly identify a feature as "beta" in a release, which has not been the case in the past, then we are providing the necessary information and allowing far more users to review new capabilities. Alternatives: 1) Leave the feature in a branch that has to be maintained for years (=$$$), then release it and get comments back that cause major ticks anyway. 2) Have a second line for unstable work (= $$$). Not to be too pessimistic, but next to nobody will download it and test it. Might as well leave it in a branch. 3) Don't artificially limit multiple major ticks per year, and use the LTS version until a feature in another is needed.

AndrewAnnex commented 3 years ago

I don't totally follow the unstable work branch idea or the idea of leaving things in branches for long periods. As I understand it, given the LTS release schedule or even with a limit of numbers of major version ticks per year, as long as you keep track of what is a breaking change or not you would be able to "batch" up your breaking changes with each scheduled LTS/major tick release.

I think it is probably correct that these changes Ross mentions are breaking changes, but one does not need to cut a new major release for each breaking change in the dev branch, as you just wait until the next scheduled major release. Obviously you would have to be careful about not merging in a breaking change if you are planning to cut a minor/patch release.

for the long term-feature development issue stuart is mentioning, you would not need to keep a branch unmerged from the main/dev branches, but you would need to rebase onto newer releases if you missed one or merge partial progress before each major tick to have the breaking changes for that application bundled together with other breaking changes/the release schedule

scsides commented 3 years ago

Yes those were breaking changes. The point I should have explicitly made is that just because a pipeline/script breaks doesn't mean there was a breaking change.

It is the keeping track of that can be a problem, and even if it works it is a time sink, but then isn't everything. I mostly threw this out to get the discussion going and possibly find a better solution. We are starting with SemVer, but extending it to include the applications, parameters, and possibly their output. Non of which is unreasonable.

Definition of unstable for this discussion, usable but unfinished might be a better description: An app or class that is tested and working properly with its current capabilities. There are plans to add new capabilities over a long period (years) that will likely include parameter changes, logic changes that alter the output, or even a rename of the app because user ABC came up with a better name. Code that is known to be broken does not fit this category, nor should it be in a release.

Is there a better way to provide EASY access to unstable features in our extended definition of SemVer without waiting for a major tick? There are currently no more major ticks scheduled for this year, so a new feature can be included in 5.1, but a breaking fix can not be released until next year when 6.0 is ready. The tracking, rebasing, cherry picking, over long periods is error prone, and the unstable/stable versions include all of these plus the overhead of two versions.

For example, what if a new unstable app's documentation was on a flaming background with a giant disclaimer warning of its volatility. This would rely on RTM.

michaelaye commented 3 years ago

I have no opinion on the major version and API/outcome breaks, as to some extent it becomes philosophical and one must decide to either disturb the users or the devs more.

But here's a working tool/philosophy that might help to master things: gitflow

https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow

In short, strict separation between, dev, master, feature, and release branches (and even bugfix branches, if required). In this way, a long tracking of next year's API breaking change could be happening in a feature branch, and is less error prone b/c either its allowed feature set is frozen, or useful things that were created in the dev branch are carefully merged at rare occasions.

AndrewAnnex commented 3 years ago

so, a unstable feature as described here would probably best exist in a feature branch. Keeping this up-to-date is certainly a challenge and there is no easy answer for keeping unmerged feature branches working for years at a time if the main development branch is changing a lot. Any changes to that program (renaming it, changing outputs, etc) would be isolated to that feature branch thus not requiring major ticks in the branch.

At some point though "payment is due", but you can make your life easier by setting up bots to monitor feature branches for merge conflicts with notifications/automated periodic rebases/etc, which is a tooling question which also requires developer time to manage. Personally, I would rather get a notification to resolve a small conflict every so often than have to solve a massive merge conflict once down the road.

However, providing easy access (compiled) versions of that feature branch is a separate question entirely. Technically, the feature branch might be enough if you expect users to be able to compile ISIS for themselves. If you do not, then you would be tied to the major release schedule as you say.

For this case ISIS would need to move more towards a continuous delivery model (https://en.wikipedia.org/wiki/Continuous_delivery) to make access to these feature branches easier and free from the release schedule.

In practice this really just means that the tooling around building/testing/releasing ISIS needs to be further automated to allow you to make releases effortless for specific feature branches and specific commit hashes on feature branches. The good news is that the changes needed to switch to this model would also obviously help for regularly scheduled releases.

However my limited understanding of the ISIS release process is that it involves a lot of manual checks/people hours that would not be compatible with this model. I do not know how many of those aspects are intentional/operational requirements vs technical limitations in existing CI/CD infrastructure and tests.

scsides commented 3 years ago

Some software groups have the resources to follow the strict idealized git work flow. How many of the mission/instrument teams or individual scientists have expressed concern over testing a new version? Given that, how many community members currently have the time, inclination, and ability to pull down a dev or feature branch, build it and test it?

The current setup isn't very far from the document pointed to and is actually derived directly from it, but due to all of our situations it delays everyones ability to see and review useful changes in a reasonable time frame.

@AndrewAnnex 's point about CD addresses some aspects and has been a desire, actually goal, for sometime now. The tracking component of what goes in a release and what does not is still there. Some formal process changes could help alleviate some of this pain by requiring merges into dev to be followed by timely merges of non-breaking commits into maintained release branches.

tmwilliams9999 commented 3 years ago

@michaelaye, "one must decide to either disturb the users or the devs more". I would personally dislike working in such an environment, on either side. Both must make reasonable efforts to do their respective jobs and see the other's point of view. Both often fail. It is a partnership that can flourish or go down in smoke.

The inclusion of incomplete features in formally released software is unconventional, but not unheard of. As @scsides indicates, the documentation would show that the feature will change, and when the user reads that documentations they would be well informed. The dev and user are communicating; a win-win for both. Modern software often fails to meet a users needs due to the fast paced, money driven environments. Companies should be applauded when they reach out and ask "is this something you could use?" and "what would make it better?" before a product is released.

This discussion is attempting to define "breaking." @rbeyer is being strict in the interpretation of breaking. A very valid stance. A small hole can grow if not restricted. @scsides is asking for a better way to get user feedback. A commendable goal?

AndrewAnnex commented 3 years ago

@tmwilliams9999 I would actually say that incomplete features are the new normal since the advent of agile methods, although this isn't necessarily a good/bad thing.

An alternative to long lived feature branches would be to further decouple the feature branch from the main repository as a new module, and turn them into a new, small independent project/plugin onto itself. These projects would depend on a version of ISIS and could be modified however frequently as needed.

As an outsider, it does sound like missions can't absorb development complexity in the same way as the ASC can.

jessemapel commented 3 years ago

This seems like two discussions as @tmwilliams9999 mentioned.

Re breaking changes:

I agree with treating the applications as we would functions, but I struggle with defining what a breaking output change is. Output files do not have their own API that defines how you interact them; so, what is and is not breaking either needs to be VERY explicitly defined or it's going to fall to a judgement call. I would say that the hist change is breaking because it removes a column, but I would not say that the stats change is breaking because it changes the contents of a value string. What about re-ordering the columns of a CSV file? What about adding new sections to a text file like the BundleOut.txt file? There's so much grey here.

Re stable and unstable branches:

We already follow a modified form of the gitflow that @michaelaye linked. Instead of using a master branch, we instead use git tags that are the code as it existed when the release was built. When we pull a feature release we pull a new release branch and that isolates it from incoming changes. We also generally develop large features in feature branches. I don't see a good way to avoid feature branches having a "cost" associated with them that grows as they remain un-merged.

The stable/unstable track idea sounds exactly like Long Term Support to me. We have a stable LTS branch that gets bug fixes and we then have an unstable dev branch that gets everything. Isolating breaking changes into yet another "more stable, but still kind of unstable" branch seems like extra work for little reward.

Ideally, I would like to see a very stable LTS branch and then an unstable dev branch that has continuous deployment on it. We are working towards CD, but there's still more work to do to get us there. So, I'd say the model with a smaller number of releases off of dev between LTS versions could work for right now.

rbeyer commented 3 years ago

I agree with treating the applications as we would functions, but I struggle with defining what a breaking output change is. Output files do not have their own API that defines how you interact them; so, what is and is not breaking either needs to be VERY explicitly defined or it's going to fall to a judgement call.

Sure, this is ambiguous, and ultimately always requires a judgement call. I agree that trying to define a strict algorithm for what is and isn't a breaking change in this context that does not require an informed judgement call would be extremely difficult, and any effort to do so would likely be incomplete.

My hope is that we'd establish some guidelines that might help us make better initial decisions, encourage more discussion on these topics as they come, and encourage developers to start an Issue if they have a question about whether something is breaking or not. Will we always get it right? No. But I think we can probably always try and do better.

I would say that the hist change is breaking because it removes a column, but I would not say that the stats change is breaking because it changes the contents of a value string.

I'm gonna disagree there, it didn't change the contents of a value string, it added a new key/value pair to the text output with a value of "N/A" when before the key simply wasn't present. Before the (unwritten) "contract" with the user (based on what the program did before) was that if a key didn't appear, it could be considered "no data." If you were returning a Python dict or other hashed data structure from a function, there is a difference between testing for the existence of a key at all, and testing the value of the key.

I certainly can appreciate your argument that you're getting a hashed data structure back one way or the other, and it is up to the user to figure out how to deal with it properly. Sure, if everyone already had a deep test suite on their own code, then this would be immediately obvious. The ISIS test suite has gotten significantly better over the last few years from dedicated efforts on the part of the USGS, but the userland code out there (missions, individual scientists, etc.) has not. Is it fair? No.

Maybe this really implies that we need a more formal "specification" for the various text files that ISIS writes out, so that it is clear what the user can and can not expect (you're getting keys and values, but it is up to you to test both for key existence and key value, or whatever). Right now, there is no "spec" and so downstream developers and consumers of these data files simply have to make their "best guess" based on the files that they get, and this can lead lead to poor implementations if those guesses are not aligned with the intentions of the authors. Even just a few sentences from the developers about the output files (even on a per program basis) could close a wide ignorance gap between developers and users.

jessemapel commented 3 years ago

This has come back up with a typo in camstats/caminfo. There was a keyword that had a typo in it (AspectRatioMaximun instead of AspectRatioMaximum). Something like this does not seem API breaking to me as it's a simple typo fix. These types of changes should be documented in the application history and changelog but can go out in feature relases.

jessemapel commented 3 years ago

Link to PR https://github.com/USGS-Astrogeology/ISIS3/pull/4597

rbeyer commented 3 years ago

This has come back up with a typo in camstats/caminfo. There was a keyword that had a typo in it (AspectRatioMaximun instead of AspectRatioMaximum). Something like this does not seem API breaking to me as it's a simple typo fix. These types of changes should be documented in the application history and changelog but can go out in feature relases.

I will sit down and attempt to draft some language for a policy on the title of this Issue, but this is not really about what is or is not a breaking change, so much as it is about what policy do we use when we have identified one. I would argue this is a breaking change, but I admit that it isn't a big one. If someone had written a complicated mission processing pipeline based on the version with that typo keyword, and then we changed it, it would "break" their system. And I think it is vitally important to craft our policy with the user in mind, not just what is easiest for us, the developers. However, I'm about to make an argument for how to deal with this without having to major-tick the whole codebase to remedy this problem:

I suggest a two-step: Step1: Add an "AspectRatioMaximum" keyword that does the same as the old mis-spelled one. This addition is just a bug-fix (or a feature--minor tick--if you really want, you can even cleanse the typo keyword from the docs, and otherwise just leave it as a "silent" keyword that would still work but isn't advertised). And put in a deprecation warning for "AspectRatioMaximun" that advertises to those that use it, that it is "deprecated" and scheduled for removal at the next major patch, and that they should start using the new keyword. Step 2: at the next major tick, remove it.

This may seem like "extra work" for something as simple as this, and it does "cost" a little extra in code management time, but it leaves the typo keyword while providing the new keyword, and then gives those that might have used the typo keyword (because they had no choice at the time) a smooth off-ramp for change, with a known, advertised time for a forced change (next major rev).

scsides commented 3 years ago

@AndrewAnnex

ASC absorbing development complexity is the problem. They can't and could easily drown if it isn't kept under control.

Interesting idea about having add on/plugin pieces as new features. This would put the responsibility for not using new/unstable features in production pipelines on the user. The tooling doesn't seem like it would be too complex either. One problem I see is that it comes back to visibility and time. Even publicized release candidates don't see many downloads and even fewer get comments or issues. Additional features in the form of a package plugin are likely not going to get the attention needed to stress test them, and provide meaningful comments. Leading back to the same old problems as with feature branches:

New feature is published outside of normal channels (plugin, release candidate, dev branch, ...)
New feature gets little attention
New feature goes into main public release
New feature gets used and comments and requests come in
Comments/requests are addressed and yet another major release tick happens

This is the reason I am looking for a way to include new features with multiple fixes, and enhancements into public minor releases while marking them "EXPERMENTAL, NOT FOR PRODUCTION." They would be non-breaking of the main library and apps. Perhaps an environment variable that must be set before an unstable feature or library routine can be used (export IUnderstandThisIsGoinToChange=TRUE).