Open cbjeukendrup opened 1 year ago
@cbjeukendrup would this maybe be more appropriate as a discussion rather than an issue? We can definitely raise it in one of our weekly team calls.
I'd not prefer moving it to Discussions, because Discussions are less convenient and nobody ever looks at them. And this issue is somewhat important for choosing our development policy.
@igorkorsukov Now that both engraving developers are about to make changes to the file format, this issue becomes actual again.
To summarise: since different changes to the file format are made independently of each other, a score from a development build between two file format changes will not properly conform to any released version of the file format. When opening such a score in the final release, this may lead to strange bugs, which consumes development time. Therefore, I think we should increase the file format version number after every change that we make to the file format, so that scores from development builds between file format changes also have a file format version number to conform to. That way, we can detect that those scores are from a development version, and decide what to do with them.
Indeed, deciding what to do, is what we need to do now. The "problem" is namely that for every file format version change, we need to create a whole new copy of the reading code. (Although this leads to an ever-growing code duplication and thus also an ever growing compilation time, the advantages are clear: simple code paths, no side effects of changes. So right now I won't propose to change that.) We need to decide on a policy about older versions. In the original post here, I mentioned some options, but perhaps the best option is to preserve the following versions:
@igorkorsukov Do you think this approach is okay?
@cbjeukendrup
I think one should change the file format version as soon as there are new structure changes. I think that the version should become the next release, i.e. no need to increment the version for each change, no need to add intermediate versions. Subsequent changes should complement the current new development version.
For example, suppose we now have a file format version of 400 We added new ornaments, made version 410 We added expressions - added changes to version 410 (will also read ornaments) Added some fixes - added changes to version 410 (will also read ornaments and expressions) Released version 410
Yes, with this approach, it may happen that files created by the nightly version may not open in the next nightly version if some intermediate fix changes something rather than expands. We need to understand how important this really is. Users should not expect that files created by nightly versions can be opened in the future, can we somehow warn about this?
By adding support for reading intermediate formats, the main thing that confuses me is that for the short-term convenience of a limited number of people, we complicate the code that will need to be maintained throughout the life of the application. For example:
So it turns out, for an intermediate implementation of reading ornaments, which lives 4 days and with the it was a couple of simple scores were created - we will support this all our lives in the future. I don't think it's logical.
I think that if we really need to keep reading intermediate versions when they break, then it’s better not to change the file format version, but to add one more field to the file, something like devPatch
.
Then we can call the reader version 410 and check in the right places
// When reading expressions:
if (ctx.devPatch < FFV_NEW_EXPRESSIONS) {
// read old expression
} else {
// read new expression
}
Then it will be more clear what is what.
This was done mainly so that it was possible to greatly change the file structure, and not just slightly expand the current one. For example, remove the Measure as a top-level container.
a lot of work: duplicating the whole reading code every time we change something, even for a simple typo fix, is a lot of tedious work. a lot of duplication: that doesn't need any explanation
Here you need explanations, for example, what kind of renaming are we talking about? If we are talking about code renaming, for example, renaming a method, or a class name, then the IDE can handle it without problems. If we are talking about renaming a field in a file, then independent implementations work well here, the old ones do not need to be touched, nothing will change in the old files, and the new name should only be in the new implementation. Let's discuss more cases in which this approach is not convenient.
The old approach, without isolated implementations, confused me mainly because we have a lot of code and don’t even know what it is for, what it is for, i.e. we can only add and expand and cannot change or remove anything. Well, we cannot change the structure in a significant way, for example, just the relationship between parents and children.
Alternatively, I had this idea: We can change the file format version into components Like this:
Or maybe even better done according to https://semver.org only patches need to be made so that they are backwards compatible (for example, duplicating the old and new implementations when writing)
Then it is logical to make independent readers for the version of the structure, and inside check for the version of the extension and the patch
a lot of work: duplicating the whole reading code every time we change something, even for a simple typo fix, is a lot of tedious work. a lot of duplication: that doesn't need any explanation
Here you need explanations, for example, what kind of renaming are we talking about?
I meant that if we want to change the name of a field in the file, then we need to update the file format version number, and because we need to update the version number, we have to create a whole new copy of the reading code, while the difference between the new version and the old version is only one word.
Another example, as far as I know, read410 is very similar to read400.
In this situation, we might indeed choose to create not a whole new copy, but just update the existing reader to support the version with typo fix too.
However, I'm not sure if that's really convenient in the longer term. What if we later decide that we also want to add bigger changes? Then we still need to create a copy, but we've already modified the original for that typo change, so after copying we need to restore the original to its original state. Not optimal.
And, as I said in the previous comment, when each version is isolated, it's very clear what belongs to what, there are no distracting if
statements, and when making changes to the latest version you can be sure that won't have side effects on the existing versions.
On the other hand, having each version isolated does make the codebase a lot bigger, which increases compile time... and versions will doubtlessly have a lot in common so there is code duplication... it's a difficult dilemma.
One thing that we can do: after we have updated the version number to 420 and created a 420 reader, we are certain that 410 is frozen. That means we can safely start identifying pieces of code that are shared between 410 and 400, and let 410 call 400's methods in those places. That way, we still don't risk making changes with side effects, but we also filter out the worst code duplication cases. And we can do the same for 420 once we've moved to the next version. But not sure if that's a good idea.
I'm not sure if I like the idea of using different version components. That might cause problems because older MuseScore versions won't know about this, and actually I like the simplicity of having just one number that just increases. But it is true that there is benefit in adding a way to distinguish very big file format changes from changes that will still be understood correctly by older versions. (But again, the problem there is that when beginning the development of a new version, we can't always be sure how big the final changes in that version will be... so we can't always decide if it should be a major version number bump or a minor one.)
(regarding SemVer: note that if we make a patch release but maintain backward compatibility, we would not have had to increase the version number at all 🙂)
But back to the original problem of this issue:
So it turns out, for an intermediate implementation of reading ornaments, which lives 4 days and with the it was a couple of simple scores were created - we will support this all our lives in the future. I don't think it's logical.
That's why I suggested in my previous comment to keep only the latest version, and the previous version during the development. So, in the long term, nothing really changes, but only during development there will be always one intermediate version at a time in order to be able to convert scores from the previous to the latest one.
As an alternative to increasing the entire version number, we can indeed add an additional "dev_version" attribute, that will be increased during development. If this attribute is present in a score, it is from a development build; if it's not, then the score is from a released version. For development builds, each nightly build supports reading files from the current and previous development file format version, so that there is a way to "convert" existing scores. This will be handled by if
statements, so we don't create a new copy of the reading code, and we can remove these if statements just before the release when there won't be any other changes.
Perhaps that is indeed the best solution to this issue, since it's so simple.
One more concrete question: what do you think, for the 420 format, should we create new copy of the reading code, or just extend the 410 code? It looks like Michele is doing the second option, in #18722.
I agree that it is not convenient and redundant to create a new copy of the read implementation if we want to add a couple of fields.. But I don’t like it when a lot of ifs appear, when it’s not clear what is what, when it’s easy to break the reading of old formats, and therefore we need to keep the code as it is, even if we don’t know why ...
There is another question here, do we want to make large structural changes to the file within one major version of the application? For example, if we decide to remove a Measure as a top-level container, will we do it in 4th or will we release the 5th version? Although the user will not be affected by this change, it may be necessary for subsequent changes. (@bkunda probably a question for you :) )
If we do not make large structural changes in one major version, then I think it will be fine to keep one copy of the reading implementation per major version. Although in general, nothing prevents we from creating a copy of the implementation of reading at any time if necessary.
So I see options:
@cbjeukendrup What do you like or what other options are there?
As an alternative to increasing the entire version number, we can indeed add an additional "dev_version" attribute, that will be increased during development. If this attribute is present in a score, it is from a development build; if it's not, then the score is from a released version. For development builds, each nightly build supports reading files from the current and previous development file format version, so that there is a way to "convert" existing scores. This will be handled by if statements, so we don't create a new copy of the reading code, and we can remove these if statements just before the release when there won't be any other changes.
Perhaps that is indeed the best solution to this issue, since it's so simple.
I like this approach :)
This issue describes rather a problem for developers and testers than a problem for users. The fact that we are having this problem with our current policy is obvious; the reason that I'm still logging an issue for it, is to discuss if and how we should improve this.
The problem
Our current policy looks something like this:
The score created in step 2 is now in an inconsistent state. Its FFVN is still 400, but the contents of this file will be a mix of (at worst) 3 different versions:
Later, when the user opens this score in the released version of 4.1, things don't work properly. The score is detected as corrupted, or causes crashes, or other problems occur... We spend a lot of time trying to find the causes, and eventually we finally realise that this score is from an unreleased version (because luckily the commit of the MuseScore version is also written to the file). We conclude that we can't do anything and don't need to do anything, but we have wasted a lot of time figuring that out.
How common is this and how problematic is it for users?
Not very, of course, since users are advised to use nightly builds with care and not for real work. But for 4.0, this has happened several times, which caused us (and potentially others) to waste quite some time on useless debugging.
Proposed solution
I propose that we simply increase the FFVN in every pull request that changes any aspect of the file format.
This way, we rule out the possibility of a file being somewhere in-between between two versions.
Problems to overcome
Given our current ideas about the reading code (i.e. creating a full isolated version of the reading code for every existing FFVN), the proposed solution would give two problems (or, better said, it magnifies these problems rather than introducing them):
Possible solutions to these problems
Additional thought
We could create named constants for file version numbers. For example:
(This example does not necessarily correspond to the reality, but you get the idea)