Open jtniehof opened 3 years ago
I seem to recall the only times interface versions change are:
Otherwise a bump in interface (major) version of an input file results in just a bump in quality (minor) version of the output file.
I went digging and `runMe` seems to be the critical point where this is done.
If a file's been processed before and there's now a new version of the code, `_codeVerChange` will increment the file's version based on the code bump. So if the code interface (major) version has bumped, a new version of an existing output file will have an increment in interface (major) version. This increment is _relative to the existing `output_interface_version` of the code_, so there may be a case that results in TWO bumps. If code 2.x.y made file 1.a.b and then code is updated to 3.0.0 (and its output interface version bumped to 2), the new file would be 3.0.0. That doesn't seem right. (However, `_codeVerChange` is only called in the event a filename already exists in the database, and 2.0.0 probably wouldn't exist in that case?)
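To make the arithmetic of that double bump concrete, here is a minimal sketch. The `bump` helper, the tuple representation of versions, and the `existing_versions` set are all hypothetical stand-ins, not dbprocessing code; the only thing taken from the description is that an increment raises one field and zeroes the smaller ones.

```python
def bump(version, level):
    """Increment one field of an (interface, quality, revision) tuple
    and zero every field to its right. Hypothetical helper."""
    fields = list(version)
    fields[level] += 1
    for i in range(level + 1, len(fields)):
        fields[i] = 0
    return tuple(fields)


INTERFACE = 0

# Code 2.x.y made file 1.a.b; the code is then updated to 3.0.0 and its
# output interface version (OIV) is bumped from 1 to 2.
new_oiv = 2
output_version = (new_oiv, 0, 0)  # first bump: comes from the new OIV

# Assumed database contents. The second bump only happens if 2.0.0 is
# already present, because the code-change check runs only when the
# candidate filename already exists.
existing_versions = {(1, 4, 2), (2, 0, 0)}
if output_version in existing_versions:
    output_version = bump(output_version, INTERFACE)  # second bump

print(output_version)  # (3, 0, 0)
```

With `(2, 0, 0)` removed from `existing_versions`, the result stays at `(2, 0, 0)`, which is the case the parenthetical suspects is the usual one.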
Here's what it looks like happens; this is all in the `runMe` constructor (start line 356):
1. The output version (`output_version`) is initialized to `code_output_interface_version.0.0`, where `code_output_interface_version` is the `output_interface_version` from the `code` table for the code that's going to be run. I'm going to call this `OIV`.
2. If the `force` argument to the `runMe` constructor is `True`, then return out, sticking with the existing version number (`OIV.0.0`). This is different from the `--force` argument that gets used to explicitly bump the version (that goes into the `version_bump` data member). This `force` argument is pretty much just used by `DBRunner`.
3. Loop:
   a. Exit the loop once `output_version` is not in the database.
   b. If `version_bump` is set, bump that version and continue the loop (return to `a`).
   c. Call `_codeVerChange`. If the code has changed since the file was last built, increment the same portion of `output_version` and zero any "smaller" ones: if code had an interface bump and `output_version` is `x.y.z`, then `output_version` will become `x+1.0.0`. If there was any increment, go to the next iteration of the loop.
   d. Call `_parentsChanged`. If there are new parents (e.g. optionals that have shown up), increment the quality version. Else increment the quality version if any parent had an increase in quality version; else increment the revision version if any parent had an increase in revision version. There is no increment for a parental increment in interface version. Go to the next iteration of the loop (back to `a`) if there were any increments.
   e. Continue to the next iteration of the loop. Note this will be an infinite loop if reaching this point.
I believe these are pathologies and I will try to write `runMe` tests to capture them (and we can define desired behavior):
1. If `OIV.0.0` and `OIV.2.0` exist in the database (which is weird but not inherently an inconsistency), and `version_bump` is set to 1, rule `3a` above means the new file will be version `OIV.1.0`, not `OIV.3.0` as I'd expect.
2. If code `1.0.0` makes output `1.0.0` from input `1.0.0`, and then input is updated to `1.1.0` and code to `1.0.1`, the output will be `1.0.1` (rule `3c` supersedes `3d`) but should probably be `1.1.0`.
3. Because a code's interface version need not match its `OIV`, `3c` means that there can be weirdness where files don't match the `OIV`. So for instance code of version `1.0.0` has OIV 1 and makes an output file version 1.0.0. Then the code is updated to `2.0.0` because of changes to its inputs but OIV remains 1. Rule `1` combined with `3c` means that the updated output file would be `2.0.0`, not `1.1.0`. This also violates the principle that @balarsen and I agree on that interface version increments should be intentional.
4. If code `1.0.0` has OIV `1` and has created a child product version `1.0.0` from parent product `1.0.0`, and then parent product `2.0.0` comes along, rules `1` and `3d` mean that the new child is `1.0.0`. Infinite loop.
I think pathology `1` is minor. I'd like to fix it, at a lower priority. I think the way to do that is to get the latest version of the file using `getFilesByProductDate` and `utc_file_date` (or frankly a new function to get a single file by a single date). Then start with `output_version = OIV.0.0`. If `output_version` is greater than the current existing version, great! Otherwise set `output_version` to the current version and do a single increment according to all the rules in `3`. The filename would be made after the loop, which would now be an if statement. I think this would be faster: fewer database lookups per pass, and one pass instead of multiple. See also fix for `2`.
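A rough sketch of that single-pass idea, under stated assumptions: versions are plain `(interface, quality, revision)` tuples, `latest_existing` stands in for whatever a lookup such as `getFilesByProductDate` would return (that call is not made here), and `level` is the field that the rules in `3` decided to increment. The function names are hypothetical.

```python
def bump(version, level):
    """Increment one field and zero the smaller ones. Hypothetical helper."""
    fields = list(version)
    fields[level] += 1
    for i in range(level + 1, len(fields)):
        fields[i] = 0
    return tuple(fields)


def next_output_version(oiv, latest_existing, level):
    """Pick the new output version in one pass instead of a loop.

    oiv: output interface version of the code about to run
    latest_existing: highest version already in the database, or None
    level: index of the field to increment (0 interface, 1 quality, 2 revision)
    """
    candidate = (oiv, 0, 0)
    # Tuples compare field by field, so this is a version comparison.
    if latest_existing is None or candidate > latest_existing:
        return candidate
    # Otherwise start from what exists and apply a single increment.
    return bump(latest_existing, level)


# Pathology 1: OIV.0.0 and OIV.2.0 exist, and a quality bump is requested.
print(next_output_version(1, (1, 2, 0), 1))  # (1, 3, 0)
```

Because the starting point is the latest existing version rather than `OIV.0.0`, the gap at `OIV.1.0` is never reused.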
`2` is medium-small priority; triggering it requires code changes and data changes at the same time. I think this is fixable by changing `_parentsChanged` and `_codeVerChange` to just report out what changed and then bump whichever is a bigger change between the two.
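One way the "report, then bump the bigger change" idea could look, as a sketch only. Here the two checks are reduced to a hypothetical `changed_level` function that reports which field increased, and `biggest_change` picks the more significant report; none of these names exist in dbprocessing.

```python
INTERFACE, QUALITY, REVISION = 0, 1, 2


def changed_level(old, new):
    """Report the most significant field that increased, or None."""
    for level, (o, n) in enumerate(zip(old, new)):
        if n != o:
            return level if n > o else None
    return None


def biggest_change(*levels):
    """Pick the most significant reported change (lowest index), or None."""
    found = [lv for lv in levels if lv is not None]
    return min(found) if found else None


def bump(version, level):
    """Increment one field and zero the smaller ones."""
    fields = list(version)
    fields[level] += 1
    for i in range(level + 1, len(fields)):
        fields[i] = 0
    return tuple(fields)


# Pathology 2: code goes 1.0.0 -> 1.0.1 while the input goes 1.0.0 -> 1.1.0.
code_level = changed_level((1, 0, 0), (1, 0, 1))    # REVISION
parent_level = changed_level((1, 0, 0), (1, 1, 0))  # QUALITY
level = biggest_change(code_level, parent_level)
new_output = bump((1, 0, 0), level)
print(new_output)  # (1, 1, 0)
```

The point of separating reporting from bumping is that neither check can pre-empt the other; the quality change in the input wins over the revision change in the code.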
`3` is a bigger deal. I think all that needs to happen there is to update `_codeVerChange` so that an interface bump in the code results in a quality bump in the `output_version`.
`4` is also a pretty big deal; again what needs to change is that if a parent has an increment in interface version, increment the quality version of the `output_version`.
I actually have deployments waiting on `3` and `4`, so I can probably put something together for them relatively quickly.
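The fixes for `3` and `4` amount to the same mapping: an interface change in the code or in a parent shows up as a quality change in the output, so the output's interface version only ever moves when the OIV does. A minimal sketch, with hypothetical names and versions as tuples:

```python
INTERFACE, QUALITY, REVISION = 0, 1, 2


def output_level_for(change_level):
    """Map a change in the code or a parent to the output field to bump.

    Interface and quality changes both become quality bumps; revision
    changes stay revision bumps. None means nothing changed.
    """
    if change_level is None:
        return None
    return max(change_level, QUALITY)


def bump(version, level):
    """Increment one field and zero the smaller ones."""
    fields = list(version)
    fields[level] += 1
    for i in range(level + 1, len(fields)):
        fields[i] = 0
    return tuple(fields)


existing_output = (1, 0, 0)

# Pathology 3: code goes 1.0.0 -> 2.0.0 but its OIV stays 1.
after_code_bump = bump(existing_output, output_level_for(INTERFACE))

# Pathology 4: parent goes 1.0.0 -> 2.0.0; the child now gets a new
# version instead of staying at 1.0.0 forever.
after_parent_bump = bump(existing_output, output_level_for(INTERFACE))

print(after_code_bump, after_parent_bump)  # (1, 1, 0) (1, 1, 0)
```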
@balarsen, I could have sworn we had a big writeup of how the versioning scheme worked somewhere, but I'm not finding it in your SOC_Processing_Chain_Setup.docx or the ECT SOC manual that I wrote. I know it didn't go in the original ICD.
You certainly did, I will see if I can dig it up. Likely only on my office desktop.
This is also a good time for a discussion of what we actually want the versions to do and mean.
The initial intent was: versions are X.Y.Z
With some 20/20 hindsight.
On PSP we hide the internal version and the public only sees the release number. I was planning on that for ECT as well but for some reason that didn't happen.
For reference of "what we want the versions to do and mean", this is what I have for the ECT release versioning, which isn't exactly how it went. Note this is mostly about the external versioning.
Every file has version x.y.z. With this the SOC can look up versions of everything that went into the file (and reproduce it from level zero files and codes). This version is not user friendly: two files made with the same codes, but for different days, may have different versions due to processing history.
Goals
For PSP, because the data aren't coming down daily (like on ECT) but in chunks of months at a time, we have scheduled releases which get all the newly downlinked data and any updates. To date we've only had one out-of-band release for just updates. The release notes give the basic idea.
The release script has a set of transforms based on SpacePy VarBundle operations that actually create a new file including a subset of the variables, in some cases transformed (e.g. leaving out bad channels). The dbp database structure is the same, just having a single table with columns for file ID and release number.
In practice I've found having an internal and external version is just about essential; it lets you know what's going on internally, particularly with early analysis before public release, while keeping things simple externally.
I mostly want to capture a conversation I had with @balarsen. The interface version is treated specially because it isn't usually auto-incremented. The only thing we have as a hard requirement is that a particular code will always create output of a defined interface version. We were sort of thinking that a code's interface version usually changes if either its input interface or output interface changes, but we don't have any hard mapping of code interface version to the interface version of its input files.
I'm in a situation where Code A makes product X which code B uses to make product Y. Product X's interface version is going to change, as it's getting more variables (and people shouldn't be looking for those new variables from 2.m.n in old files of version 1.o.p). But code B doesn't touch the new variables, so interface 1 vs. interface 2 of product X doesn't matter to it. Thus I'm going to bump interface version of A, output version of A, interface version of X, and nothing else...but there's no real indication that code B interface 1 can handle input product X of interface version 1 or 2. The chain doesn't really treat them differently.
Proposed enhancement
Document any requirements or recommendations for how interface versions between products and codes relate to each other.
Alternatives
Continue to hold this in my head....
OS, Python version, and dependency version information:
Version of dbprocessing
Current master from github (734f37b1bfb3540f5682edd6dbb2e590eb51a3ff)
Closure condition
This issue should be closed when appropriate documentation is merged.