Closed nevrome closed 9 months ago
I like it. I made one small change to clarify that "genotype data" means changes in the bed or geno file (because we sometimes also use the term "genotype data" for the entire triple of files).
Hm - I think I would explicitly include the .fam/.ind and .bim/.snp files here.
If somebody changes the name of a sample or a SNP, then this is imho also a major change. One could debate sex and group names for the samples, but even there I would argue that such changes can easily break genotype data-based workflows. And that is what I think a major package version should preserve.
OK, but then we should say something like "genotype data (any of the three files)" or so.
I changed it now to
When genotype data (i.e. the contents of the .bed/.bim/.fam or .geno/.snp/.ind files) for any number of samples is changed.
Then the extra point in Minor bumps about adding an individual should be removed. If the genotype dataset is changed in any way, incl. adding an ID, it's a new Major version.
True! And it should be, I guess.
Imagine you add a new individual for a group that is already represented in the dataset. Then the result of an analysis based on this group will change.
If this is the criterion by which we draw the line for a major change, then adding a new sample must also be a major change.
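For illustration, the criterion discussed above could be checked mechanically along these lines. This is only a minimal sketch, not part of the schema or of any existing tool: the helper names `file_checksum` and `is_major_change` are hypothetical, and it assumes we have a mapping of file names to checksums for the old and the new package state.

```python
import hashlib
from pathlib import Path

# Extensions of the genotype data triple in both formats
# (.bed/.bim/.fam for PLINK, .geno/.snp/.ind for EIGENSTRAT).
GENOTYPE_EXTENSIONS = {".bed", ".bim", ".fam", ".geno", ".snp", ".ind"}


def file_checksum(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_major_change(old_checksums, new_checksums):
    """Hypothetical helper: True if any genotype file differs.

    Both arguments map file names to checksums. Any added, removed or
    modified genotype file counts, which also covers adding a sample,
    since that changes the .fam/.ind file.
    """
    def genotype_only(checksums):
        return {
            name: digest
            for name, digest in checksums.items()
            if Path(name).suffix.lower() in GENOTYPE_EXTENSIONS
        }

    return genotype_only(old_checksums) != genotype_only(new_checksums)
```

With this sketch, a changed README alone would not count as a major change, while any edit to one of the six genotype file types would.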
Thanks for this good review, @TCLamnidis. Very valuable!
OK - I think this is fine now. Will merge.
This PR includes a suggestion for an addition to the schema text to give some guidelines for package versioning. You may have different opinions about this, so feel free to push back. If/when this is merged, then it will close #69.
I think we could add this to 2.7.1 and treat it as a clarification that does not require a new schema release. Or we combine it with some more changes (maybe #67, #66 and #44) and actually prepare a new release.