poseidon-framework / poseidon-schema

An archaeogenetic genotype data organisation file format

Identifiers #43

Closed: xrotwang closed 2 years ago

xrotwang commented 3 years ago

The definition of Individual_ID seems inconsistent:

"The Individual_ID column has to represent each sample with a world-wide unique identifier string equal to the identifier used in the respective accompanying publication."

If Individual_ID has to equal the identifier used in the publication, how can one make sure it's also a UUID? The specification goes on to say:

"There is no central authority to issue these identifiers, so it remains in the hand of the authors to avoid duplication. The Individual_IDs are also employed in the genetic data files and therefore have to adhere to certain constraints."

This might make it even harder to reconcile "identifier as in publication" with what Poseidon wants. From my experience, I'd recommend distinguishing local IDs and global UUIDs explicitly, and specifying all of the constraints for the UUID.

xrotwang commented 3 years ago

Btw, the definition in the schema does not have the "world-wide unique" clause.

stschiff commented 3 years ago

OK, let me play dumb and naïve here. Why would it be so terrible if we simply enforce uniqueness within our tools once a set of packages gets read in? Why do we have to somehow enforce "world-wide-uniqueness" at the format specification level? So, in other words, why does "uniqueness of identifier" have to be something that we need to somehow guarantee globally, rather than locally?
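
To make the "enforce uniqueness once a set of packages gets read in" idea concrete, here is a minimal sketch of such a check. It is not taken from any Poseidon tool: the function name `find_duplicate_ids`, the package names, and the sample IDs are all hypothetical, and packages are assumed to be already loaded as plain lists of Individual_ID strings.

```python
from collections import defaultdict


def find_duplicate_ids(packages):
    """Return {Individual_ID: [package names]} for IDs that occur more than once.

    `packages` maps a package name to the list of Individual_IDs it contains.
    This is an illustrative data layout, not the Poseidon package format.
    """
    seen = defaultdict(list)
    for package_name, individual_ids in packages.items():
        for individual_id in individual_ids:
            seen[individual_id].append(package_name)
    # Keep only IDs seen more than once, within or across packages.
    return {iid: names for iid, names in seen.items() if len(names) > 1}


# Hypothetical packages read in together.
loaded_packages = {
    "pkg_A": ["I0001", "I0002"],
    "pkg_B": ["I0002", "I0003"],
}
duplicates = find_duplicate_ids(loaded_packages)
for individual_id, names in duplicates.items():
    print(f"{individual_id} appears in: {', '.join(names)}")
```

A check like this only guarantees uniqueness for the set of packages loaded together, which is exactly the local rather than global guarantee being asked about.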

xrotwang commented 3 years ago

I agree that "world-wide-unique" may be a bad trade-off. Often somewhat meaningful local IDs are a lot easier to maintain, and combined with some sort of prefix (possibly derived from provenance) when aggregating data, things work well. I'd still say that having identifiers for things that are referred to from other objects is an important piece of data, one that should not only be minted upon reading data, and samples seem to be such a thing.
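
A rough sketch of the prefixing approach, under stated assumptions: the package name stands in for provenance, `:` is an arbitrary separator, and `qualify_ids` is a hypothetical helper rather than part of any existing tool. Since Individual_IDs are also used in the genetic data files and must adhere to certain constraints, a real separator would have to be checked against those.

```python
def qualify_ids(packages, separator=":"):
    """Build aggregate-level IDs of the form '<package><separator><local ID>'.

    `packages` maps a package name (used here as the provenance prefix)
    to its list of local Individual_IDs. Returns a dict from qualified ID
    to the (package name, local ID) pair it was built from.
    """
    qualified = {}
    for package_name, local_ids in packages.items():
        for local_id in local_ids:
            key = f"{package_name}{separator}{local_id}"
            if key in qualified:
                # The same local ID twice in one package stays an error.
                raise ValueError(f"duplicate ID within package: {key}")
            qualified[key] = (package_name, local_id)
    return qualified


# Hypothetical packages that share the local ID "I0002".
aggregated = qualify_ids({
    "pkg_A": ["I0001", "I0002"],
    "pkg_B": ["I0002", "I0003"],
})
print(sorted(aggregated))
```

The local IDs stay as published, and the prefix only has to be unique among the packages being aggregated.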

stschiff commented 3 years ago

Fair enough. Well, Clemens and I have pondered some kind of "Poseidon" sample ID, which we declare once a new package is taken into our central repository. That may make sense at some point.

stschiff commented 3 years ago

This was discussed again on Oct 14.

Preliminary decision: