Closed nevrome closed 1 year ago
I just realized that this is a pretty important question, because it boils down to which variable can act as a primary key to identify entries in a .ssf file. We don't necessarily need one, in fact, but it's generally better. Not least for error messages.
Yes, this seems to have been an obvious mistake. Samples should not be unique, of course. There is a clear hierarchy, for example spelled out here in plain english
Briefly, you add Samples to a Study, Experiments to Samples and Runs to Experiments.
So, indeed, it is the run accession that should be unique, as it seems to be the last step in the hierarchy.
Good - let's make this part of the schema and then our software tools asap.
Could you add a commit to #60, @stschiff? Then I will try to handle the rest in our software tools.
Let's not merge #60 yet, in case more issues are unveiled thanks to Dhana's PR.
I think this has been resolved now in #60, can be closed when #60 is merged then.
I encountered an example (https://github.com/poseidon-framework/published_data/blob/6025ab3fcccfb8f209869d6ff89eb90b168447ce/2018_VeeramahPNAS/ENAtable.ssf) where this condition does not hold, asked @TCLamnidis for input and thus learned that the condition should probably not be there in the first place.
What do you think, @stschiff? Could you explain why you specified that all
sample_accession
s in a .ssf file have to be unique?@TCLamnidis suggested instead, that the
run_accession
field may have this constraint.If we decided to change this, then we could add it to #60 for the next Poseidon release.