Open hknahal opened 4 years ago
As far as I remember, we agreed submitter IDs are case sensitive.
For completeness, the reason was that we don't know which submitter systems are case sensitive, or rather we know that some of them are, so case sensitivity is the lowest common denominator.
In the past I suggested a config per program for case_sensitive_ids with the default being false.
We could then, at our discretion and based on the submitting system, allow case-sensitive IDs for some submitters, while protecting the other submitting programs from data duplication caused by formatting issues.
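To make the per-program suggestion concrete, here is a minimal sketch of how such a flag could drive ID comparison. Only the name case_sensitive_ids and its default of false come from the comment above; ProgramConfig and id_key are hypothetical names for illustration and are not taken from the actual codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramConfig:
    # Hypothetical per-program setting; the default of False follows
    # the suggestion in the comment above.
    case_sensitive_ids: bool = False


def id_key(submitter_id: str, config: ProgramConfig) -> str:
    """Return the key used to decide whether two submitter IDs match."""
    if config.case_sensitive_ids:
        # Program opted in: IDs are compared exactly as submitted.
        return submitter_id
    # Default: fold case so Pat_01 and pat_01 map to the same key.
    return submitter_id.casefold()


default_program = ProgramConfig()
strict_program = ProgramConfig(case_sensitive_ids=True)

same_by_default = id_key("Pat_01", default_program) == id_key("pat_01", default_program)
same_when_strict = id_key("Pat_01", strict_program) == id_key("pat_01", strict_program)
```

With this approach the submitted ID could still be stored and displayed as written; only the lookup key changes, so a program that opts in keeps today's behaviour.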
Describe the bug
If a data submitter submits the same donor, specimen, or sample ID but with different casing (e.g. pat_01 vs Pat_01), they are assigned different ARGO IDs (i.e. two different DO IDs). Shouldn't casing be ignored? How will the end user (e.g. someone searching the Portal later) know what casing to use to search for a donor?
Steps To Reproduce
Pat_01 and pat_01 are treated as two different donors with separate DO IDs (DO259139 and DO259138 respectively). Likewise, the specimen IDs for donor pat_01 are the same except that the first letter is capitalized in one of them (i.e. Pat_01_sp_01 vs pat_01_sp_01), so they appear as two separate tumour specimens.
Expected behaviour
Upper/lower casing should be ignored. For example, the following specimen IDs should be considered the same:
Pat_01_sp_01
pat_01_sp_01
If a data submitter is using Excel to put together the sample_registration.tsv file, Excel will sometimes automatically capitalize the first letter of an ID. Perhaps the data submitter meant to submit pat_01 but it got changed to Pat_01, and now they are registered as two different donors.
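One way to catch the Excel scenario before registration is to scan the file for IDs that differ only by casing. The following is a rough sketch, not the project's validation logic: find_case_collisions is a hypothetical helper, and the column names submitter_donor_id and submitter_specimen_id are assumed for illustration.

```python
import csv
import io
from collections import defaultdict


def find_case_collisions(tsv_text: str, column: str) -> dict:
    """Group the values of one TSV column by their case-folded form and
    return only the groups that contain more than one distinct spelling."""
    groups = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        value = row[column]
        groups[value.casefold()].add(value)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Column names here are assumptions for this example.
sample = (
    "submitter_donor_id\tsubmitter_specimen_id\n"
    "Pat_01\tPat_01_sp_01\n"
    "pat_01\tpat_01_sp_01\n"
    "pat_02\tpat_02_sp_01\n"
)

collisions = find_case_collisions(sample, "submitter_donor_id")
```

A check like this could warn the submitter that Pat_01 and pat_01 both appear in the file, rather than silently registering two donors.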