populationgenomics / seqr

web-based analysis tool for rare disease genomics
GNU Affero General Public License v3.0

Downloaded results use internal sample identifiers, not pedigree identifiers #154

Open: mathob opened this issue 2 years ago

mathob commented 2 years ago

Describe the bug: When a user downloads the results of a search in spreadsheet format, the sample identifiers used appear to be internal CPG identifiers. They should be the same identifiers used in the pedigree.

Link to page(s) where the bug is occurring: After any variant search in the CIRCA project, click "Download" and save in spreadsheet format.

E.g. https://seqr.populationgenomics.org.au/api/search/9923b6381576183e21edaf034cca20ac/download?file_format=xls

In family C21SO329, the results spreadsheet has sample identifiers like CPG243881, which are unknown to the user.

Scope of the bug: Occurs in the CIRCA project, but not in the NA12878 trio project.

Screenshots: [image not preserved]

illusional commented 2 years ago

IMO this is an issue with seqr not exporting the participant ID in this report. Seqr only allows one external sample identifier, and we have to use the internal CPG ID because it must match the joint call, and we can't guarantee that external identifiers are unique across projects.

cassimons commented 2 years ago

But there is still no reason for seqr not to use the display ID in this report, is there? Shouldn't it just use the same logic as the genotype UI and show the expected display ID?

illusional commented 2 years ago

I think they're being extra granular about which sample is reported because that's what's actually in the Elasticsearch (ES) index. The display ID is the individual's ID, so if any change is made, I think we should include it in addition to the sample ID.

cassimons commented 2 years ago

Yes, but the internal ID is not shown anywhere in the UI. The user is only ever expected to see the display ID, so I am not sure why that should change just because they are getting a CSV version of the same results shown in the UI.