statgen / Minimac4

GNU General Public License v3.0

Option to Subset Samples in Reference Panel #28

Closed jonathonl closed 2 years ago

jonathonl commented 4 years ago

It would be nice to subset the reference panel on the fly by simply adding a parameter to minimac4 (e.g. --refSamples, a file with all the sample IDs to use).

sarahgra commented 2 years ago

Has this option been implemented yet? It would also be very helpful if the TOPMed imputation server could automatically identify participants who overlap between the input dataset and the reference panel and exclude them.

jonathonl commented 2 years ago

As of v4.1, you can specify a keep list with the --sample-ids-file option. There are no plans to exclude samples automatically, though. I'm assuming you mean excluding the overlapping samples from the reference panel. It would be computationally inefficient to recompress the reference haplotype structure for every match encountered.
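
Since automatic exclusion is not planned, the overlap has to be removed by building the keep list yourself. The following is a minimal, unofficial sketch of one way to do that. It assumes the reference and target sample IDs are each available as a plain-text file with one ID per line, and that the keep list passed to --sample-ids-file uses the same one-ID-per-line format (check the minimac4 help output to confirm). The function names and file names are hypothetical and are not part of minimac4.

```python
from pathlib import Path


def read_ids(path):
    """Read sample IDs from a text file, one per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_keep_list(reference_ids, target_ids):
    """Return the reference IDs that do not appear in the target dataset.

    The order of the reference IDs is preserved.
    """
    overlap = set(target_ids)
    return [sid for sid in reference_ids if sid not in overlap]


def write_keep_list(ids, path):
    """Write the keep list as one sample ID per line (assumed format)."""
    Path(path).write_text("".join(f"{sid}\n" for sid in ids))


# Example usage with hypothetical file names:
#   ref = read_ids("reference_samples.txt")
#   tgt = read_ids("target_samples.txt")
#   write_keep_list(build_keep_list(ref, tgt), "keep_ids.txt")
```

The resulting file can then be supplied through the --sample-ids-file option mentioned above. Note that this matches on exact sample ID strings only; participants who appear under different IDs in the two datasets would not be detected.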