lifebit-ai / simulate

Simulate genomic data on demand

Modify --effective_population_size #18

Open mmeier93 opened 3 years ago

mmeier93 commented 3 years ago

Would it make sense to make this a proportion of the sample size, --num_participants?

If not, ignore.

_Originally posted by @cgpu in https://github.com/lifebit-ai/simulate/pull/13#discussion_r518553427_

mmeier93 commented 3 years ago

This is a hapgen2 option, and its definition is as follows:

Sets effective population size that scales the fine-scale recombination map for the given population. For example, -Ne 11000 sets the effective population size to 11000. For autosomal chromosomes, we highly recommend the values 11418 for CEPH, 17469 for Yoruban and 14269 for Chinese Japanese populations.

See https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html

In addition, running hapgen2 --help also shows the default value:

Ne <n>                                 : the effective population side, defaults to 11418

Given those definitions, I think it might be best to leave things as they are?

NB: CEPH = Utah Residents (CEPH) with Northern and Western European Ancestry = CEU. See: https://www.internationalgenome.org/category/population/
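
To illustrate why the value tracks the reference population rather than the number of simulated participants, here is a minimal sketch that picks the effective population size from the recommendations quoted above and falls back to the hapgen2 default. The function names and the population labels used as dictionary keys are hypothetical and not part of this pipeline; only the `-Ne` flag and the numeric values come from the hapgen2 documentation and help text.

```python
# Recommended autosomal values quoted from the hapgen2 documentation.
# The dictionary keys are illustrative labels, not pipeline parameters.
RECOMMENDED_NE = {
    "CEPH": 11418,
    "Yoruban": 17469,
    "Chinese Japanese": 14269,
}

# Default reported by `hapgen2 --help`.
DEFAULT_NE = 11418


def effective_population_size(population=None):
    """Return the recommended Ne for a population label, else the default."""
    if population is None:
        return DEFAULT_NE
    return RECOMMENDED_NE.get(population, DEFAULT_NE)


def ne_arguments(population=None):
    """Build the -Ne argument pair for a hapgen2 command line (hypothetical helper)."""
    return ["-Ne", str(effective_population_size(population))]


print(ne_arguments("Yoruban"))
```

Note that the sample size never enters the lookup: in this sketch the value depends only on which reference population is being simulated, which is consistent with leaving the option independent of --num_participants.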