uqrmaie1 / admixtools

https://uqrmaie1.github.io/admixtools

auto_only: change default value to FALSE #68

Open bergstand opened 3 months ago

bergstand commented 3 months ago

Functions that compute statistics from genotype files have an option auto_only, which excludes data from chromosomes not named 1 to 22 and is set to TRUE by default. I was caught out by this when working on an organism with more than 22 chromosomes, and I suspect that many others will be too if they don't realise that this is the default setting. From what I can tell, not all relevant functions list auto_only among their arguments. The f4 function, for example, does not (I'm guessing because it calls other functions to read the genotypes), so a user who just runs f4 might have part of their input data ignored without realising it.
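To make the effect concrete, here is a minimal sketch in Python of the filtering behaviour described above. It is an illustration only, not the admixtools code: the function name `keep_autosomes` and the `(chromosome, position)` tuple layout are assumptions made for the example.

```python
def keep_autosomes(snps, auto_only=True):
    """Return the SNPs that survive the filter.

    Each SNP is a (chromosome_name, position) tuple. With auto_only=True,
    only chromosomes named "1" to "22" are kept (hypothetical helper).
    """
    if not auto_only:
        return list(snps)
    autosomes = {str(i) for i in range(1, 23)}
    return [snp for snp in snps if str(snp[0]) in autosomes]


# An organism with more than 22 chromosomes, plus a sex chromosome.
snps = [("1", 100), ("22", 200), ("23", 300), ("30", 400), ("X", 500)]

kept_default = keep_autosomes(snps)               # auto_only=True
kept_all = keep_autosomes(snps, auto_only=False)  # use everything

print(len(kept_default), len(kept_all))
```

In this toy input, three of the five SNPs are dropped under the default setting, and nothing tells the user that this happened.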

I suggest some changes to how the auto_only argument is used, in order of my preference:

  1. Set the default value of auto_only to FALSE. Fundamentally, I think software on its default settings should not make empirical assumptions about the number of chromosomes in the user's input data, nor about chromosome names. The reasonable default expectation is that the software uses all the data provided by the user.

  2. Make sure that every function to which auto_only applies lists this option on its help() page.

  3. Print a warning message if excluding any chromosomes when reading a genotype file.
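As a rough sketch of how suggestions 1 and 3 could fit together, the Python snippet below keeps all chromosomes by default and emits a warning whenever the filter actually removes something. The function name `read_chromosomes` and the wording of the message are hypothetical; this is not how admixtools implements it.

```python
import warnings

# Chromosome names treated as autosomes by the filter.
AUTOSOMES = {str(i) for i in range(1, 23)}


def read_chromosomes(chrom_names, auto_only=False):
    """Filter per-SNP chromosome names, warning if any are excluded.

    Hypothetical helper: the default auto_only=False keeps all data.
    """
    if not auto_only:
        return list(chrom_names)
    kept = [c for c in chrom_names if str(c) in AUTOSOMES]
    dropped = sorted({str(c) for c in chrom_names} - AUTOSOMES)
    if dropped:
        n_removed = len(chrom_names) - len(kept)
        warnings.warn(
            f"auto_only=True: excluding {n_removed} SNPs on "
            f"chromosomes {', '.join(dropped)}"
        )
    return kept


# Default call keeps everything; the explicit opt-in call warns.
print(read_chromosomes(["1", "23", "X"]))
print(read_chromosomes(["1", "23", "X"], auto_only=True))
```

The warning is raised only when data is actually dropped, so users whose chromosomes are all named 1 to 22 see no extra output.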

Thanks for a great package!

uqrmaie1 commented 2 months ago

Thanks for the suggestion, I agree that auto_only should default to FALSE! However, changes to default values tend to create problems for people who don't anticipate them. And default values that don't match those in the original Admixtools programs lead to even more confusion.

I added a warning for now, but I'd encourage anyone else who has strong opinions on the default value of auto_only to leave a comment here. I'm happy to change the default value if people still overlook this despite the warning.