ksamuk / pixy

Software for painlessly estimating average nucleotide diversity within and between populations
https://pixy.readthedocs.io/
MIT License

add option to always have intervals spanning entire chromosomes #19

Closed: mscharmann closed this issue 3 years ago

mscharmann commented 3 years ago

Hello, I am trying to do a single pixy run over all chromosomes, and I would like the intervals (windows) to automatically start at position 0 (or is it 1-based?) and run to the end of each chromosome, independent of the VCF. The currently available options do not seem to allow this, am I correct? I could only force this behaviour for one chromosome at a time, using a combination of these three arguments:

--chromosomes [CHROMOSOMES] A single-quoted, comma separated list of chromosome(s) (e.g. 'X,1,2')

--interval_start [INTERVAL_START] The start of the interval over which to calculate pi/dxy. Only valid when calculating over a single chromosome.

--interval_end [INTERVAL_END] The end of the interval over which to calculate pi/dxy. Only valid when calculating over a single chromosome.

Alternatively, one might introduce a Pixy option to import a .BED file with regions (intervals, windows) to calculate the stats over; this could then be used very flexibly for any desired intervals.

with best regards, Mathias

ksamuk commented 3 years ago

Hi Mathias!

You're in luck, all these features already exist in the forthcoming version of pixy! I'll reference this issue when we do the release in a week or so.

Cheers,

Kieran

mscharmann commented 3 years ago

Hi Kieran, thanks for the quick reply, I can't wait to get the new version ...! Cheers, Mathias

EveTC commented 3 years ago

Hi, I am also looking for a way to calculate this over the whole chromosome. How can I install the new version? I don't think it is the one available through conda. Thanks

ksamuk commented 3 years ago

Hi there, the newest version of pixy on conda-forge (1.1.1.beta1) has this feature. Omitting the --interval_start and --interval_end arguments should result in calculation over the whole chromosome(s). You can optionally also specify intervals manually using the --bed_file option. Let me know if you run into any issues!
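For anyone who wants explicit per-chromosome intervals rather than relying on the default, here is a minimal Python sketch that builds a three-column interval file (chromosome, start, end) to pass to --bed_file. Several things here are assumptions, not taken from pixy itself: the chromosome names and lengths are made up, the helper names are hypothetical, and the start coordinate defaults to 1 only as a placeholder, since this thread does not settle whether pixy counts positions from 0 or 1. Please check the pixy documentation for the expected column layout and coordinate convention before using it.

```python
from typing import Dict, List


def whole_chromosome_bed_lines(chrom_lengths: Dict[str, int], start: int = 1) -> List[str]:
    """Return one tab-separated 'chrom<TAB>start<TAB>end' line per chromosome.

    `start` is an assumption: this thread does not settle whether pixy
    expects 0-based or 1-based coordinates, so adjust it as needed.
    """
    lines = []
    for chrom, length in chrom_lengths.items():
        if length < start:
            raise ValueError(f"chromosome {chrom!r} has length {length} < start {start}")
        lines.append(f"{chrom}\t{start}\t{length}")
    return lines


def write_bed(path: str, lines: List[str]) -> None:
    """Write the interval lines to `path`, one per line, without a header."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Hypothetical chromosome names and lengths, for illustration only.
    example_lengths = {"X": 23_000_000, "1": 30_000_000, "2": 19_500_000}
    write_bed("whole_chromosomes.bed", whole_chromosome_bed_lines(example_lengths))
```

In practice the lengths would come from your reference (for example a .fai index or the contig lines in the VCF header) rather than being typed in by hand.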