bed file as input - Githubissues

stsmall commented 3 years ago

Hi @ksamuk, I noticed that there is a bed option in the upcoming release, but it doesnt look like it is in the main code as of yet. Is there a way I can implement that in v0.95.0? Maybe pass the intervals by reading from a bed file? thanks, @stsmall

ksamuk commented 3 years ago

Hi @stsmall! pixy 1.0.0 (out very soon) does have bed support, but in the mean time, looping through a file like this would work for 0.95.X:

if you have a headerless bed file formatted like:

X 1 1000
X 1001 2000

You could loop over it like:

while read interval; do

# split the interval into chromosome, interval start and end
chrom=$(echo $interval | cut -d' ' -f1)
start=$(echo $interval | cut -d' ' -f2)
end=$(echo $interval | cut -d' ' -f3)

# force the window size to the be the size of the whole interval
let windowsize=$end-$start+1

pixy --stats pi \
--vcf [your vcf] \
--populations [your pop file] \
--zarr_path data/zarr \
--reuse_zarr yes \
--window_size $windowsize \
--chromosome $chrom \
--interval_start $start \
--interval_end $end \
--output_prefix output/pixy_bed_${chrom}_${start}_${end}

done < regions.bed

You need to test this a bit, but in theory it will generate 1 file per interval, which would then need to be concatenated. In theory one zarr should be generated on the first run, and then reused later. Hopefully that gets you started on your goal. Full BED support will arrive soon!

stsmall commented 3 years ago

Thanks! @ksamuk I ended up doing something similar to what you suggested but using our cluster to parallel the jobs. It seems that it needs to load the zarr for every independent interval. Looking forward to 1.0 release! thanks, @stsmall

ksamuk / pixy

bed file as input #22