Closed igronau closed 1 year ago
Okay. I think I found the source of the problem. In rule dadi_generate_cache
(dfe.snake
), the sample size is set as follows:
sample_size = 10,
However, the sample size of the frequency spectrum (fs) file generated from the data is 20 (10 diploids). It's probably not a good idea to have the sample size hard coded in the snakefile. A more robust implementation would match the sample size in the generated cache to that of the fs file with which it is used. This requires having the fs file as one of the inputs to the dadi_generate_cache
rule and extracting the sample size from this file.
I don't want to apply this change and break something else. This seems to be also an issue with the non-tiny config file. I'll deal with this in next coding session.
I was able to fix this by making the code change below in dfe.snake
. I'll submit a PR by the next coding session.
Original code:
sample_size = 10, grid_size = '800 1000 1200', gamma_pts = 2000 threads: 8 shell: """ dadi-cli GenerateCache --model {params.demog} --demo-popt {input} --sample-size {params.sample_size} --output {output} --grids {params.grid_size} --gamma-pts {params.gamma_pts} --cpus {threads} """
New code:
fs_file = output_dir + "/inference/{demog}/dadi/{dfes}/{annots}/{seeds}/{chrms}/pop{ids}/pop{ids}.dadi.neu.fs", grid_size = '800 1000 1200', gamma_pts = 2000
threads: 8 shell: """ sfs_size=head -n1 {params.fs_file} | cut -f1 -d " "
num_samples=$((sfs_size-1)) dadi-cli GenerateCache --model {params.demog} --demo-popt {input} --sample-size $num_samples --output {output} --grids {params.grid_size} --gamma-pts {params.gamma_pts} --cpus {threads} """
ah we got this bug smoothed out in #85
Yes. New version of code works without error. Thanks, @andrewkern
Error message: ValueError: Cache and frequencey spectrum do not have the same sample sizes