Open zajacn opened 3 weeks ago
Hi Natalia @zajacn,
thanks for the great description of the issue and for testing it so thoroughly!
Hm, so if you already have a test case that reproduces the issue, could you please share that with me? That would make it way easier to track down what is going on here. My gut feeling is that some of the checks for file lengths etc might accidentally be done in the wrong order under your specific conditions. I'd like to debug this with your test data - let's see.
Cheers and so long Lucas
Hi Lucas,
Thanks so much for your reply. I attach the test files I used for testing. test_files.zip
Best wishes Natalia
Hi Lucas,
Many thanks for this awesome tool and fantastic documentation. I have one issue regarding the behaviour of --filter-mask-bed parameter. I will use a test example to show what I mean.
I have a sync file of 10 SNPs on the same chromosome at positions 3364 - 3498 for 16 pools. The sync file has no missing data (all pools have a coverage of a minimum of 25 for each position) and no header. It looks like this (I put the first line):
agouti_scaf_1001 3364 T 0:32:0:0:0:0 0:46:0:0:0:0 0:44:0:0:0:0 0:23:0:0:0:0 0:48:0:0:0:0 0:51:0:0:0:0 0:58:0:8:0:0 0:46:0:8:0:0 0:32:0:0:0:0 0:75:0:0:0:0 0:64:0:6:0:0 0:62:0:0:0:0 0:40:0:0:0:0 0:69:0:0:0:0 0:47:0:0:0:0 0:47:0:0:0:0
The chromosome on which these SNPs are is 10,000bp long. I have a fasta file of that chromosome and my genome.fasta.fai looks like this:
agouti_scaf_1001 10000 17 80 81
I want to calculate Fst (Unbiased Hudson) over intervals of 2kb. But I also have a bed file for masked positions that looks as follows:
The commands I tested were:
grenedalf fst --sync-path test.sync --window-type interval --window-average-policy window-length --method unbiased-hudson --write-pi-tables --pool-sizes 24 --reference-genome-fasta-file genome.fasta --window-interval-width 2000 --window-interval-stride 0 --out-dir . --file-prefix test.2kb --filter-mask-bed mask.txt
grenedalf fst --sync-path test.sync --window-type interval --window-average-policy window-length --method unbiased-hudson --write-pi-tables --pool-sizes 24 --reference-genome-fai-file genome.test.fai --window-interval-width 2000 --window-interval-stride 0 --out-dir . --file-prefix test.2kb --filter-mask-bed mask.txt
Both of the commands produce satisfactory results and run successfully but the message says:
Processed 4 windows. Processed 1 chromosome with 6781 positions.
Now when I modify the bed file used in --filter-mask-bed to contain any position above the 6781, e.g. as such:
I get an error message:
Why are only 6781 positions included if my chromosome is of length 10,000? Is there something I am doing wrong? I have created a test example because I get this error when running it on my real data. I would appreciate if you could help me understand the issue. Let me know if you need more information.
Best regards, Natalia