Closed santiago1234 closed 7 months ago
Hi @santiago1234 , thanks a lot for reporting this!
(1) I will finish writing the documentation for parallel_singer;
sorry for not completing that earlier;
(2) By the way, I noticed that you used a thinning of 1, which might be too short for effective mixing. I would suggest using -thin 20
instead;
(3) Could you do me a favor and check something like results/trees/mex_chr22_j_j+1_nodes_*.txt
? Does it extend from 0-999 or also abort midway?
(4) In terms of African data, I used naive parallelization (I didn't have parallel_singer
by then). I will give you one example of my job submissions:
#!/bin/bash
#SBATCH --account=co_songlab
#SBATCH --partition=savio2_htc
#SBATCH --time=16:00:00
#SBATCH --error=slurm_job_%j.err
#SBATCH --output=slurm_job_%j.out
#SBATCH --cpus-per-task=1
singer_master -Ne 2e4 -m 1.2e-8 -vcf ../african_chr1_cleaned -output ../singer_args/african_chr1_22_23 -start 22000000 -end 23000000 -n 2000 -thin 5
and I submitted many jobs like this, one per window.
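To make the naive approach concrete, here is a sketch of how such per-window submissions could be generated. The window range shown and the idea of piping each command into sbatch are my assumptions for illustration, not the exact setup used for the paper:

```shell
# Hypothetical sketch: emit one singer_master command per 1 Mb window.
# Each printed line can be submitted with `sbatch --wrap "..."` or
# written into a job script like the one above. Window range and
# file names here are placeholders.
for start in $(seq 22000000 1000000 24000000); do
  end=$((start + 1000000))
  echo "singer_master -Ne 2e4 -m 1.2e-8 -vcf ../african_chr1_cleaned" \
       "-output ../singer_args/african_chr1_${start}_${end}" \
       "-start ${start} -end ${end} -n 2000 -thin 5"
done
```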
Hi @santiago1234, sorry, I misunderstood your response. So you already checked all the other files, and only results/trees/mex_chr22_0_1_nodes_*.txt
failed to extend from 0-999, while all other similar files did extend to 999?
Hi @YunDeng98, I am counting the files and this is what I get:
ls mex_chr22_*_nodes_*.txt | awk -F '_' '{print $3 "_" $4}' | sort | uniq -c
144 0_1
1001 10_11
1001 11_12
1001 12_13
1001 13_14
1001 14_15
1001 15_16
1001 16_17
1001 17_18
1001 18_19
1001 19_20
258 1_2
1001 20_21
1001 21_22
1001 22_23
1001 23_24
1001 24_25
1001 25_26
1001 26_27
1001 27_28
1001 28_29
1001 29_30
495 2_3
1001 30_31
1001 31_32
1001 32_33
1001 33_34
1001 34_35
1001 35_36
1001 36_37
1001 37_38
1001 3_4
1001 4_5
1001 5_6
1001 6_7
1001 7_8
1001 8_9
1001 9_10
So only a few prefixes fall short of the expected 1001 files: 0_1 (144), 1_2 (258), and 2_3 (495).
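In case it's useful, the same ls/awk pipeline can be extended to flag the incomplete windows directly (a sketch; the 1001 cutoff comes from the counts above):

```shell
# Sketch: print only the window prefixes with fewer than 1001 node files.
# Assumes the mex_chr22_<i>_<i+1>_nodes_<k>.txt naming from above.
ls mex_chr22_*_nodes_*.txt \
  | awk -F '_' '{print $3 "_" $4}' \
  | sort | uniq -c \
  | awk '$1 < 1001 {print $2}'
```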
Hi @santiago1234, I might know the reason why. Could you check how many variant sites there are in the first 3 Mb?
The first 3 Mb, which covers the region chr22:11211657-14211657, contains only 18 variants. I suspect there are so few variants because I applied a mask to exclude low-quality regions.
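For reference, a count like this can be reproduced from a plain VCF with awk (a sketch; the file name is a placeholder, and `bcftools view -r` on an indexed VCF is an alternative):

```shell
# Sketch: count variant sites in chr22:11211657-14211657 from a plain
# VCF by skipping header lines and filtering on the POS column ($2).
# The file name is a placeholder.
awk '!/^#/ && $2 >= 11211657 && $2 <= 14211657' mex_chr22.vcf | wc -l
```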
Yes, that is exactly my guess. SINGER doesn't like it when there are only a handful of variants and might abort, and it is not very meaningful to infer an ARG there anyway. I would recommend re-running SINGER starting from 14 Mb, or, if you prefer to stick with your current results, I can guide you through converting the partial results into tskit.
Oh, I see. I don't have a problem starting from 14 Mb. Would you recommend excluding regions of the genome (for instance, 1 Mb windows) with fewer than X variants? That way, for another chromosome, I wouldn't have to specify the start value beforehand.
thanks again
Right, just remove those 18 variants from the VCF and I think things will work from there. I will write a new parallel_singer
that enforces a minimum number of variants in a window before running SINGER on it. Thanks a lot for your feedback!
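In the meantime, a per-window check like the one described can be sketched in awk (the 1 Mb window size and the minimum count are placeholders; this is not the actual parallel_singer logic):

```shell
# Sketch: count variants per 1 Mb window in a plain VCF and print the
# windows that fall below a minimum count. MIN and the file name are
# placeholders, not parameters of the real parallel_singer.
MIN=100
awk -v min="$MIN" '
  !/^#/ { w = int($2 / 1000000); n[w]++ }
  END   { for (w in n) if (n[w] < min)
            printf "%d-%d: %d variants\n", w*1000000, (w+1)*1000000, n[w] }
' mex_chr22.vcf
```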
great, thanks!
Hi @YunDeng98 ,
I'm experiencing an issue with SINGER version
singer-0.1.7-beta-linux-x86_64
while processing some human genomes from chromosome 22, specifically when using the parallel execution feature. Below is the command I use to run SINGER. The process appears to proceed without issues until execution reaches the last step, which involves the script
convert_long_ARG.py.
Upon reaching this step, the following error is encountered:
After inspecting the files with prefix:
results/trees/mex_chr22_0_1_nodes
The file numbering only ranges from 0 to 142, whereas the program expects files numbered up to 999. However, upon checking files with other prefixes (e.g.,
results/trees/mex_chr22_30_31_nodes_*
), I found that all the expected files are present. Do you have insights or suggestions on what might be causing this issue?
Additionally, the README file appears to be incomplete in this section. Could you provide an example of how SINGER was run for the African genomes discussed in the paper?
thanks,