llanos-garrido opened this issue 6 years ago
Running G-PhoCS with one locus is not recommended. The problem you're getting is likely a memory issue caused by too much data in a single locus. The expectation is that each locus will have up to a few hundred distinct site patterns. If you dump everything into one locus, you're violating this assumption and the results you'll get will be senseless (even if you are able to run the program).
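A quick way to sanity-check that expectation is to count the distinct site patterns (unique alignment columns) in each locus. Below is a minimal Python sketch, assuming the usual G-PhoCS sequence-file layout (first line = locus count; each locus has a "name n_samples length" header followed by one "sample_name sequence" line per sample); the file name and the 500-pattern threshold are placeholders, not part of G-PhoCS.

```python
# Count distinct site patterns (unique alignment columns) per locus.
# Assumptions: standard G-PhoCS sequence-file layout (first line = locus count,
# then per locus a "name n_samples length" header and one "sample_name sequence"
# line per sample). The file name and threshold below are placeholders.
def site_patterns_per_locus(path):
    with open(path) as f:
        n_loci = int(f.readline().split()[0])
        for _ in range(n_loci):
            name, n_samples, _length = f.readline().split()
            seqs = [f.readline().split()[1] for _ in range(int(n_samples))]
            # one site pattern = one alignment column (one base per sample)
            yield name, len(set(zip(*seqs)))

if __name__ == "__main__":
    for name, n_patterns in site_patterns_per_locus("6ORCAS"):
        if n_patterns > 500:  # well beyond the "few hundred" expected per locus
            print(name, n_patterns)
```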
Hello, here is my case; what can you suggest?
Reading sequence data... 237290 loci, as specified in sequence file 6ORCAS.
Reading loci (.=100 loci)
...
...after running for several hours
...
...
./RunGphocs: line 1: 27776 Segmentation fault /ddn/data/cjqr89/GPhoCS/G-PhoCS/bin/G-PhoCS KillerWhales.ctl
I have 6 populations, each with a single sample. Each locus is 10,000 bases long, and there are no indels in the samples.
Thank you! :)
Hello again, do you have any suggestions for my case from my last post? Was my sequence length per locus too long, or did I have too many loci? Thanks
Sorry, I missed your previous post. I'm not sure which of the two factors is contributing to the segmentation fault here; it could be a combination of both. Regardless, you should shorten your loci and thin them out. 10,000 bp per locus seems too long, because you will have many unmodeled recombination events per locus (the model assumes no recombination within each locus), so unless recombination rates are very low in killer whales, you should bring it down to 1,000 bp or less. You should also make sure that your loci are spread far enough apart, because the model assumes free recombination between loci. Note that a typical G-PhoCS analysis covers only a few percent of the genome (our human analysis covered ~40 Mb of sequence). You can start with even less to get some quick results; 5,000-10,000 loci should be good enough for starters.
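For what it's worth, the trimming and thinning described above can be scripted. Here is a minimal Python sketch under the same sequence-file layout assumption as before (first line = locus count, then per locus a header line and one line per sample); the file names and the keep-every-20th spacing are arbitrary placeholders, and thinning by file order only increases spacing if the loci appear in genomic order.

```python
# Trim each locus to at most 1,000 bp and keep only every 20th locus, writing a
# new G-PhoCS-style sequence file. Assumes: first line = locus count, then per
# locus a "name n_samples length" header and one "sample_name sequence" line per
# sample. File names and the spacing are placeholders; thinning by position in
# the file only spreads loci apart if they are listed in genomic order.
MAX_LEN = 1000
KEEP_EVERY = 20

def trim_and_thin(in_path, out_path, max_len=MAX_LEN, keep_every=KEEP_EVERY):
    kept_blocks = []
    with open(in_path) as f:
        n_loci = int(f.readline().split()[0])
        for i in range(n_loci):
            name, n_samples, length = f.readline().split()
            samples = [f.readline().split() for _ in range(int(n_samples))]
            if i % keep_every:
                continue  # skip this locus to thin the data set
            new_len = min(int(length), max_len)
            block = ["%s %s %d" % (name, n_samples, new_len)]
            block += ["%s %s" % (s, seq[:max_len]) for s, seq in samples]
            kept_blocks.append("\n".join(block))
    with open(out_path, "w") as out:
        out.write("%d\n%s\n" % (len(kept_blocks), "\n".join(kept_blocks)))

trim_and_thin("6ORCAS", "6ORCAS_1kb_thinned")
```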
Hello again. I first reduced my locus length to 1,000 bp and selected the top 50,000 loci; that run gave a segmentation fault. Next, I also reduced the number of loci to 1,000 to make sure nothing else was causing the problem, but that too gave a segmentation fault. Both runs started running properly, like the initial run. Can you think of any way my settings in the control file could make the model so complicated that it requires too much memory after some step, no matter how short or how few the loci are, such as the tau-initial values or the number of migration bands between populations? I am sharing my full control file below; please let me know if you see anything I should change. Many thanks for your guidance!
GENERAL-INFO-START
seq-file first1Kloci1Kb
trace-file first1Kloci1Kb.log
locus-mut-rate CONST
mcmc-iterations 5000
iterations-per-log 50
logs-per-line 10
find-finetunes FALSE
finetune-coal-time 0.01
finetune-mig-time 0.3
finetune-theta 0.04
finetune-mig-rate 0.02
finetune-tau 0.0000008
finetune-mixing 0.003
# finetune-locus-rate 0.3
tau-theta-print 10000.0
tau-theta-alpha 1.0 # for STD/mean ratio of 100%
tau-theta-beta 10000.0 # for mean of 1e-4
mig-rate-print 0.001
mig-rate-alpha 0.002
mig-rate-beta 0.00001
GENERAL-INFO-END
CURRENT-POPS-START
POP-START
name Norway
samples Norway d
POP-END
POP-START
name NPresident
samples NPresident d
POP-END
POP-START
name NPtransient
samples NPtransient d
POP-END
POP-START
name SouthAfrican
samples SouthAfrican d
POP-END
POP-START
name Antarctic
samples Antarctic d
POP-END
POP-START
name MarionIsland
samples MarionIsland d
POP-END
CURRENT-POPS-END
ANCESTRAL-POPS-START
POP-START
name KW1
children Norway NPresident
tau-initial 0.00001
POP-END
POP-START
name KW2
children KW1 NPtransient
tau-initial 0.00002
POP-END
POP-START
name KW3
children KW2 SouthAfrican
tau-initial 0.00003
POP-END
POP-START
name KW4
children KW3 Antarctic
tau-initial 0.00004
POP-END
POP-START
name root
children KW4 MarionIsland
tau-initial 0.00005
POP-END
ANCESTRAL-POPS-END
MIG-BANDS-START
BAND-START
source Norway
target NPresident
BAND-END
BAND-START
source Norway
target NPtransient
BAND-END
BAND-START
source Norway
target SouthAfrican
BAND-END
BAND-START
source Norway
target Antarctic
BAND-END
BAND-START
source Norway
target MarionIsland
BAND-END
BAND-START
source NPresident
target Norway
BAND-END
BAND-START
source NPresident
target NPtransient
BAND-END
BAND-START
source NPresident
target SouthAfrican
BAND-END
BAND-START
source NPresident
target Antarctic
BAND-END
BAND-START
source NPresident
target MarionIsland
BAND-END
BAND-START
source NPtransient
target Norway
BAND-END
BAND-START
source NPtransient
target NPresident
BAND-END
BAND-START
source NPtransient
target SouthAfrican
BAND-END
BAND-START
source NPtransient
target Antarctic
BAND-END
BAND-START
source NPtransient
target MarionIsland
BAND-END
BAND-START
source SouthAfrican
target Norway
BAND-END
BAND-START
source SouthAfrican
target NPresident
BAND-END
BAND-START
source SouthAfrican
target NPtransient
BAND-END
BAND-START
source SouthAfrican
target Antarctic
BAND-END
BAND-START
source SouthAfrican
target MarionIsland
BAND-END
BAND-START
source Antarctic
target Norway
BAND-END
BAND-START
source Antarctic
target NPresident
BAND-END
BAND-START
source Antarctic
target NPtransient
BAND-END
BAND-START
source Antarctic
target SouthAfrican
BAND-END
BAND-START
source Antarctic
target MarionIsland
BAND-END
BAND-START
source MarionIsland
target Norway
BAND-END
BAND-START
source MarionIsland
target NPresident
BAND-END
BAND-START
source MarionIsland
target NPtransient
BAND-END
BAND-START
source MarionIsland
target SouthAfrican
BAND-END
BAND-START
source MarionIsland
target Antarctic
BAND-END
MIG-BANDS-END
The data settings look fine, so I don't think they're the cause of the segmentation fault. The only thing I can think of is the number of migration bands. It could be that you're relaxing your model too much, and sampling then converges to a "corner" of parameter space in which divergence times do not really restrict the sampling. First, I would suggest running a version with no migration bands. This is often very useful because it gives you an idea of how much gene flow actually affects your other demographic estimates. When you add migration bands, perhaps do it in groups, to "weed out" bands that get near-zero rates. Another thing to look out for is migration between sister populations (Norway and NPresident in your case). With gene flow between sister populations, it is often very difficult to differentiate between a model with no gene flow and a model with gene flow and deeper divergence. So try versions with and without these bands and see if they're causing the problem.
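One way to make that group-wise testing less error-prone is to generate the MIG-BANDS section from a short list of pairs instead of editing 30 blocks by hand. A minimal Python sketch (purely a hypothetical convenience script, not part of G-PhoCS; the example pairs are arbitrary):

```python
# Print BAND-START/BAND-END blocks for a chosen subset of population pairs, so
# migration bands can be tested in small groups and pasted into the control
# file. This is a convenience sketch, not part of G-PhoCS; the pair list below
# is an arbitrary example.
def mig_bands_section(pairs):
    lines = ["MIG-BANDS-START"]
    for source, target in pairs:
        lines += ["BAND-START",
                  "source %s" % source,
                  "target %s" % target,
                  "BAND-END"]
    lines.append("MIG-BANDS-END")
    return "\n".join(lines)

# Example: test only a reciprocal pair of bands between two non-sister populations.
test_pairs = [("Antarctic", "Norway"), ("Norway", "Antarctic")]
print(mig_bands_section(test_pairs))
```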
Thank you so much once again!
So I removed all of the migration bands; basically I deleted the lines from my control file, so the section became just this:
MIG-BANDS-START
MIG-BANDS-END
And that too unfortunately gave a segmentation fault on my 1 kb, 1,000-locus file, again after running for some time.
Do you think my population structure (having only one sample per population, or perhaps the tree) or the tau-initial values could also have an effect? Or is there anything else you can think of?
Thanks!
I can't see anything in your control file that could be causing this. The initial tau values look completely fine. My guess is that it's something to do with the format of your sequence file, probably something mundane. If you wish to send your data file and control file to ilan.gronau@idc.ac.il, I can try to have someone take a look at it. However, it will likely take a few weeks for us to get to this.
Thanks, I have emailed you the files.
You may remember another error I faced in this thread: https://github.com/gphocs-dev/G-PhoCS/issues/62#issuecomment-520159775 To make sure the problem isn't caused by my manual edits to the control file, I also tried converting my original control file, which I created on Windows with your Java program, to Unix format using dos2unix, but the converted file gave the same errors I mentioned in that comment. I had fixed those myself earlier by making the file similar to the example control file.
As for the sequence file, I generated it myself with my own code, as you may also remember from another thread (https://github.com/gphocs-dev/G-PhoCS/issues/51#issuecomment-502763618), and it has the same layout as far as I can tell by comparing it with the example file. Strangely, the runs go well only until some random, different point each time. Thanks
Hello,
I am getting a similar error to this one: after ~21 hours the program fails, and the last line of my log file reads:
/cm/local/apps/slurm/var/spool/job478857/slurm_script: line 15: 232301 Segmentation fault bin/G-PhoCS Gmac_Gmel_Oorc_control.ctl -n 36
I have 5 samples and 18,754 1 kb loci. At first I thought it might be a memory issue, as you suggested earlier in this thread, but our HPC specialist confirmed that I didn't run out of memory. The program runs successfully through the penultimate MCMC iteration, then fails on the final iteration, and the trace file isn't readable by Tracer.
Have you had any success determining what may be causing this error in others' files?
Thank you!
Hello Amy, we figured out that in my specific case the problem was not with my files or settings but with my HPC's OpenMPI. I would suggest you try the same run with G-PhoCS on a single thread (maybe first build it in default mode again, without multithreading support). That's all I know about it :) Good luck!
Hi Fatih,
Thanks for the quick reply :)
My first guess was that there were issues with the HPC's OpenMPI, but I talked with the HPC specialist about that and he wasn't able to find any problems there.
I'm hoping Ilan might be able to share further details that might help us pinpoint the issue.
Thanks again, Amy
Amy,
I'm on vacation now for two weeks. Will have a look when I return.
--Ilan
Amy, OpenMPI normally works on our HPC as well; perhaps there is a compatibility problem for some reason. It also starts running on multiple threads for me but then eventually runs into an error. Edit: actually, I'm not sure whether it ever uses multiple threads or whether it just sets that at the beginning; I don't remember for sure if I checked that. Best
Fatih, do you have any additional information on how/when OpenMPI is being used by G-PhoCS, based on your earlier troubleshooting? Our HPC specialist told me that the program isn't actually calling OpenMPI but rather OpenMP (I'm using a single node with 36 threads, so my understanding is that OpenMP is being used instead of OpenMPI). He said he didn't see any calls to OpenMPI in the G-PhoCS code. We're now running with a single core to troubleshoot, but it will be some time before it finishes. Thanks again!
Amy, you are right, I confused the two, sorry. When I run echo | cpp -fopenmp -dM | grep -i openmp I get this:
I also activated another version and got this:
But in both cases my runs on multiple threads failed, while the single-thread runs succeeded. Perhaps the program starts using multiple threads at some point in the analysis and that point gives the segmentation fault? Best wishes.
Amy, Fatih,
To be honest, I don't know for sure what might be causing these segmentation faults; I can't seem to reproduce them on any of my machines. If you're able to pinpoint the cause, we can try to fix the underlying problem. I do have a few comments that you may find helpful:
1) G-PhoCS uses OpenMP for threading, as you mention in your posts.
2) Since the segmentation fault appears to occur close to the end of the run, you should still be able to get a usable trace file out of it. It could very well be that only the last line is corrupt (which is why you cannot open it with Tracer). I suggest examining the trace file as text and looking for possible issues (just do head trace.txt and tail trace.txt).
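For point 2, here is a minimal Python sketch of that salvage step, assuming the trace file is whitespace-delimited with a header row (the file names are placeholders taken from the control file above): it keeps the header and drops any rows, typically only the truncated last one, whose column count does not match.

```python
# Salvage a trace file whose final line(s) may have been truncated by the crash:
# keep the header and only the rows that have the same number of columns as the
# header. Assumes a whitespace-delimited trace file with a header row; the file
# names are placeholders.
def salvage_trace(in_path, out_path):
    with open(in_path) as f:
        lines = f.read().splitlines()
    header = lines[0]
    n_cols = len(header.split())
    good_rows = [row for row in lines[1:] if len(row.split()) == n_cols]
    with open(out_path, "w") as out:
        out.write("\n".join([header] + good_rows) + "\n")

salvage_trace("first1Kloci1Kb.log", "first1Kloci1Kb.clean.log")
```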
Hi Fatih and Ilan,
Thank you both for these responses; they have been helpful in guiding our next steps.
Can you tell us which compiler you used? We originally tried gcc and are now testing the Intel compiler, to see whether the issue has anything to do with how the program is compiled.
As a test, I ran the three versions (single-thread, multi-thread gcc, and multi-thread Intel) on the sample data provided with the program. The single-thread run was successful, and the multi-thread gcc-compiled run produced the same error I got with my own data. The multi-thread Intel-compiled run produced a new error, and I am waiting to see whether the same error occurs on my data, which is also currently running on the Intel-compiled version of the program.
Interestingly, I wasn't able to open the results from any of the test runs in Tracer, whether from the sample input data or my own. The format of all the files looks exactly the same, with no obvious corruption in the head or the tail of the multi-thread results. The only difference I notice is that the end values seem very different between runs. I tried deleting the last several lines, as you suggested, but this did not seem to change anything.
Fatih, have you been able to open the results of your single-thread run in Tracer?
Thank you, Amy
Amy,
I personally use gcc, and this is the default compiler defined in the Makefile. I think the Intel compiler should also be fine, but I haven't tested it with the latest version. I also cannot figure out any likely cause for the issues you're getting with Tracer. I'd gladly have a look if you e-mail me one of the trace files you cannot open (the smallest one, if you can). Send it to ilan.gronau@idc.ac.il
Amy, yes, I could open my G-PhoCS MCMC results in Tracer (on Windows); I just had to give my file a ".log" extension, because otherwise the program doesn't see the file. But I guess your problem is not that, but rather the tail of the file? Best wishes
By the way, Ilan, here is a note that may be useful for future error reports from other users: my complete file with 237,290 loci of 10 kb each and 6 samples finished successfully with single-threaded G-PhoCS. It took about 5 days for 5,000 MCMC iterations, so a good run would probably take several months, but the 14 GB file size and the locus length did not cause a problem. Best
Hi again, just following up for anyone else who comes across this error. We discovered that I was running into two separate issues: 1) the segmentation fault, which caused the program to fail just before completion, and 2) problems opening trace files in the newest version of Tracer (v1.7.1). Regarding issue 1, enough iterations are completed that the trace file is usable even after the segmentation fault. Using a dataset of ~18,000 loci for 5 individuals, the program completed 999,999 MCMC iterations in 4 days and 9 hours before failing with a segmentation fault. Regarding issue 2, the trace files open in Tracer v1.6 and earlier, so using one of those versions is preferred.
Since the segmentation fault has only occurred for people using HPCs, our specialist suggested that one solution would be to reproduce, as closely as possible, the environment the program was written in, including the OS (kernel version, glibc version), the compiler (and its version), and the CPU model.
Ilan, if you are interested in pursuing this issue further and have the above information, I'd be happy to test this potential fix.
Thanks for the detailed explanations. I'm sure they will prove useful to future users.
1) We will try to figure out what's causing the issues with loading the trace file into new versions of Tracer. We'd like to stay compatible with the most up-to-date version, to help our users analyze their traces.
2) Regarding the segmentation fault, this is still a big mystery to us. Ideally, we would like to find the part of the code that triggers the issue and fix it. I don't quite understand the suggestion from your HPC specialist. I can specify the glibc version and compiler version, but I suspect these are not the causes. The OS could be an issue, but how would specifying it help? Users would like to run the software on their own OS, and running it in a VM seems excessive for addressing this minor segfault issue. I would gladly follow up on this if you can elaborate on the suggested solution.
When I run G-PhoCS for 1 locus with my entire dataset of aligned SNPs, I get the following error:
Is there any problem with using "one locus" in that way? My problem is that I lost my .loci file during the variant-calling process... Thank you for your help. Alex.