popgenmethods / smcpp

SMC++ infers population history from whole-genome sequence data.
GNU General Public License v3.0

smc++ split command issue #195

Open pstokespmb opened 2 years ago

pstokespmb commented 2 years ago

Hi @terhorst,

I am once again asking for your assistance :p. I can successfully complete all steps up to "split", but I get an error when I run it. I emailed you a Google Drive link with the smallest input that reproduces the error (it is too large to upload to GitHub). I have also attached the error log from a run with -verbose; hopefully this gives you enough info :).

I look forward to your reply.

The error:

3548 smcpp.commands.command DEBUG ['/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/bin/smc++', 'split', '-v', '-o', '/global/scratch2/peter_stokes/DemographicInference/Modeling/smcpp/split/output/pooledOuts', '/global/scratch2/peter_stokes/DemographicInference/Modeling/smcpp/split/output/pooledOuts/Wild_Knots.final.json', '/global/scratch2/peter_stokes/DemographicInference/Modeling/smcpp/split/output/pooledOuts/smcPP_Estimate_Landrace_knots.final.json', 'Wild_Landrace_Chr01.smc.gz', 'Landrace_Wild_Chr01.smc.gz']
3548 smcpp.commands.command DEBUG Namespace(Nmax=1000.0, Nmin=0.001, algorithm='L-BFGS-B', base='model', command='split', cores=None, data=['Wild_Landrace_Chr01.smc.gz', 'Landrace_Wild_Chr01.smc.gz'], emiterations=20, ftol=0.0001, lambda=None, length_cutoff=None, multi=False, no_initialize=False, nonseg_cutoff=None, outdir='/global/scratch2/peter_stokes/DemographicInference/Modeling/smcpp/split/output/pooledOuts', polarization_error=0.5, pop1='/global/scratch2/peter_stokes/DemographicInference/Modeling/smcpp/split/output/pooledOuts/Wild_Knots.final.json', pop2='/global/scratch2/peter_stokes/DemographicInference/Modeling/smcpp/split/output/pooledOuts/smcPP_Estimate_Landrace_knots.final.json', regularization_penalty=6, seed=0, thinning=None, timepoints=None, unfold=False, verbose=1, w=100, xtol=0.1)
3550 smcpp.analysis.base INFO theta: 0.000100
3550 smcpp.analysis.base INFO rho: 0.000100
3550 smcpp.analysis.base DEBUG Polarization error p=0.500000
3550 smcpp.data_filter DEBUG LoadData()
3551 smcpp.data_filter INFO Loading data...
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/global/software/sl-7.x86_64/modules/langs/python/3.7/lib/python3.7/concurrent/futures/process.py", line 205, in _sendback_result
    exception=exception))
  File "/global/software/sl-7.x86_64/modules/langs/python/3.7/lib/python3.7/multiprocessing/queues.py", line 364, in put
    self._writer.send_bytes(obj)
  File "/global/software/sl-7.x86_64/modules/langs/python/3.7/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/global/software/sl-7.x86_64/modules/langs/python/3.7/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/bin/smc++", line 10, in <module>
    sys.exit(main())
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/frontend/console.py", line 26, in main
    cmds[args.command].main(args)
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/commands/split.py", line 44, in main
    analysis = SplitAnalysis(args.data, args)
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/analysis/split.py", line 19, in __init__
    assert self.npop == 2
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/analysis/base.py", line 183, in npop
    return len(self.populations)
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/analysis/base.py", line 71, in populations
    return self._pipeline["load_data"].populations
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/data_filter.py", line 33, in __getitem__
    self.run()
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/data_filter.py", line 53, in run
    self._results = f(self._results)
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/data_filter.py", line 22, in __call__
    return self.run(contigs)
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/data_filter.py", line 97, in run
    contigs = estimation_tools.load_data(files)
  File "/global/home/groups/consultsw/sl-7.x86_64/modules/smcpp/1.15.3/lib/python3.7/site-packages/smcpp/estimation_tools.py", line 281, in load_data
    obs = list(p.map(_load_data_helper, files))
  File "/global/software/sl-7.x86_64/modules/langs/python/3.7/lib/python3.7/concurrent/futures/process.py", line 483, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/global/software/sl-7.x86_64/modules/langs/python/3.7/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/global/software/sl-7.x86_64/modules/langs/python/3.7/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/global/software/sl-7.x86_64/modules/langs/python/3.7/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
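
For readers hitting the same traceback: the innermost frame, header = struct.pack("!i", n), packs a length into a signed 32-bit integer, so it fails as soon as n exceeds 2147483647 (roughly 2 GiB). The following is a minimal Python sketch of that limit only, assuming n is the byte length of a result that a worker process sends back to the parent. The helper name fits_int32_header is made up for illustration and is not part of SMC++ or the standard library.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound quoted in the error message


def fits_int32_header(n_bytes: int) -> bool:
    """Return True if n_bytes can be packed as a signed 32-bit big-endian int.

    Hypothetical helper: it repeats the struct.pack("!i", n) call seen in the
    traceback, without sending any data between processes.
    """
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


# A length right at the limit still packs; one byte more does not.
print(fits_int32_header(INT32_MAX))
print(fits_int32_header(INT32_MAX + 1))
```

If that reading is right, a single worker returned an object too large for the header to describe, which would point at the size of the loaded input rather than at the split arguments. This is an interpretation of the traceback, not something confirmed in the thread.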

terhorst commented 2 years ago

Hi Peter,

Apologies, I have very little time to work on SMC++ these days. I've not seen this error before. As best I can tell from this Stack Overflow thread, it's related to Conda and/or memory exhaustion. But usually when it runs out of memory, a more obvious memory-related exception is thrown. To eliminate Conda as the source of the bug, could you try re-running the command using the Docker image?

Regards,
Jonathan

pstokespmb commented 2 years ago

Hey @terhorst,

Thanks for taking the time to get a reply to me despite your busy schedule!

I was able to get smc++ running through Singularity and Docker. It is running without errors so far, and I will confirm when the job finishes.

Since you seem to be peeking at GitHub/smc++ recently, I would like to ask a couple of questions that have been asked here several times but still seem to come up.

There seems to be some confusion about the input files for split. I will run through some pseudo-steps below.

For two populations:

1.) vcf2smc with -d and masking for both populations individually
2.) estimate for each population individually
3.) vcf2smc with -d and masking for pop1pop2 and pop2pop1
4.) split

Here are my questions:

Q1.) When I am performing the 2DSFS/JSFS (step 3 above), should I specify distinguished individuals and provide a mask like we would normally do for 1DSFS (step 1 above)?

Q2.) When I am specifying input for split (step 4 above), do the .smc.gz files from step 1 AND the .smc.gz files from step 3 BOTH go into the split command?

Thanks again for all your help! I really appreciate your time :)

pstokespmb commented 2 years ago

Hey @terhorst

Please see the attached .err file containing verbose debug info. The job started through Singularity/Docker just fine, but I encountered a failure that I am unable to interpret. Please keep in mind this was not a run using all available data, so maybe that is why it failed? In any case, the error output is attached :). Thanks for all your help!

smcpp_split_test.txt

terhorst commented 2 years ago

Hi,

The command you provided for split in your log file looks correct. I am not sure what the source of the error you are getting is. It appears to be thrown during a procedure that bins the data into windows. I have a feeling it may be related to the other error you posted... something about the input data. Just out of curiosity, what happens when you manually set the -w option in split? Say -w 100?
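
As a rough illustration of that suggestion, here is a small Python sketch that assembles the split command line with -w set explicitly. The argument order mirrors the command shown in the debug log at the top of this thread; the function name build_split_command and the shortened output directory are assumptions made for the example, and the sketch only prints the command rather than running it.

```python
import shlex


def build_split_command(outdir, pop1_model, pop2_model, data_files,
                        window=None, verbose=True):
    """Assemble an argv list for `smc++ split` (hypothetical helper).

    Only options that appear in this thread are used: -v, -o and -w.
    """
    argv = ["smc++", "split"]
    if verbose:
        argv.append("-v")
    if window is not None:
        argv += ["-w", str(window)]
    argv += ["-o", outdir, pop1_model, pop2_model, *data_files]
    return argv


argv = build_split_command(
    outdir="split/output/pooledOuts",  # placeholder, shortened from the log
    pop1_model="Wild_Knots.final.json",
    pop2_model="smcPP_Estimate_Landrace_knots.final.json",
    data_files=["Wild_Landrace_Chr01.smc.gz", "Landrace_Wild_Chr01.smc.gz"],
    window=100,
)
print(shlex.join(argv))  # shlex.join needs Python 3.8 or newer
```

Note that the Namespace dump from the first run already reports w=100, so this mainly makes that value explicit and gives a starting point for trying others.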