popgenmethods / smcpp

SMC++ infers population history from whole-genome sequence data.
GNU General Public License v3.0

masking problem when I use smc++ cv #145

Open Huyuxi08 opened 4 years ago

Huyuxi08 commented 4 years ago

Hi, dear @terhorst! When I run vcf2smc with the --mask flag, it completes successfully. But when I run smc++ cv, it prints the following error:

--- Logging error ---
Traceback (most recent call last):
  File "/home/software/smc++/smcpp/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/home/software/smc++/smcpp/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/home/software/smc++/smcpp/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/home/software/smc++/smcpp/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/numpy/ma/core.py", line 4312, in __int__
    raise MaskError('Cannot convert masked element to a Python int.')
numpy.ma.core.MaskError: Cannot convert masked element to a Python int.
Call stack:
  File "/home/software/smc++/smcpp/bin//smc++", line 11, in <module>
    load_entry_point('smcpp==1.15.2', 'console_scripts', 'smc++')()
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/smcpp/frontend/console.py", line 26, in main
    cmds[args.command].main(args)
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/smcpp/commands/cv.py", line 80, in main
    [args.data[k] for k in range(L) if k not in fold], args
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/smcpp/analysis/analysis.py", line 23, in __init__
    if self.npop != 1:
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/smcpp/analysis/base.py", line 183, in npop
    return len(self.populations)
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/smcpp/analysis/base.py", line 71, in populations
    return self._pipeline["load_data"].populations
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/smcpp/data_filter.py", line 25, in __getitem__
    self.run()
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/smcpp/data_filter.py", line 45, in run
    self._results = f(self._results)
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/smcpp/data_filter.py", line 57, in __call__
    return self.run(contigs)
  File "/home/software/smc++/smcpp/lib/python3.6/site-packages/smcpp/data_filter.py", line 229, in run
    "mutation counts in %dbp windows: min=%d .05=%d .95=%d max=%d", self.w, *res
Message: 'mutation counts in %dbp windows: min=%d .05=%d .95=%d max=%d'
Arguments: (142857, masked, masked, masked, masked)

It looks like something went wrong, but I don't quite understand what. I would really appreciate any help. Thanks in advance!
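For anyone reading the traceback: the exception is raised while the log message is being formatted, because `%d` calls `int()` on arguments that are NumPy `masked` values. The sketch below reproduces that behaviour in isolation. It assumes, without confirmation from the SMC++ source, that the statistics come out masked because every value being summarised is masked; `describe_window_stats` is a hypothetical helper and not part of SMC++.

```python
import numpy as np


def describe_window_stats(window, stats):
    """Format window statistics, tolerating masked values (hypothetical helper)."""
    template = "mutation counts in %dbp windows: min=%d .05=%d .95=%d max=%d"
    try:
        return template % (window, *stats)
    except np.ma.MaskError:
        # %d calls int() on each argument; int(np.ma.masked) raises MaskError.
        return "mutation counts in %dbp windows: all statistics masked" % window


# A reduction over a fully masked array returns the np.ma.masked constant.
all_masked = np.ma.masked_all(4, dtype=int)
stats = [all_masked.min()] * 4

print(describe_window_stats(142857, stats))
print(describe_window_stats(10, [1, 2, 3, 4]))
```

The first call takes the fallback branch; the second formats normally because its arguments are plain integers.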

THccaa commented 4 years ago

Hello,

@cyril698 you are using Python 3.6. You may want to update to 3.7; the description says you are supposed to use Python 3.7 or greater.

However, I get the same error with Python 3.7, and I use the -c flag in vcf2smc. I tried both 'estimate' and 'cv', and both throw the same error.

--- Logging error ---
Traceback (most recent call last):
  File "/home/user/.conda/envs/smcpp/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/home/user/.conda/envs/smcpp/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/home/user/.conda/envs/smcpp/lib/python3.7/logging/__init__.py", line 608, in format
    record.message = record.getMessage()
  File "/home/user/.conda/envs/smcpp/lib/python3.7/logging/__init__.py", line 369, in getMessage
    msg = msg % self.args
  File "/home/user/.conda/envs/smcpp/lib/python3.7/site-packages/numpy/ma/core.py", line 4344, in __int__
    raise MaskError('Cannot convert masked element to a Python int.')
numpy.ma.core.MaskError: Cannot convert masked element to a Python int.
Call stack:
  File "/home/user/.conda/envs/smcpp/bin/smc++", line 8, in <module>
    sys.exit(main())
  File "/home/user/.conda/envs/smcpp/lib/python3.7/site-packages/smcpp/frontend/console.py", line 26, in main
    cmds[args.command].main(args)
  File "/home/user/.conda/envs/smcpp/lib/python3.7/site-packages/smcpp/commands/cv.py", line 75, in main
    test = Analysis([args.data[j] for j in range(L) if j in fold], args)
  File "/home/user/.conda/envs/smcpp/lib/python3.7/site-packages/smcpp/analysis/analysis.py", line 23, in __init__
    if self.npop != 1:
  File "/home/user/.conda/envs/smcpp/lib/python3.7/site-packages/smcpp/analysis/base.py", line 183, in npop
    return len(self.populations)
  File "/home/user/.conda/envs/smcpp/lib/python3.7/site-packages/smcpp/analysis/base.py", line 71, in populations
    return self._pipeline["load_data"].populations
  File "/home/user/.conda/envs/smcpp/lib/python3.7/site-packages/smcpp/data_filter.py", line 33, in __getitem__
    self.run()
  File "/home/user/conda/envs/smcpp/lib/python3.7/site-packages/smcpp/data_filter.py", line 53, in run
    self._results = f(self._results)
  File "/home/user/.conda/envs/smcpp/lib/python3.7/site-packages/smcpp/data_filter.py", line 22, in __call__
    return self.run(contigs)
  File "/home/user/.conda/envs/smcpp/lib/python3.7/site-packages/smcpp/data_filter.py", line 229, in run
    "mutation counts in %dbp windows: min=%d .05=%d .95=%d max=%d", self.w, *res
Message: 'mutation counts in %dbp windows: min=%d .05=%d .95=%d max=%d'
Arguments: (500000, masked, masked, masked, masked)

Afterwards the program finishes normally, but I only get a model.best.json in a fold0 directory (although I use --folds 2). The plot shows a strong bottleneck, which matches the demography I observed with Stairway Plot, but the time frame is completely off: it starts 1 billion years ago, while my genus is considered to be 4 million years old.

Huyuxi08 commented 4 years ago

Hi,

@th-al, sorry, I just saw your reply. Thanks a lot, I will try Python 3.7. You said that you tried both 'estimate' and 'cv' and got the same error message with each. However, when I tried 'cv', it completed successfully.

pstokespmb commented 4 years ago

Hi all,

Were you able to get 'estimate' to run? I am running into the same error. I suspect it is caused by too many sites being masked. What have you found?

THccaa commented 4 years ago

I also suspect that the number of masked sites causes the problem. I have ~4k SNPs distributed across a 5 Gbp genome. This obviously creates a huge number of masked sites and, as noted in the program manual, this is a rather crude way to define homozygous sites. However, I do not have the computer skills to investigate the error. Anyway, after @cyril698 wrote that 'estimate' worked for him, I tried again with 'cv' and 'estimate' and played around with the 'mask' parameter, but the error is always the same.
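To put those numbers in perspective, here is a back-of-the-envelope sketch of the average SNP density per window. It uses the ~4k SNPs and 5 Gbp genome mentioned in this comment and the 500000 bp window reported in the log above; `expected_snps_per_window` is a hypothetical helper, and the calculation only illustrates how sparse the data are. It does not by itself establish the cause of the error.

```python
def expected_snps_per_window(n_snps, genome_bp, window_bp):
    """Average number of SNPs per window, assuming SNPs are spread uniformly."""
    return n_snps * window_bp / genome_bp


# ~4k SNPs on a 5 Gbp genome, summarised in 500 kbp windows.
density = expected_snps_per_window(4_000, 5_000_000_000, 500_000)
print(f"{density:.2f} SNPs per window on average")
```

With fewer than one SNP per window on average, many windows would contain no variants at all, even before any mask is applied.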

Giov12 commented 2 years ago

My mask file also covers the majority of sites in the genome I am working with, and I am getting the same error message. Have there been any updates on a solution? Thank you.

RGoess commented 1 year ago

I have the same error message. The run does complete and gives results, but I'm not sure how reliable they are given the error.
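On why the run still completes: the "--- Logging error ---" banner is printed by Python's standard `logging` module, which by default reports an exception raised while formatting a log record on stderr and then lets the program continue. The sketch below demonstrates this with a masked argument. The logger name and `log_masked_stats` are hypothetical, and this only shows that the log line is dropped; it says nothing about whether the downstream estimates are reliable.

```python
import contextlib
import io
import logging

import numpy as np


def log_masked_stats():
    """Log a %d message with masked arguments; return (log output, stderr output)."""
    logger = logging.getLogger("masked_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    sink = io.StringIO()
    handler = logging.StreamHandler(sink)
    logger.addHandler(handler)
    captured = io.StringIO()
    try:
        with contextlib.redirect_stderr(captured):
            # Formatting fails inside the handler; logging reports it and moves on.
            logger.info("min=%d max=%d", np.ma.masked, np.ma.masked)
    finally:
        logger.removeHandler(handler)
    return sink.getvalue(), captured.getvalue()


log_output, error_report = log_masked_stats()
print("program continued; log line written:", bool(log_output))
```

The call returns normally: the intended log line is never written, and the error report goes to stderr instead.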

SimonaSecomandi commented 1 year ago

Hi all, I'm getting the same error using smc++ cv. Does anyone have a solution? If I don't mask my data, the error does not appear.