rajanil / msCentipede

A hierarchical multiscale model for inferring transcription factor binding from chromatin accessibility data.
MIT License
25 stars 6 forks source link

error: cvxopt and mscentipede.pyx #8

Open ghost opened 8 years ago

ghost commented 8 years ago

hello, When running the default model and the model using purified genomic dna (default and --model msCentipede_flexbg) the tool throws 2 different errors during the learning phase of the footprint calling:

FLEXBG MODEL: ... learning a flexible background model ... Nan in Tau Error: File "mscentipede.pyx", line 410, in mscentipede.Tau.update (mscentipede.c:12756) raise ValueError ValueError

DEFAULT MODEL: ... starting model estimation (restart 1) initial log likelihood = -6.32e+01 Error: File "dir/.conda/envs/msCentipede/lib/python2.7/site-packages/cvxopt/cvxprog.py", line 788, in cpl raise ValueError("Rank(A) < p or "\ ValueError: Rank(A) < p or Rank([H(x); A; Df(x); G]) < n

I tried this with 2 different read sets. Please let me know if more info is needed thanks george

yuifu commented 8 years ago

I also have the same issue. I run msCentipede on different motif files, and some motif files resulted in the following error.

standard error:

Traceback (most recent call last):
  File "../../bin/proj/msCentipede/call_binding.py", line 344, in <module>
    main()
  File "../../bin/proj/msCentipede/call_binding.py", line 337, in main
    learn_model(options)
  File "../../bin/proj/msCentipede/call_binding.py", line 103, in learn_model
    background_counts, options.model, options.log_file, options.restarts, options.mintol)
  File "mscentipede.pyx", line 1337, in mscentipede.estimate_optimal_model (mscentipede.c:31882)
    square_EM(data, scores, zeta, pi, tau, \
  File "mscentipede.pyx", line 1134, in mscentipede.square_EM (mscentipede.c:28300)
    EM(data, scores, zeta, pi, tau, alpha, beta, omega, pi_null, tau_null, model)
  File "mscentipede.pyx", line 1071, in mscentipede.EM (mscentipede.c:27903)
    beta.update(scores, zeta)
  File "mscentipede.pyx", line 721, in mscentipede.Beta.update (mscentipede.c:21384)
    self.estim = optimizer(xo, beta_function_gradient, beta_function_gradient_hessian, args)
  File "mscentipede.pyx", line 849, in mscentipede.optimizer (mscentipede.c:25037)
    solution = solvers.cp(F)
  File "build/bdist.linux-x86_64/egg/cvxopt/cvxprog.py", line 1966, in cp
  File "build/bdist.linux-x86_64/egg/cvxopt/cvxprog.py", line 782, in cpl
ValueError: Rank(A) < p or Rank([H(x); A; Df(x); G]) < n

Environment: Python 2.7.10, cvxopt-1.1.8

standard out:

loading motifs ... 
num of motif sites = 10000
loading read counts ... 
transforming data into multiscale representation ...
starting model estimation (restart 1)
initial log likelihood = -3.09e+01
ghost commented 8 years ago

There is an example motif file given in the github repository. could it be possible for someone to post some reads that are known to work with the example motif file so that I can get the tool to at least run to completion? thanks

yuifu commented 8 years ago

gsteinhardt

You can download ENCODE DNase data (.bam) from here: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeUwDnase/

I run msCentipede on wgEncodeUwDnaseGm12878AlnRep1.bam, and it run to completion.

ghost commented 8 years ago

thanks yuifu, i was able to get the tool to run to completion with those reads and the motif file in the repository.

mlendale commented 8 years ago

Dear all, Has someone figured out the reason for the ValueError: Rank(A) < p or Rank([H(x); A; Df(x); G]) < n error ? I was able to run the tool with example files but not with any other input files. Thanks

rajanil commented 8 years ago

Hi @mlendale @yuifu @gsteinhardt I'm working on this issue currently, sorry for the delayed response. It seems to happen for some motif sets, but I'm not sure why yet. I will keep you posted. thanks! -anil

OlaCabaj commented 8 years ago

@rajanil Anything new on this error front? I have the exact same problem as users above. Thanks!

Edit: I think the problem lies within motif PWM scores. In some databases it is given as a probability (I think), because values range from 0 to 1. That was the case with my dataset and after changing them to -log10 the script works.