rajanil / msCentipede

A hierarchical multiscale model for inferring transcription factor binding from chromatin accessibility data.
MIT License
25 stars 6 forks source link

Error with --protocol ATAC_seq #12

Open vschulz opened 8 years ago

vschulz commented 8 years ago

Hi,

I am trying to use msCentipede on a ATAC-seq time course series without success. I get an error when running a command like

python call_binding.py --task learn --protocol ATAC_seq testbk/Ctcf_chr10_motifs.txt.gz testbk/wgEncodeUwDnaseGm12878AlnRep1.bam loading motifs ... num of motif sites = 471 loading read counts ... transforming data into multiscale representation ... starting model estimation (restart 1) Traceback (most recent call last): File "call_binding.py", line 344, in main() File "call_binding.py", line 337, in main learn_model(options) File "call_binding.py", line 103, in learn_model background_counts, options.model, options.log_file, options.restarts, options.mintol) File "mscentipede.pyx", line 1317, in mscentipede.estimate_optimal_model (mscentipede.c:25176) pi.value[j][~mask] = pi.value[j][mask].min() File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 29, in _amin return umr_minimum(a, axis, None, out, keepdims) ValueError: zero-size array to reduction operation minimum which has no identity

whereas the equivalent --protocol DNase_seq command runs OK. This also happens on actual atac-seq data.

I also have other questions:

  1. From the PIQ website, they suggest training on a combined bam file of all reads for all conditions. Is this good for msCentipede?
  2. How is the pickle file that was created in learning step used in inferring? I don't see a parameter to modify.
  3. Is there a preferred tool/way to make a motif set? I used commands like: cat genome_mask.fa | fimo --motif GATA1_si --o GATA1_si --max-stored-scores 5000000 --thresh 1e-3 HOCOMOCOv9.meme -

    convert 0 based

    sed '1d' fimo.txt | awk 'BEGIN {OFS="\t"}; {print $2,$3-1,$4,$5,$6}' | cat ../motif_header.txt - | gzip > msc_fimo.txt.gz

  4. Do I need to modify the bam file for 4 and 5 base offset of plus and minus strand for ATAC transposase binding, or is this accommodated in the code?

Thanks,

Vince

robertamezquita commented 8 years ago

Bump, I have the exact same issue running msCentipede.

sunfer commented 6 years ago

Hey!

Did you guys manage to find a solution to this?

Thanks,

Jude

vschulz commented 6 years ago

Not yet.

It may be good to try out DeFCoM that should work well with ATAC-seq https://www.ncbi.nlm.nih.gov/pubmed/27993786