jpiper / pyDNase

Python module for the easy handling and analysis of DNase-seq data
http://jpiper.github.io/pyDNase
MIT License

wellington_bootstrap questions #14

Open jean997 opened 8 years ago

jean997 commented 8 years ago

Hi there, this is probably a very simple question, but I wanted to make sure that I correctly understand how to use the wellington_bootstrap.py script.

The script appears to take only two BAM files, "treatment_bam" and "control_bam". Is the expectation that if I have more than one sample in each group, I will merge the BAM files? It would be great if there were a way to pass multiple files for each group, since the merged files can get very large! Thanks! Jean

jpiper commented 8 years ago

Ah, I might be able to modify the scripts to take arrays of BAM files, so that you can provide multiple treatments and controls.

I've noticed that someone has made a branch that does this for wellington_footprints.py (https://github.com/PanosFirmpas/pyDNase/). Hopefully it shouldn't be too complicated to add support for wellington_bootstrap on top of this.

In the meantime, yes, you'd need to merge the BAM files. You can do this quickly using the samtools cat command, provided all the BAM files share an identical sequence dictionary:

samtools cat [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ]

To generate the header.sam you can just use samtools view -H <in1.bam> > header.sam
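
As a rough illustration of the merge described above, here is a small Python sketch that checks whether two headers share the same sequence dictionary and then assembles the samtools cat command line. The function names, file names (rep1.bam, rep2.bam, treatment.bam) and header contents are made up for the example; only the samtools cat options quoted above are assumed. It builds the command but does not run it.

```python
import shlex


def sequence_dictionary(header_text):
    """Return the ordered (name, length) pairs from the @SQ lines of a SAM header."""
    entries = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each tag after the record type looks like "SN:chr1" or "LN:1000".
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        entries.append((tags["SN"], int(tags["LN"])))
    return entries


def build_cat_command(bam_paths, out_bam, header_sam=None):
    """Assemble the `samtools cat` argument list (hypothetical helper)."""
    if len(bam_paths) < 2:
        raise ValueError("need at least two BAM files to merge")
    command = ["samtools", "cat"]
    if header_sam is not None:
        command += ["-h", header_sam]
    command += ["-o", out_bam]
    command += list(bam_paths)
    return command


# Made-up header text, in the form printed by `samtools view -H`.
header_a = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n"
header_b = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n"
same_dictionary = sequence_dictionary(header_a) == sequence_dictionary(header_b)

command = build_cat_command(
    ["rep1.bam", "rep2.bam"], "treatment.bam", header_sam="header.sam"
)
command_line = " ".join(shlex.quote(arg) for arg in command)
print(same_dictionary, command_line)
```

The resulting list could be passed to subprocess.run(command, check=True) on a machine with samtools installed. One thing to keep in mind: samtools cat concatenates its inputs without re-sorting them, so the output will probably need samtools sort and samtools index before it can be used for region lookups.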

jean997 commented 8 years ago

Thanks - that's great! I was also wondering about a few more things:

  1. Is there a way to interpret Wellington bootstrap scores in terms of p-values or false discovery rates? How do you choose a score cutoff? I know a score of 10 was chosen in the paper, but I'm not exactly sure how this threshold was arrived at.
  2. In the output of the footprint files I notice that there is a column (the 6th) that is entirely '+' symbols. Does this column mean something? Thanks! Jean


jpiper commented 6 years ago

1) Eurgh, it's been so long since I wrote that paper that I need to get into the right headspace and remember how the algorithm works, ha!

2) This is part of the BED specification (the sixth column is the strand field); you can ignore these!
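
For anyone reading the footprint files programmatically, here is a minimal sketch of pulling out the six standard BED columns. It assumes the file is tab-separated with columns in the usual BED order (chrom, start, end, name, score, strand); the example line and the parse_bed6 helper are made up for illustration and are not taken from pyDNase.

```python
from collections import namedtuple

# Field names follow the first six columns of the BED format.
BedRecord = namedtuple("BedRecord", "chrom start end name score strand")


def parse_bed6(line):
    """Parse the first six tab-separated BED columns of one line (hypothetical helper)."""
    chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")[:6]
    # The score is read as a float so that non-integer footprint scores are accepted.
    return BedRecord(chrom, int(start), int(end), name, float(score), strand)


# Made-up footprint line; real output will differ.
example_line = "chr1\t1000\t1025\tfootprint_1\t-12.5\t+\n"
record = parse_bed6(example_line)
print(record.strand)
```

Since the strand column is always '+' here, it can simply be dropped or ignored downstream.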