tjparnell / biotoolbox

Tools for querying and analysis of genomic data
http://tjparnell.github.io/biotoolbox/
Artistic License 2.0

Error while using --method rpm for normalizing ChIP data #14

Closed: XXXXXuan closed this issue 3 years ago

XXXXXuan commented 3 years ago

The example you showed used --method rpm, but when I used this parameter for normalizing the data, it didn't work.

tjparnell commented 3 years ago

I think you are referring to an example in the get_datasets.pl script documentation. I had honestly forgotten about those examples, and consequently they haven't been updated in over five years, despite numerous released versions and changes in options. Thank you for bringing this to my attention. I have updated the script documentation.

Use the option --method ncount --fpkm region to collect depth-normalized counts. This counts alignment names and normalizes based on the total counted within the queried regions. If you want to normalize based on all alignments in the bam file, use --fpkm genome instead, but realize that this is computationally expensive, as it counts the entire bam file (!!!). You will likely get much better performance (and better alignment filtering and reusability) by first generating a count bigWig file using bam2wig (a sketch of the direct calls is given after the example below). As a very simple example:

bam2wig.pl --start --rpm --bw --in file.bam --out counts.bw
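
For completeness, the direct approach described above might look something like the two calls below. Treat these as a sketch rather than exact commands: the file names are placeholders, and whether the bam file is given as a trailing argument (as in the bigWig example further down) or through a dedicated data option may depend on your version.

get_datasets.pl --in peaks.bed --method ncount --fpkm region file.bam
get_datasets.pl --in peaks.bed --method ncount --fpkm genome file.bam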

You can then use the resulting counts.bw bigWig file with get_datasets.pl repeatedly, with considerably better performance. You will want to change the parameters accordingly. As a simple example:

get_datasets.pl --in peaks.bed --method sum --format 0 counts.bw

This will sum the number of counts and round to the nearest integer.
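
To address the original question of samples with different sequencing depths, the same two-step workflow can simply be repeated per sample; sample2.bam and counts2.bw below are placeholder names. Because bam2wig applies the --rpm scaling, the per-region values collected from each bigWig should already be on a comparable reads-per-million footing.

bam2wig.pl --start --rpm --bw --in sample2.bam --out counts2.bw
get_datasets.pl --in peaks.bed --method sum --format 0 counts2.bw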

XXXXXuan commented 3 years ago

Thank you very much!!! The reason I used this is that I was reproducing a paper published in 2019. The authors used data of different sequencing depths, so I needed to normalize the different data sets. But I didn't know the example was from 5 years ago! That's amazing. And honestly, thanks for your reply!

tjparnell commented 3 years ago

The example in the documentation is at least 5 years old, but the project dates back to 2010, with roots even further back. I'm glad you're finding it useful.