owenjm / find_peaks

A simple FDR peak caller

Post-processing of peaks... #1

Open ademcan opened 7 years ago

ademcan commented 7 years ago

Dear Owen, this time I have a question about the peaks2genes script. I was wondering about the peak values (second column) in the output .csv files. How should one use these values? There seem to be quite a lot of genes in my output files. Can I apply an additional threshold to distinguish them and, if so, how should I proceed? Any help would be very welcome, as I have never done any peak analysis before. I was also wondering whether your perl scripts are a "continuation" of the ones used in the following study: http://onlinelibrary.wiley.com/doi/10.1038/emboj.2009.309/full ? Thank you very much for your help.

owenjm commented 7 years ago

Hi Adem, sorry for the delay in getting back to you. Peak values are the maximum score under the peak (which seems a better empirical measure than the average score, although neither is perfect here). They may give some measure of how strongly the Dam-fusion protein is associated with that locus, but there are of course many other factors that can influence this value. I'd use them with caution.
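As an illustration of the thresholding idea raised in the question, here is a minimal Python sketch. It assumes the peaks2genes output is a CSV with the gene name in the first column and the peak value in the second, as described in the question. The header handling, the function name `filter_by_peak_value`, and the cutoff of 2.0 are hypothetical choices and not part of find_peaks. Given the caveats above, a cutoff like this is probably better treated as a way to rank candidates than as evidence of regulation.

```python
import csv
import io


def filter_by_peak_value(rows, min_value):
    """Keep (gene, value) pairs with value >= min_value, highest value first."""
    kept = []
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        try:
            value = float(row[1])
        except ValueError:
            continue  # skip a header line or other non-numeric entries
        if value >= min_value:
            kept.append((row[0], value))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up example data (assumed layout: gene name, then peak value).
# A real file would be read with: open(path, newline="")
example = io.StringIO("gene,peak_value\ngeneA,3.2\ngeneB,0.8\ngeneC,2.5\n")
strong = filter_by_peak_value(csv.reader(example), min_value=2.0)
for gene, value in strong:
    print(gene, value)
```

Sorting rather than only filtering keeps the strongest candidates at the top, so the cutoff can be adjusted later without rerunning anything upstream.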

Associating genes with peaks is not a precise art. Neither the "closest" gene nor all genes within a certain proximity is a particularly good indicator of whether those genes are regulated by a TF binding event. Many enhancers are located significant distances away from the genes that they regulate, and may lie within the introns of other genes that are not regulated by these factors. Binding events at a gene promoter are probably useful (and you can look for these events with the --promotors_only and the --gene_pad options), but you'll miss any enhancers with such an analysis.
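To make the promoter-only idea concrete, here is a rough Python sketch. It does not reproduce how peaks2genes implements --promotors_only or --gene_pad; the `Gene` type, the function names, the 1000 bp window and the coordinates are all hypothetical, and every feature is assumed to lie on the same chromosome. A peak is assigned to a gene only when it overlaps a window around that gene's transcription start site (TSS).

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def promoter_window(gene, pad):
    """Return (start, end) of a window of +/- pad bp around the TSS."""
    # The TSS is the gene start on the + strand and the gene end on the - strand.
    tss = gene.start if gene.strand == "+" else gene.end
    return max(0, tss - pad), tss + pad


def genes_with_peak_at_promoter(peak_start, peak_end, genes, pad=1000):
    """Names of genes whose promoter window overlaps the peak interval."""
    hits = []
    for gene in genes:
        win_start, win_end = promoter_window(gene, pad)
        # Two closed intervals overlap when each starts before the other ends.
        if peak_start <= win_end and peak_end >= win_start:
            hits.append(gene.name)
    return hits


# Made-up coordinates for illustration only.
genes = [Gene("geneA", 5000, 8000, "+"), Gene("geneB", 20000, 26000, "-")]
promoter_hits = genes_with_peak_at_promoter(4500, 5200, genes)
intronic_hits = genes_with_peak_at_promoter(22000, 22500, genes)
print(promoter_hits, intronic_hits)
```

The second query lies inside geneB but far from its TSS, so it returns no genes. This is the trade-off described above: a promoter-only analysis skips intronic and distal enhancers.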

My perl scripts are a rewrite based on the algorithm described in that paper. The code is new, but the basic ideas behind the algorithm are the same.

Hope this helps, Owen

ademcan commented 7 years ago

Hi Owen, no problem, I really appreciate your help. Thank you very much for the details. I will try the different parameters that you mentioned; they should be helpful, and by combining all the information (in addition to the binding sites) I think we can get a better overview of what is happening.