jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License

Feature Request: new option -i and -I to complement -e and -E #40

Open malcook opened 4 years ago

malcook commented 4 years ago

While developing a pipeline that uses Genrich to integrate ATAC-seq with ChIP-seq for multiple marks, I want to call peaks on only a few small regions. For this reason, it would be useful to be able to specify which chromosomes or BED regions to include.

The effective genome should then be the regions to include minus the regions to exclude.

This would allow me to tell Genrich to analyze, e.g., chr8 only, minus any pre-computed global region blacklist.
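The "include minus exclude" arithmetic proposed here can be sketched in a few lines of Python. This is only an illustration of the interval subtraction, not Genrich code. The function name `subtract_intervals` and the coordinates are made up, and it assumes half-open BED-style intervals on a single chromosome with non-overlapping include intervals.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED


def subtract_intervals(include: List[Interval], exclude: List[Interval]) -> List[Interval]:
    """Return the parts of `include` not covered by any interval in `exclude`.

    Hypothetical helper: assumes all intervals are on the same chromosome
    and that the include intervals do not overlap each other.
    """
    result = []
    excl = sorted(exclude)
    for start, end in sorted(include):
        cursor = start  # leftmost position not yet emitted or excluded
        for ex_start, ex_end in excl:
            if ex_end <= cursor:
                continue  # exclusion lies entirely before the cursor
            if ex_start >= end:
                break  # exclusion lies entirely after this include interval
            if ex_start > cursor:
                result.append((cursor, ex_start))
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


# Made-up coordinates, for illustration only.
effective = subtract_intervals([(0, 1000)], [(100, 200), (500, 600)])
effective_length = sum(end - start for start, end in effective)
```

With these made-up inputs, the effective region is three pieces totalling 800 bp.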

Finally, being able to specify chromosomes to include or exclude using a regular expression would be great. One useful expression would be `-i ^chr\d+$` to effectively remove (in the case of Ensembl zebrafish) chrM and any of the "unknown" chromosomal fragments matching "chrUn_*".

This feature would also simplify life for people seeking an easier way to #29.
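Until something like `-i` exists, the regular-expression selection described above can be approximated outside Genrich by turning an include pattern into an exclude list. The sketch below assumes the chromosome names are already available (for example, from the BAM header or a FASTA index) and that `-e` accepts a comma-separated list of names. The helper name `chromosomes_to_exclude` and the sample names are hypothetical.

```python
import re
from typing import Iterable, List


def chromosomes_to_exclude(names: Iterable[str], include_pattern: str) -> List[str]:
    """Return the names that do NOT match `include_pattern`, in input order."""
    keep = re.compile(include_pattern)
    return [name for name in names if not keep.search(name)]


# Hypothetical chromosome names, for illustration only.
names = ["chr1", "chr2", "chr25", "chrM", "chrUn_example1"]

# Assumption: the resulting string is passed as the value of Genrich's -e option.
exclude_arg = ",".join(chromosomes_to_exclude(names, r"^chr\d+$"))
print(exclude_arg)  # chrM,chrUn_example1
```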

jsh58 commented 4 years ago

Thanks for the suggestion. Genrich analyzes the whole genome by default because that is how these assays work. ATAC-seq, ChIP-seq, etc. are performed on whole genomes, not just certain chromosomes or regions.

Nevertheless, I will consider the request. In the meantime, please use -e and -E, and let me know if there are any issues with them.

malcook commented 4 years ago

Thanks for the consideration. It is really a convenience that allows me to trial run an analysis on a fraction of the genome in the interest of debugging a larger workflow on a limited set of data. I am able to use -e effectively for this purpose to exclude all but one chromosome.

Thanks for Genrich!

~ malcolm_cook@stowers.org

ScottNortonPhD commented 4 years ago

As a workaround, you can select the regions you want using bedtools intersect.

jsh58 commented 4 years ago

bedtools intersect is unlikely to produce the correct result in this context.

j-andrews7 commented 2 years ago

A parameter to provide genome length directly would also be very helpful. We subset data frequently to run multiple different peak callers with various parameters to find the best settings for a given assay.

jsh58 commented 1 year ago

There is now a -L <int> command-line argument that can be used to set the genome length directly.
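For subset runs, one way to get a value for `-L` is to sum the lengths of the chromosomes that are kept. The sketch below assumes a tab-separated sizes listing with the name in column 1 and the length in column 2 (the layout of a `.fai` index or a `chrom.sizes` file). The helper names and sizes are hypothetical. Whether the value should also be reduced by regions excluded with `-E` is not stated in this thread, so check that against the Genrich documentation.

```python
from typing import Dict, Iterable


def parse_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'name<TAB>length[<TAB>...]' lines into a name -> length mapping."""
    sizes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sizes[fields[0]] = int(fields[1])
    return sizes


def subset_genome_length(chrom_sizes: Dict[str, int], keep: Iterable[str]) -> int:
    """Sum the lengths of the chromosomes retained in the subset."""
    return sum(chrom_sizes[name] for name in keep)


# Hypothetical sizes in bp, for illustration only.
sizes = parse_chrom_sizes(["chrA\t1000000", "chrB\t750000", "chrC\t16000"])
length = subset_genome_length(sizes, ["chrA", "chrB"])
print(f"-L {length}")  # -L 1750000
```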