jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License

BedGraph-ish with replicates to BigWig #89

Closed Simso86 closed 2 years ago

Simso86 commented 2 years ago

Hi John,

I am currently analysing ATAC-seq data and calling peaks with Genrich. I am doing this for three replicates and writing bedgraph-ish files with -k. I have been struggling to convert these bedgraph-ish files to BigWig for visualization. I removed the comment and header lines, extracted the first four columns, and then ran bedGraphToBigWig for the conversion. However, the files contain overlapping regions, and I am not sure of the best way to deal with them.

The following are the first 20 lines of my file (after removing the comment/header lines, extracting the first four columns, and sorting):

chr1 0 792500 0.000000
chr1 0 792500 0.000000
chr1 0 792500 0.000000
chr1 792500 792505 2.000000
chr1 792500 792593 1.000000
chr1 792500 794105 0.000000
chr1 792505 792561 1.000000
chr1 792561 792887 0.000000
chr1 792593 792785 0.000000
chr1 792785 792859 1.000000
chr1 792859 792885 2.000000
chr1 792885 792900 1.000000
chr1 792887 792946 2.000000
chr1 792900 792955 2.000000
chr1 792946 792987 3.000000
chr1 792955 792959 3.000000
chr1 792959 793000 2.000000
chr1 792987 793046 1.000000
chr1 793000 793029 1.000000
chr1 793029 793055 2.000000

I am aware that this question concerns handling the output file rather than the tool itself, and may therefore be better suited to a forum such as StackOverflow. However, because this bedgraph-ish file format is specific to Genrich, I decided to turn directly to you as the developer.

Thank you for a very nice and useful tool.

Best,

Simon
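The overlaps described in this comment can be confirmed before running bedGraphToBigWig. The following is a minimal Python sketch and is not part of Genrich: the function name `find_overlaps` is hypothetical, and it assumes whitespace-separated rows of chromosome, start, end, and value that are already sorted by chromosome and start.

```python
def find_overlaps(lines):
    """Return (chrom, start, end) for every row that starts before an
    earlier row on the same chromosome has ended.

    Assumes the rows are sorted by chromosome, then by start position.
    """
    overlapping = []
    current_chrom, max_end = None, 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom != current_chrom:
            # First row of a new chromosome: reset the furthest end seen.
            current_chrom, max_end = chrom, 0
        elif start < max_end:
            overlapping.append((chrom, start, end))
        max_end = max(max_end, end)
    return overlapping


# Four rows taken from the listing in this comment.
sample = [
    "chr1 0 792500 0.000000",
    "chr1 0 792500 0.000000",
    "chr1 792500 792505 2.000000",
    "chr1 792500 792593 1.000000",
]
for row in find_overlaps(sample):
    print("overlaps an earlier interval:", row)
```

Intervals that merely touch (one ends where the next starts) are not reported, since bedGraph intervals are half-open.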

jsh58 commented 2 years ago

Hi Simon,

Thank you for the question. This definitely belongs here.

It seems that you want to visualize the experimental pileup values for the three replicates. The issue is that you are combining the values from all of the replicates into a single file. The description of the -k <file> option states:

For each replicate, sequentially, this file lists a header line (# experimental file: <name>; control file: <name>), followed by experimental/control pileups and a p-value for each interval.

Therefore, you should split the -k file into three separate outputs, one for each replicate. To accomplish this, I would recommend determining where the header lines are (for example, via grep -n '^#'), and then splitting the file using head and/or tail.

Good luck,

John Gaspar
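The split suggested in this reply can also be scripted. The following is a minimal Python sketch, not John's method and not part of Genrich: the function name `split_replicates` and the sample rows are hypothetical. It assumes that each replicate begins with a line starting with `#`, that any other non-data line (such as a column header) has a non-numeric second field, and that the fourth column holds the experimental pileup.

```python
def split_replicates(lines):
    """Split the contents of a -k file into one list of four-column
    bedGraph rows per replicate."""
    replicates = []
    for line in lines:
        if line.startswith("#"):
            replicates.append([])  # a '#' header line starts a new replicate
            continue
        fields = line.split()
        # Skip rows before the first '#', blank lines, and column headers.
        if not replicates or len(fields) < 4 or not fields[1].isdigit():
            continue
        replicates[-1].append("\t".join(fields[:4]))
    return replicates


# Made-up example input with two replicates (values are illustrative only).
sample = [
    "# experimental file: rep1.bam; control file: NA",
    "chr\tstart\tend\texperimental\tcontrol\t-log(p)",
    "chr1\t0\t792500\t0.000000\t0.000000\t0.000000",
    "chr1\t792500\t792593\t1.000000\t0.000000\t0.500000",
    "# experimental file: rep2.bam; control file: NA",
    "chr\tstart\tend\texperimental\tcontrol\t-log(p)",
    "chr1\t0\t792500\t0.000000\t0.000000\t0.000000",
]
for number, rows in enumerate(split_replicates(sample), start=1):
    print(f"replicate {number}: {len(rows)} rows")
```

Each returned list could then be written to its own file and passed to bedGraphToBigWig separately, so intervals from different replicates never end up in the same track.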

Simso86 commented 2 years ago

Thank you for your answer. I also have another question to bring up for discussion, but I will open a new issue for that one.