hputnam / Meth_Compare


Create bed files of different method-associated loci coverage #61

Closed: sr320 closed this issue 4 years ago

sr320 commented 4 years ago

This relates to today's discussion of the UpSet plot.

For the approach and downstream analysis, I would suggest taking the union bedgraph and simply splitting it by which methods cover each locus, recalling that in our conversation we said only one sample per method would count.

The resulting BED files could then go straight to Yaamini for feature analysis.

That should be ~8 files?

  1. MBD - WGBS - RRBS
  2. MBD - WGBS
  3. MBD - RRBS
  4. WGBS - RRBS
  5. WGBS only
  6. MBD only
  7. RRBS only
  8. None
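Since each CpG is either detected or not detected by each of the three methods, the categories are the 2^3 = 8 combinations of three 0/1 flags, so "~8 files" should be exactly 8. A minimal sketch, assuming one 0/1 detection flag per method; the function name `method_category` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map three 0/1 detection flags, given in the order
# MBD, WGBS, RRBS, to the category labels in the list above.
method_category() {
  local mbd=$1 wgbs=$2 rrbs=$3
  local label=""
  # Append the name of every method that detected the CpG.
  [ "$mbd" -eq 1 ] && label="MBD"
  [ "$wgbs" -eq 1 ] && label="${label:+$label - }WGBS"
  [ "$rrbs" -eq 1 ] && label="${label:+$label - }RRBS"
  case "$label" in
    "") echo "None" ;;                    # detected by no method
    MBD|WGBS|RRBS) echo "$label only" ;;  # detected by exactly one method
    *) echo "$label" ;;                   # detected by two or three methods
  esac
}

# Brace expansion yields all 2^3 = 8 flag combinations (000 through 111).
for flags in {0,1}{0,1}{0,1}; do
  method_category "${flags:0:1}" "${flags:1:1}" "${flags:2:1}"
done
```

Each combination maps to a different label, which is why the list has no overlaps and no gaps.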

shellywanamaker commented 4 years ago

@yaaminiv here are the bed files:

Mcap:

Pact:

You could also run the following code to download them, if that's easier:

# Download every file whose name ends in "CpGs.bed" from each species'
# directory into the current directory (-P .), without recreating the
# remote directory tree (--no-directories) or ascending to parent directories.
for species in Mcap Pact; do
  wget -r \
    --no-directories --no-parent --reject "index.html*" \
    -P . \
    -A CpGs.bed "https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200504/data/${species}/"
done

The columns are the same as in the union BED file, so there are more than 4 columns. I don't think this should affect bedtools intersect, because the first 3 columns are still the scaffold, start, and end. The full column layout is:

  1. Scaffold
  2. Start
  3. End
  4. ALL (1 indicates CpG is in the genome)
  5. WGBS (1 indicates CpG was detected by WGBS)
  6. RRBS (1 indicates CpG was detected by RRBS)
  7. MBD (1 indicates CpG was detected by MBD)
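For anyone who wants to regenerate the split outside the notebook, here is a minimal shell/awk sketch. It is not the notebook's code; it assumes a tab-separated union file with no header line and the column layout listed above (1 in columns 5-7 when WGBS, RRBS, or MBD detected the CpG). The function name `split_by_method`, the input file name, and the output file names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split a union file into one BED file per method
# combination. Assumes tab-separated columns, no header line:
#   1 scaffold, 2 start, 3 end, 4 ALL, 5 WGBS, 6 RRBS, 7 MBD
split_by_method() {
  local union_bed=$1 outdir=${2:-.}
  awk -F'\t' -v OFS='\t' -v outdir="$outdir" '
    {
      # Build the label in the order MBD, WGBS, RRBS.
      label = ""
      if ($7 == 1) label = "MBD"
      if ($5 == 1) label = (label == "" ? "WGBS" : label "_WGBS")
      if ($6 == 1) label = (label == "" ? "RRBS" : label "_RRBS")
      if (label == "") label = "None"
      else if (label !~ /_/) label = label "_only"
      # Each row is written to exactly one file, so no CpG appears twice.
      print > (outdir "/" label "_CpGs.bed")
    }
  ' "$union_bed"
}
```

Usage, with a hypothetical input name: `split_by_method union_CpGs.bed out/`. Underscores replace the " - " separators so the output file names contain no spaces.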

The Jupyter notebook used to create the files is here: https://github.com/hputnam/Meth_Compare/blob/master/scripts/Generate_UpsetPlot_input.ipynb

shellywanamaker commented 4 years ago

@yaaminiv NOTE: these files do not contain redundant CpGs. In other words, if a CpG is listed in one file, it is not listed in any other file.
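One quick way to confirm that the files are mutually exclusive is to pool their first three columns and count positions that occur more than once. A minimal sketch; the function name `count_shared_cpgs` is hypothetical, and it assumes no single file lists the same CpG twice:

```shell
#!/usr/bin/env bash
# Hypothetical check: count CpG positions (scaffold, start, end) that occur
# in more than one of the given BED files. 0 means no CpG is shared.
count_shared_cpgs() {
  # cut concatenates all files; uniq -d prints each repeated line once.
  cut -f1-3 "$@" | sort | uniq -d | wc -l | tr -d ' '
}
```

For example, running `count_shared_cpgs *CpGs.bed` in a directory holding one species' files should print 0 if the note above holds.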

yaaminiv commented 4 years ago

I identified genomic locations in this script, and reformatted the tables and created plots in this R Markdown.

[Screenshot, 2020-05-08 at 1:18:15 AM]

(from bottom to top of each bar: CDS, introns, flanks, intergenic)

[Screenshot, 2020-05-08 at 1:26:18 AM]

(from bottom to top of each bar: CDS, introns, flanks, intergenic)

I'll tackle stats next.