sr320 / course-fish546-2016


Identifying CpG sites/islands in FASTA file? #91

Closed: laurahspencer closed this issue 7 years ago

laurahspencer commented 7 years ago

I'm seeking to identify potential methylation sites (CpGs) in my geoduck genome; although CpG sites aren't definitive methylation sites, they have the potential to be. I believe CoGe only has programs to identify methylation in bisulfite-treated sequencing reads (FASTQ), and it doesn't allow one to select a FASTA file. CoGe's format options are: [Screenshot: CoGe format options]

Does anyone have suggestions for a program or script to identify CpG sites, or possibly CpG "islands", in a genome? Thanks!
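For the "islands" part of the question, here is a minimal Python sketch of the classic sliding-window approach. It assumes the Gardiner-Garden & Frommer (1987) thresholds: a 200 bp window, GC content above 50%, and an observed/expected CpG ratio above 0.6, where observed/expected = CpG count × window length / (C count × G count). Passing windows that overlap or touch are merged into one island. The function name `find_cpg_islands` and its defaults are illustrative assumptions, not part of CoGe, Galaxy, or EMBOSS.

```python
from typing import List, Tuple


def find_cpg_islands(seq: str, window: int = 200, min_gc: float = 0.5,
                     min_obs_exp: float = 0.6) -> List[Tuple[int, int]]:
    """Return merged CpG-island intervals as 1-based, inclusive (start, end).

    Hypothetical helper; thresholds default to the Gardiner-Garden & Frommer
    criteria and should be treated as tunable parameters.
    """
    s = seq.upper()
    n = len(s)
    if n < window:
        return []

    # Prefix sums: c_pre[i] / g_pre[i] count C / G in s[:i];
    # cg_pre[i] counts CG dinucleotides that start at positions < i.
    c_pre = [0] * (n + 1)
    g_pre = [0] * (n + 1)
    cg_pre = [0] * (n + 1)
    for i, base in enumerate(s):
        c_pre[i + 1] = c_pre[i] + (base == "C")
        g_pre[i + 1] = g_pre[i] + (base == "G")
        cg_pre[i + 1] = cg_pre[i] + (s[i:i + 2] == "CG")

    islands: List[List[int]] = []  # 0-based, half-open [start, end)
    for a in range(n - window + 1):
        b = a + window
        c = c_pre[b] - c_pre[a]
        g = g_pre[b] - g_pre[a]
        # Only CGs starting at a .. b-2 lie fully inside the window.
        cg = cg_pre[b - 1] - cg_pre[a]
        gc_frac = (c + g) / window
        obs_exp = cg * window / (c * g) if c and g else 0.0
        if gc_frac > min_gc and obs_exp > min_obs_exp:
            if islands and a <= islands[-1][1]:
                islands[-1][1] = b  # extend the current island
            else:
                islands.append([a, b])

    return [(a + 1, b) for a, b in islands]
```

Run it once per FASTA record. It is a plain Python loop, so expect it to be slow on a whole genome, and published CpG island definitions vary (some use longer windows and stricter cutoffs), so the thresholds are best treated as parameters rather than fixed rules.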

sr320 commented 7 years ago

fuzznuc is a tool that finds motifs (in this case, you are looking for all CG motifs in your sequence).

I believe Galaxy has this tool; you can also run it from the command line, and various other websites host it.

e.g., http://www.bioinformatics.nl/cgi-bin/emboss/fuzznuc
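If you would rather script the CG search than run fuzznuc, here is a minimal Python sketch. It assumes an uncompressed FASTA file, joins each record's wrapped lines before scanning (so a CG split across a line break is still found), and reports 1-based start positions. Because CG is its own reverse complement, scanning one strand finds every CpG site on both strands. `read_fasta` and `find_cg_sites` are hypothetical helper names, not fuzznuc options.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA lines (e.g. an open file)."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Record ID is the first whitespace-delimited word of the header.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_cg_sites(seq: str) -> List[int]:
    """Return 1-based start positions of every CG dinucleotide (case-insensitive)."""
    s = seq.upper()
    positions = []
    i = s.find("CG")
    while i != -1:
        positions.append(i + 1)
        i = s.find("CG", i + 1)
    return positions
```

Passing an open file handle to `read_fasta` and calling `find_cg_sites` on each sequence gives per-contig CpG positions (and, via `len`, per-contig counts).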

There are several output types, but you might want GFF for viewing the results in a genome browser.

[Screenshot: EMBOSS fuzznuc]
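To get browser-friendly output from a script instead of fuzznuc, CG positions can be written as GFF3, one feature per dinucleotide. This is a sketch under some assumptions: the source label `cg_scan` and feature type `CpG_site` are hypothetical placeholders (swap in a Sequence Ontology term if your browser needs one), the strand column is `.` because a CpG is symmetric, and sequence IDs are assumed not to contain characters that GFF3 requires escaping in the attributes column.

```python
from typing import Dict, List


def cg_sites_to_gff3(sites_by_seq: Dict[str, List[int]],
                     source: str = "cg_scan",
                     feature_type: str = "CpG_site") -> str:
    """Render 1-based CG start positions as GFF3 text (hypothetical helper)."""
    out = ["##gff-version 3"]
    for seqid, positions in sites_by_seq.items():
        for n, start in enumerate(positions, 1):
            fields = [
                seqid, source, feature_type,
                str(start), str(start + 1),  # CG spans two bases, inclusive
                ".",                          # score: none
                ".",                          # strand: unstranded
                ".",                          # phase: only used for CDS
                f"ID={seqid}_CG{n}",
            ]
            out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

Writing the returned string to a `.gff3` file should let a genome browser load it as a track alongside the assembly.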
laurahspencer commented 7 years ago

Thanks! I've been using Galaxy's online tool (usegalaxy.org), and taking snapshots of the inputs for reproducibility; do you recommend downloading the package and running locally?

sr320 commented 7 years ago

As long as it is documented, Galaxy is fine.
