merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Searching for motifs in inverted repeats #2120

Closed FlorianTrigodet closed 11 months ago

FlorianTrigodet commented 11 months ago

I have added some functions to anvi-report-inversions to use MEME and search for DNA motifs in the inverted repeats surrounding the inversions.

To do that, I needed to create a FASTA file for each inversion. Each FASTA contains 4 sequences: the left and right inverted repeats (each padded with 20 bp) and their reverse complements. The motif search is constrained to palindromic motifs only. Works like a charm for finding the DNA binding sites of site-specific recombinases.
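For readers who want to see the shape of that FASTA, here is a minimal sketch of how the four records could be assembled. The function names, record names, and the 0-based half-open coordinate convention are assumptions made for illustration; they are not taken from anvi-report-inversions.

```python
# Hypothetical helpers: names and coordinate convention are assumptions,
# not the code used in anvi-report-inversions.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def padded_slice(contig, start, end, pad=20):
    """Return contig[start:end] extended by `pad` bases on each side,
    clipped at the contig boundaries."""
    return contig[max(0, start - pad):min(len(contig), end + pad)]


def inverted_repeat_records(contig, left, right, pad=20):
    """Build the four (name, sequence) records for one inversion.

    `left` and `right` are (start, end) tuples for the two inverted repeats.
    """
    left_seq = padded_slice(contig, left[0], left[1], pad)
    right_seq = padded_slice(contig, right[0], right[1], pad)
    return [
        ("left", left_seq),
        ("left_rc", reverse_complement(left_seq)),
        ("right", right_seq),
        ("right_rc", reverse_complement(right_seq)),
    ]


def to_fasta(records):
    """Format (name, sequence) records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Including the reverse complements gives MEME both orientations of each repeat explicitly, which is what the description above asks for.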

I have also re-organised the output directory as follows:

|-- ALL-STRETCHES-CONSIDERED.txt
|-- INVERSION-ACTIVITY.txt
|-- INVERSIONS-CONSENSUS.txt
|-- PER_INV
|   |-- ALL_INVERSIONS
|   |   |-- MEME
|   |   |   |-- logo1.eps
|   |   |   |-- [...]
|   |   |   |-- logo_rc2.png
|   |   |   |-- meme.html
|   |   |   |-- meme.txt
|   |   |   `-- meme.xml
|   |   |-- inverted_repeats.fasta
|   |   `-- run-MEME.log
|   |-- INV_0001
|   |   |-- MEME
|   |   |   |-- logo1.eps
|   |   |   |-- [...]
|   |   |   |-- logo_rc3.png
|   |   |   |-- meme.html
|   |   |   |-- meme.txt
|   |   |   `-- meme.xml
|   |   |-- SURROUNDING-FUNCTIONS.txt
|   |   |-- SURROUNDING-GENES.txt
|   |   |-- inverted_repeats.fasta
|   |   `-- run-MEME.log
|   `-- INV_0002
|       |-- MEME
|       |   |-- logo1.eps
|       |   |-- [...]
|       |   |-- logo_rc3.png
|       |   |-- meme.html
|       |   |-- meme.txt
|       |   `-- meme.xml
|       |-- SURROUNDING-FUNCTIONS.txt
|       |-- SURROUNDING-GENES.txt
|       |-- inverted_repeats.fasta
|       `-- run-MEME.log
`-- PER_SAMPLE
    |-- INVERSIONS-IN-S01.txt
    |-- INVERSIONS-IN-S02.txt
    `-- INVERSIONS-IN-S03.txt

We used to have a single SURROUNDING-FUNCTIONS.txt and SURROUNDING-GENES.txt for all inversions. I split them per inversion.
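With the per-inversion split, downstream scripts can find each inversion's files by walking `PER_INV`. The sketch below only relies on the file names shown in the tree above; the function name and the dictionary keys are hypothetical.

```python
from pathlib import Path


def per_inversion_outputs(output_dir):
    """Map each INV_* directory name to the paths listed for it in the tree.

    Hypothetical helper: only the file names come from the output layout,
    and the paths are not checked for existence.
    """
    per_inv = Path(output_dir) / "PER_INV"
    results = {}
    # "INV_*" deliberately skips the pooled ALL_INVERSIONS directory.
    for inv_dir in sorted(per_inv.glob("INV_*")):
        if not inv_dir.is_dir():
            continue
        results[inv_dir.name] = {
            "genes": inv_dir / "SURROUNDING-GENES.txt",
            "functions": inv_dir / "SURROUNDING-FUNCTIONS.txt",
            "fasta": inv_dir / "inverted_repeats.fasta",
            "meme_txt": inv_dir / "MEME" / "meme.txt",
        }
    return results
```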

The motif search is run for each inversion, and also on all inversions pooled together (that's how you find which ones are linked together by the same recombinase).
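As a rough illustration, the same MEME call can be pointed at a per-inversion FASTA or at the pooled one. The sketch below uses MEME's `-dna`, `-oc`, `-pal`, and `-nmotifs` options; the exact options and the number of motifs used by this PR are not stated in this thread, so the function names and the `num_motifs=3` default are assumptions.

```python
import subprocess


def build_meme_command(fasta_path, output_dir, num_motifs=3):
    """Assemble a MEME command restricted to palindromic DNA motifs.

    Hypothetical helper: num_motifs=3 is an assumed default.
    """
    return [
        "meme", str(fasta_path),
        "-dna",                       # treat the input as DNA
        "-oc", str(output_dir),       # output directory, overwritten if present
        "-pal",                       # report palindromic motifs only
        "-nmotifs", str(num_motifs),  # maximum number of motifs to find
    ]


def run_meme(fasta_path, output_dir, log_path, num_motifs=3):
    """Run MEME, send stdout and stderr to a log file, return the exit code."""
    cmd = build_meme_command(fasta_path, output_dir, num_motifs)
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

For example, `run_meme("PER_INV/INV_0001/inverted_repeats.fasta", "PER_INV/INV_0001/MEME", "PER_INV/INV_0001/run-MEME.log")` would mirror the layout shown in the output tree.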

At the moment, there is no need for a dedicated MEME driver. But we may eventually need one that includes functions to parse MEME outputs, especially to generate an HTML output, and of course if we want to use MEME elsewhere in anvi'o.

If you want to test it, you need to have MEME installed in your anvi'o environment. You may need to reinstall anvi'o and add meme to your list of conda packages.
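A quick way to confirm that MEME is visible from the active environment before testing is to look it up on `PATH`. This is a minimal sketch using only the Python standard library; the function name is hypothetical and this is not how anvi'o itself checks for programs.

```python
import shutil


def require_program(name="meme"):
    """Return the full path of `name` if it is on PATH, else raise.

    Hypothetical helper for a manual check.
    """
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' was not found on PATH; install it in the active conda environment."
        )
    return path
```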

FlorianTrigodet commented 11 months ago

This is a significant improvement and a unique feature of our tool compared to other approaches out there. I'd be happy if it could make it into v8.

We will need the HTML output to make it fully user-friendly. And write a paper about it too ;)

FlorianTrigodet commented 11 months ago

Let's not merge this right now. I want to write a driver and a parser for MEME. If I'm not wrong, these should be two separate files in the code base.

The driver will be useful for other uses of MEME and the parser will allow me to connect the inversions with their respective motif group!

meren commented 11 months ago

So far looks good, @FlorianTrigodet! Excellent job.

You should also add meme to https://github.com/merenlab/anvio/blob/master/.conda/environment.yaml as a dependency.

FlorianTrigodet commented 11 months ago

We're not adding a driver or parser for MEME at the moment. Can we merge it for the v8 release?

meren commented 11 months ago

Yes we can!