snystrom / memes

An R interface to the MEME Suite
https://snystrom.github.io/memes/

Some details about AME tool in MEME Suite #118

Closed DengEr-1993 closed 5 months ago

DengEr-1993 commented 6 months ago

Hello there,

Recently I have been using AME for motif enrichment analysis, but a few things confuse me when I use it online.

First question: a set of control sequences is not usually required, is that right? So I would select 'Shuffled input sequences' here. If not, what control sequences should I use? Or would providing control sequences improve the accuracy to some degree?

Secondly, my data is from human tissues, so when choosing the input motifs I tried two approaches:

  1. Use HUMAN(Homo sapiens) DN-----HOCOMOCO Human v11 core. I think the enrichment results were not good.

  2. Then I also used JASPAR2022 (NON-REDUNDANT) DNA. However, with the same (default) parameters, I got two quite different enrichment results. If I want to get more results, can I change the parameters, such as the maximum log-odds or the E-value? I know HOCOMOCO v11 has only 400 human motifs, while JASPAR2022 has 900+ human motifs, so I think that is one key reason.

So the question is: if I only focus on human motifs, which motif database is better? Is it definitely JASPAR2022? I was also wondering whether I should compile more motifs from elsewhere to get better results. JASPAR2024 is available, but I have not decided to use it yet.

I am really unsure whether I should combine the results from the two approaches. I hope you can give me some advice here. I really need your help.

By the way, always using the MEME Suite online is inefficient. I tried to use it on Linux but failed. I also tried to install it in R, which failed too.

> library(memes)
> check_meme_install()
checking main install
Cannot detect meme install

So could you tell me some better and more efficient ways to do AME analysis?

Thanks in advance. Best regards, Xiangyi

snystrom commented 5 months ago

Hi Xiangyi,

There is no one "best" motif database; it depends heavily on your research question.

For the AME tool specifically, it is not surprising that using a different set of motifs produces different outputs. AME uses a multiple-testing correction based on the size of the motif database, so the same motif can pass the significance threshold with one database and fail it with a larger one. You can of course always change the statistical thresholds for significance, but you should have a good reason to do this. Weakening the stringency will always make the number of matches increase.
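To make the database-size effect concrete, here is a minimal base-R sketch. It assumes the E-value is the adjusted p-value multiplied by the number of motifs tested, which is my reading of the AME documentation, and it assumes a reporting threshold of 10. The adjusted p-value is a made-up illustrative number, not real AME output, and the motif counts are the approximate figures quoted in the question above.

```r
# Illustrative only: a hypothetical adjusted p-value for a single motif.
adj_p <- 0.015

# Approximate database sizes taken from the question (not verified counts).
n_motifs <- c(hocomoco_v11_core = 400, jaspar_2022 = 900)

# Assumption: E-value = adjusted p-value * number of motifs in the database.
e_values <- adj_p * n_motifs

# Assumption: motifs are reported when the E-value is at or below 10.
threshold <- 10
passes <- e_values <= threshold

print(data.frame(n_motifs, e_values, passes))
```

Under these assumptions, the same motif with the same raw enrichment is reported with the smaller database and dropped with the larger one, which is one reason the two result lists need not agree.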

Again, the set of background sequences to use depends heavily on your research question, so it is difficult to give a simple rule that applies to all situations. You should read more in the AME manual about how the different background sets are used and make a decision about what kind of motifs you want to discover for your project.

Finally, check_meme_install() does not install the program on your machine; it only reports whether it can detect an existing install. To install the software, please follow the instructions at: https://meme-suite.org/meme/doc/install.html#quick_src
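Once the MEME Suite is installed, memes still needs to know where its `bin` directory is. A minimal sketch of how that might look is below. The path `/opt/meme/bin` is a hypothetical placeholder, so substitute the location you actually installed to. As far as I recall, memes looks in `~/meme/bin` by default.

```r
library(memes)

# Hypothetical install location; replace with your own meme/bin directory.
meme_bin_dir <- "/opt/meme/bin"

# Option 1: set the location for the whole R session, then re-check.
options(meme_bin = meme_bin_dir)
check_meme_install()

# Option 2: pass the location directly for a one-off check.
check_meme_install(meme_path = meme_bin_dir)
```

If the check still cannot detect the install, the path most likely does not point at the directory containing the `ame` and `meme` executables.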

Good luck!