grunwaldlab / effectR

An R package to call oomycete effectors

Separation of MAFFT and HMM search #15

Closed: Neato-Nick closed this issue 5 years ago

Neato-Nick commented 5 years ago

Just started trying out the package. Very easy to use, I love it.

The first time I ran hmm.search on my data, MAFFT finished successfully but then hmmsearch errored out. This was my own fault: I gave the original.seq parameter a relative path to my original ORFs instead of an absolute path. But because hmm.search stopped with an error, it didn't return an object containing the finished alignment of the regex candidate RXLRs. The second time I ran hmm.search, I gave the correct path to my original ORFs, and after another round of MAFFT I got my HMM candidate RXLRs.

Since MAFFT doesn't need the original ORFs file, it seems like users could save time by not re-aligning their regex candidate RXLRs. A couple of ideas for solutions:

1) Before running MAFFT, validate that the path given to original.seq actually leads to a FASTA file. This is probably the easiest solution and would help people like me who make simple mistakes.
2) If the MAFFT alignment succeeds when running hmm.search but the actual HMM search fails, maybe give a warning and return an object with only the Alignment and REGEX elements (and obviously without the HMM and HMM_Table elements)?
3) Make a separate function that calls MAFFT and saves the alignment into an object or file for later use with hmm.search. I know I could do this in the terminal itself: I've obviously got MAFFT installed, so I could just run the alignment outside the R environment and use the import options you've already got. But it might be nice to integrate it all into the R session. I kind of like options 1 and 2 better than this one.
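For option 1, a check along these lines could run before any alignment starts. This is a minimal sketch: `check_fasta_path()` is a hypothetical helper, not part of effectR, and it only confirms that the file exists and that its first line starts with `>`, which is a quick heuristic rather than full FASTA validation. Returning the result of `normalizePath()` also means later steps no longer depend on the current working directory, which may sidestep relative-path problems like the one described above.

```r
# Hypothetical helper (not part of effectR): fail fast, before MAFFT runs,
# if original.seq does not point to something that looks like a FASTA file.
check_fasta_path <- function(path) {
  if (!is.character(path) || length(path) != 1) {
    stop("original.seq must be a single file path")
  }
  if (!file.exists(path)) {
    stop("original.seq file not found: ", path,
         " (working directory is ", getwd(), ")")
  }
  # Heuristic only: a FASTA file's first line should be a '>' header
  first_line <- readLines(path, n = 1, warn = FALSE)
  if (length(first_line) == 0 || !startsWith(first_line, ">")) {
    stop("original.seq does not look like a FASTA file: ", path)
  }
  # Absolute path, so later steps do not depend on the working directory
  normalizePath(path)
}
```

Because the check only reads one line, it costs almost nothing compared with a MAFFT run.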

Neato-Nick commented 5 years ago

To see new behavior in the pull request, put the test_infestans.fasta dataset in your working directory and perform the following:

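```r
library(seqinr)   # provides read.fasta()
library(effectR)

# The same test file, referenced two ways
relative_path <- "test_infestans.fasta"
absolute_path <- system.file("extdata", "test_infestans.fasta", package = "effectR")

data <- read.fasta(relative_path)
regex <- regex.search(data)

# Absolute path to the original ORFs
hmm <- hmm.search(original.seq = absolute_path, regex.seq = regex)
# Relative path: reproduces the hmmsearch failure described above
hmm <- hmm.search(original.seq = relative_path, regex.seq = regex)
# Same call with save.alignment = TRUE
hmm <- hmm.search(original.seq = relative_path, regex.seq = regex, save.alignment = TRUE)
```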

Edit: just to be clear, I implemented solution 2 as described above.
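The general shape of solution 2 is to run the expensive alignment first and wrap only the search step in `tryCatch()`, so that a failure there becomes a warning instead of discarding the alignment. The sketch below is hypothetical and is not the code in the pull request: `align_then_search()`, `align_fn` and `search_fn` are illustrative names, and the alignment and search steps are passed in as functions so the pattern can run without MAFFT or HMMER installed. The element names mirror the REGEX, Alignment, HMM and HMM_Table elements mentioned above.

```r
# Hypothetical pattern (not effectR's actual implementation): keep the
# alignment if the search step fails, and warn instead of erroring.
align_then_search <- function(regex_seqs, align_fn, search_fn) {
  alignment <- align_fn(regex_seqs)  # expensive step, e.g. a MAFFT run
  result <- list(REGEX = regex_seqs, Alignment = alignment)

  # Only the search step is guarded; an alignment error still stops as usual
  hmm <- tryCatch(
    search_fn(alignment),
    error = function(e) {
      warning("HMM search failed; returning REGEX and Alignment only: ",
              conditionMessage(e), call. = FALSE)
      NULL
    }
  )

  # Add the search results only when the search actually succeeded
  if (!is.null(hmm)) {
    result$HMM <- hmm$HMM
    result$HMM_Table <- hmm$HMM_Table
  }
  result
}
```

With a `search_fn` that calls `stop()`, the function warns and returns a list containing only `REGEX` and `Alignment`, so the alignment can be reused on the next attempt instead of being recomputed.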

Ramakrishna0007 commented 5 years ago

Good evening, sir. This is Ramakrishna. Regarding the statement "The high number of effector proteins predicted in the HMM step is a result of the low thresholds used by our package in order to obtain as many candidate effectors as possible": what threshold is used in this package, and what does "low threshold" mean? Also, how did you separate the non-redundant and redundant candidates?

Neato-Nick commented 5 years ago

@Ramakrishna0007 please open a new GitHub issue for this discussion, since it's unrelated to this one. We would be happy to answer your questions there.

(Sent by email in reply to Ramakrishna0007's comment of Wed, May 8, 2019, 6:43 AM.)