grunwaldlab / effectR

An R package to call oomycete effectors

HMM failed, please supply a valid absolute path to ORFs #18

Closed nbutyrate closed 5 years ago

nbutyrate commented 5 years ago

Hi, I am trying to use effectR and am getting the following error:

library(effectR)
fasta.file <- "HSM6XRQW_contigs_pro.fasta"
ORF <- seqinr::read.fasta(fasta.file)
REGEX <- regex.search(ORF, motif = "custom", reg.pat = "PAAR")
candidate.paar <- hmm.search(original.seq = "HSM6XRQW_contigs_pro.fasta", regex.seq = REGEX)

No alignment file is provided. Starting alignment with MAFFT.
Starting MAFFT alignment.
Executing MAFFT
Please be patient
MAFFT alignment finished!
Starting HMM
Creating HMM profile
Working... done.
Pressed and indexed 1 HMMs (1 names).
Models pressed into binary file: hmmbuild.hmm.h3m
SSI index for binary model file: hmmbuild.hmm.h3i
Profiles (MSV part) pressed into: hmmbuild.hmm.h3f
Profiles (remainder) pressed into: hmmbuild.hmm.h3p
HMM profile created.
Starting HMM searches

Error: Failed to open sequence file HSM6XRQW_contigs_pro.fasta for reading

hmmsearch finished!
Error in hmm.search(original.seq = "HSM6XRQW_contigs_pro.fasta", regex.seq = REGEX) :
  HMM failed, please supply a valid absolute path to ORFs


I have also tried passing fasta.file instead of the original file name, but I still get the same error.

Tabima commented 5 years ago

Hi. This error commonly appears when the input file is not found. Do you have the HSM6XRQW_contigs_pro.fasta file in the same folder you are working in?
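A quick way to check this from the R console is sketched below. It uses only base R and assumes the file name from the report above; nothing here is specific to effectR.

```r
# File name taken from the report above.
fasta.file <- "HSM6XRQW_contigs_pro.fasta"

# Directory that relative paths are resolved against.
getwd()

# TRUE only if the file is visible from the current working directory.
found <- file.exists(fasta.file)
found

# FASTA-like files that R can see in the working directory.
list.files(pattern = "\\.(fasta|faa)$")
```

If `found` is FALSE, either move the file or change the working directory with setwd() before going further.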

Another solution would be to use the shiny.effectR() function. It uses a graphical user interface for the prediction of candidate effectors.

nbutyrate commented 5 years ago

Yes, the file is in the folder, as it is getting picked up as 'fasta.file'.

Is it possible to add a custom motif in the Shiny app?

Tabima commented 5 years ago

Thanks for the quick reply. We are not planning on adding custom effector searches to the shiny app in the near future.

OK, the problem is the absence of an absolute path for the HSM6XRQW_contigs_pro.fasta file. We made this a requirement so that HMMER runs are more reproducible.

Something that you can do is:

fasta.file <- "HSM6XRQW_contigs_pro.fasta"
fasta.file <- file.path(fasta.file)
ORF <- seqinr::read.fasta(fasta.file)
REGEX <- regex.search(ORF, motif = "custom", reg.pat = "PAAR")
candidate.rxlr <- hmm.search(original.seq = fasta.file,  motif = "custom", reg.pat = "PAAR")

The second line of code will provide you with the absolute path for the FASTA file, and then HMMER should be able to recognize it. Let me know if this works.

nbutyrate commented 5 years ago

Thanks for the help. I tried this, and here is the current status:

fasta.file <- "HSM6XRQW_contigs_pro.fasta"
fasta.file <- file.path(fasta.file)
ORF <- seqinr::read.fasta(fasta.file)
REGEX <- regex.search(ORF, motif = "custom", reg.pat = "PAAR")
candidate.rxlr <- hmm.search(original.seq = fasta.file, motif = "custom", reg.pat = "PAAR")
Error in hmm.search(original.seq = fasta.file, motif = "custom", reg.pat = "PAAR") :
  unused arguments (motif = "custom", reg.pat = "PAAR")

Tabima commented 5 years ago

I'm sorry, I made a mistake in my last message (I copied the wrong test code). Replace line 5 with:

candidate.rxlr <- hmm.search(original.seq = fasta.file, regex.seq = REGEX)

nbutyrate commented 5 years ago

I tried it

fasta.file <- "HSM6XRQW_contigs_pro.fasta"
fasta.file <- file.path(fasta.file)
ORF <- seqinr::read.fasta(fasta.file)
REGEX <- regex.search(ORF, motif = "custom", reg.pat = "PAAR")
candidate.rxlr <- hmm.search(original.seq = fasta.file, regex.seq = REGEX)

No alignment file is provided. Starting alignment with MAFFT.
Starting MAFFT alignment.
Executing MAFFT
Please be patient
MAFFT alignment finished!
Starting HMM
Creating HMM profile
Working... done.
Pressed and indexed 1 HMMs (1 names).
Models pressed into binary file: hmmbuild.hmm.h3m
SSI index for binary model file: hmmbuild.hmm.h3i
Profiles (MSV part) pressed into: hmmbuild.hmm.h3f
Profiles (remainder) pressed into: hmmbuild.hmm.h3p
HMM profile created.
Starting HMM searches

Error: Failed to open sequence file HSM6XRQW_contigs_pro.fasta for reading

hmmsearch finished!
Error in hmm.search(original.seq = fasta.file, regex.seq = REGEX) :
  HMM failed, please supply a valid absolute path to ORFs

Tabima commented 5 years ago

Can you show me what fasta.file prints at the R prompt, please?

nbutyrate commented 5 years ago

> fasta.file
[1] "HSM6XRQW_contigs_pro.fasta"

Tabima commented 5 years ago

Alright, you still don't have the absolute path of the file. I thought file.path() would work.

Try this:

fasta.file <- "HSM6XRQW_contigs_pro.fasta"
fasta.file <- normalizePath(fasta.file)
ORF <- seqinr::read.fasta(fasta.file)
REGEX <- regex.search(ORF, motif = "custom", reg.pat = "PAAR")
candidate.rxlr <- hmm.search(original.seq = fasta.file, regex.seq = REGEX)
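
The difference between the two base R calls can be seen in a small sketch. It assumes a throwaway file named example.fasta written to a temporary directory (a made-up name used only for illustration): file.path() only joins path components, while normalizePath() resolves the path against the working directory.

```r
# Work in a temporary directory so nothing real is touched.
old.wd <- setwd(tempdir())

# Hypothetical two-line FASTA file, created only for this demonstration.
writeLines(c(">seq1", "MKPAARLL"), "example.fasta")
rel.path <- "example.fasta"

# file.path() with a single argument returns it unchanged.
joined <- file.path(rel.path)

# normalizePath() returns the absolute path; mustWork = TRUE raises an
# error if the file does not exist instead of silently returning the input.
abs.path <- normalizePath(rel.path, mustWork = TRUE)

# Restore the original working directory.
setwd(old.wd)

joined
abs.path
```

Because abs.path no longer depends on the working directory, it still points at the file after setwd(old.wd).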

nbutyrate commented 5 years ago

Success! Just one thing to confirm. The result table looks like this:

$motif.table
   Sequence.ID RxLR.number RxLR.position. EER.number EER.position
1 k105_2323_55           2      1005,1292          1          743

Just to confirm: we searched for PAAR, yet the columns in the table are labeled RxLR?

Could you also please guide me on how to modify the commands if I need to identify something like [S/T]xExPx[I/V]?

Tabima commented 5 years ago

Awesome, glad to hear it worked.

Remember to change the motif to custom and the reg.pat to your regular expression in your effector.summary command. In your case, it would be something like:

effector.summary(candidate.rxlr, motif = "custom", reg.pat = "[s,t].e.p.[i,v]")

That regex will match a motif that starts with S or T, followed by any letter, an E, any letter, a P, any letter, and either an I or a V. You can find more info on regular expressions in R here and here.
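
The pattern can be tried out in base R before handing it to effectR. The sketch below uses two made-up toy sequences and a comma-free form of the pattern; inside square brackets a comma is itself a literal character, so "[st]" is the stricter way to write "S or T". Whether effectR expects lowercase patterns is not confirmed here, so ignore.case = TRUE is used to keep the example independent of case.

```r
# Comma-free version of the motif [S/T]xExPx[I/V].
pattern <- "[st].e.p.[iv]"

# Made-up toy sequences: the first contains the motif, the second does not.
seqs <- c("mksaeapgikl", "mkkkkkk")

# Which sequences contain the motif?
has.motif <- grepl(pattern, seqs, ignore.case = TRUE)

# Extract the first matching stretch from each sequence that has one.
hits <- regmatches(seqs, regexpr(pattern, seqs, ignore.case = TRUE))

# A comma inside a character class is matched literally.
comma.matches <- grepl("[s,t]", ",")

has.motif
hits
comma.matches
```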

Best of luck!

palc commented 5 years ago

Hi, just adding here since I have a similar query. I am doing a trial run with a bacterial AA FASTA file (GCA_001766235.1_ASM176623v1_protein.faa; later I will replace this with an oomycete AA FASTA file). I have questions about these steps:

REGEX <- regex.search(ORF, motif = "custom", reg.pat = "PAAR") # Q: If you are looking only for RxLR and CRN effectors, do you need to provide this extra info (motif = "custom", reg.pat = "PAAR"), or do we exclude that part or replace PAAR with something else?

candidate.rxlr <- hmm.search(original.seq = fasta.file, regex.seq = REGEX) #OK

effector.summary(candidate.rxlr) #Q: Do we need to provide more info if we are only looking for RxLR and CRN effectors?

Tabima commented 5 years ago

Hi Palc,

Q: If you are looking only for RxLR and CRN effectors, do you need to provide this extra info (motif = "custom", reg.pat = "PAAR"), or do we exclude that part or replace PAAR with something else?

No, just run it with the included "CRN" or "RxLR" options.

Q: Do we need to provide more info if we are only looking for RxLR and CRN effectors?

Same answer as before

palc commented 5 years ago

Thanks. My contig file has 5000 sequences. For RxLR and CRN effectors, I did the following:

REGEX <- regex.search(ORF, motif='RxLR')
REGEX2 <- regex.search(ORF, motif='CRN')

For RxLR, it worked fine:

candidate.rxlr <- hmm.search(original.seq = fasta.file, regex.seq = REGEX, num.threads = 16)

But for CRN, I see an error message saying that it needs at least 4 sequences for the HMM step, and I am not sure why it generates this error:

candidate.crn <- hmm.search(original.seq = fasta.file, regex.seq = REGEX2, num.threads = 16)
Error in hmm.search(original.seq = fasta.file, regex.seq = REGEX2, num.threads = 16) :
  Not enough sequences for HMM step. At least 4 sequences are required.

Tabima commented 5 years ago

Thanks for the info.

How many sequences are reported for the REGEX2 object? If you have fewer than 4 sequences, then you cannot build the alignment for the HMMER step.
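
A minimal sketch of that check follows. It assumes the regex.search() result behaves like a list with one element per sequence; enough.for.hmm is a hypothetical helper, not an effectR function, and the threshold of 4 is taken from the error message above.

```r
# Hypothetical helper (not part of effectR): TRUE when the object holds
# at least `min.seqs` sequences.
enough.for.hmm <- function(regex.seq, min.seqs = 4) {
  length(regex.seq) >= min.seqs
}

# Stand-in for a regex.search() result: a plain list of three toy sequences.
toy.hits <- list(a = "mrflr", b = "mrslr", c = "mrtlr")

# Number of sequences, then the check against the threshold.
length(toy.hits)
enough.for.hmm(toy.hits)
```

With a real result, enough.for.hmm(REGEX2) would report whether the object meets the threshold before hmm.search is called.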

palc commented 5 years ago

It seems like REGEX2 has 157 sequences.

> length(REGEX)
[1] 20
> length(REGEX2)
[1] 157

Tabima commented 5 years ago

Thanks for the info. Would you mind sharing a subset of your data for me to reproduce the error and find a solution? You can send it to my email.

palc commented 5 years ago

I have sent you the file via email. Thanks for looking into it.