jirivorel closed this issue 8 years ago.
The issue is the slash in the FASTA header, which will cause EMBOSS to reformat your IDs or just halt. You can fix the identifiers in your sequence file with this script: clean_multifasta.pl. That should solve the problem for you.
Let me know if you have any other questions or issues.
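For illustration only, the idea of "fixing the header" can be shown as a sed filter that rewrites only the header lines. This is a hypothetical sketch, not the logic of clean_multifasta.pl; the function name clean_fasta_headers and the choice of "_" as the replacement character are assumptions.

```shell
# Hypothetical sketch (not clean_multifasta.pl): replace every "/" in FASTA
# header lines with "_" so EMBOSS has no slash to reinterpret.
# Sequence lines are passed through unchanged.
clean_fasta_headers() {
    # The /^>/ address limits the substitution to header lines;
    # "|" is used as the s delimiter so "/" needs no escaping.
    sed '/^>/ s|/|_|g' "$@"
}

# Usage (file names are placeholders):
# clean_fasta_headers input.fasta > input_clean.fasta
```

Because two different IDs could in principle collapse to the same string after the substitution, it is worth checking that the cleaned IDs are still unique, for example with grep '^>' input_clean.fasta | sort | uniq -d (no output means no duplicates).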
Many thanks for the prompt reply; now it runs without the error message. But I still have some problems...
There are a couple of problems that prevent me from getting equivalent coding sequences as amino acids and nucleotides.
Thanks for the report. The first issue is fixed on the master branch (reporting of ORF type), and I'm looking into the second issue (the number of ORFs reported).
Okay, I have looked at the tests and I'll need to see an example to help with the last issue you mention. In the output I get, I am seeing only the longest ORF for each input sequence. For a multi-FASTA file, that still means there will be many output sequences, specifically, there will be one for every input sequence as long as it passes the thresholds.
OK, here is a short FASTA file with 21 nucleotide sequences (https://github.com/sestaton/HMMER2GO/files/154531/nucl_seq_input.fasta.zip). I used a single FASTA file as input. After running this command:
hmmer2go getorf -i nucl_seq_input.fasta -o prot_seq_minlen_100.fasta -l 100
I get amino acid FASTA output with 1214 sequences - this ... (https://github.com/sestaton/HMMER2GO/files/154539/prot_seq_minlen_100.fasta.zip)
You might want to check which EMBOSS version you are using; that is likely the difference. I am using EMBOSS:6.5.7.0. With this version, I am getting 725 sequences when reporting all ORFs, and 22 sequences when reporting only the longest using the command you showed. There are 22 sequences reported and not 21 because there are 2 ORFs of the same length, and I don't want to choose one over the other since it may influence the results. I will add a warning when this case is encountered.
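The "longest ORF per input sequence, keep ties" rule described above can be sketched with awk. This is an illustration, not hmmer2go's code: it assumes ORF headers in the getorf style ">SEQID_N [start - end]", so stripping the trailing "_N" recovers the input sequence ID. The function name longest_orfs is hypothetical.

```shell
# Hypothetical sketch, not hmmer2go code: keep the longest ORF(s) per input
# sequence from a FASTA file whose headers look like ">SEQID_N [start - end]".
# Every ORF tied for the maximum length is kept, so one input can yield two.
longest_orfs() {
    awk '
    # Save the record just read: header, joined sequence, length, source ID.
    function flush() {
        if (hdr == "") return
        n++
        H[n] = hdr; S[n] = seq; L[n] = length(seq)
        split(hdr, parts, " ")          # parts[1] is ">SEQID_N"
        src = parts[1]
        sub(/^>/, "", src)              # drop the leading ">"
        sub(/_[0-9]+$/, "", src)        # drop the assumed "_N" ORF suffix
        SRC[n] = src
        if (L[n] > MAX[src]) MAX[src] = L[n]
    }
    /^>/ { flush(); hdr = $0; seq = ""; next }
    { seq = seq $0 }                    # join wrapped sequence lines
    END {
        flush()
        for (i = 1; i <= n; i++)
            if (L[i] == MAX[SRC[i]]) { print H[i]; print S[i] }
    }' "$@"
}
```

Usage would be something like longest_orfs all_orfs.fasta > longest.fasta. Sequences are written on a single line each; two ORFs of equal maximal length from the same input both survive, which matches the 22-versus-21 count explained above.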
OK, I used version 6.6.0.0. I'll contact our server administrator with a request for this older version, and I'll try it again and let you know.
That is interesting. Can you show the output of the command getorf -version to be clear? To my knowledge, version 6.5.7 is the latest and I'm not sure where version 6.6.0.0 can be found. It is not on the FTP servers.
If you download emboss-latest.tar.gz from the FTP site, it is version 6.5.7. The EMBOSS download page also suggests version 6.5.7 is the latest stable version.
Yes, I know the EMBOSS download page (where version 6.5.7 can be downloaded, that is true), but after running getorf -version and embossversion, I get the version EMBOSS:6.6.0.0. My HMMER version is 3.1b2.
Aside from the EMBOSS version question, can you confirm you only ran this command:
hmmer2go getorf -i nucl_seq_input.fasta -o prot_seq_minlen_100.fasta -l 100
When I run that I get 22 sequences.
The following command will tell you the hmmer2go version: hmmer2go --version
Yes, I ran this command in the same form and I get 1214 protein sequences. My hmmer2go version is 0.17.1.
Can you show the output of this command:
grep -c ">" nucl_seq_input.fasta; hmmer2go getorf -i nucl_seq_input.fasta -o prot_seq_minlen_100.fasta -l 100; grep -c ">" prot_seq_minlen_100.fasta
Also, I would recommend updating to the latest version to fix the first issue you mentioned about reporting the ORF type.
Output is: 21 1214
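When only the longest ORF is reported, each input ID should appear once in the output (or twice when two ORFs tie, as explained earlier in the thread). A hedged way to check an output file is to count headers per input ID. This assumes the same getorf-style ">SEQID_N ..." header naming, and orfs_per_input is a hypothetical helper name.

```shell
# Hypothetical diagnostic: count how many ORFs were reported per input sequence.
# Assumes ORF headers look like ">SEQID_N [start - end]".
orfs_per_input() {
    grep '^>' "$1" \
        | awk '{ id = $1; sub(/^>/, "", id); sub(/_[0-9]+$/, "", id); print id }' \
        | sort | uniq -c
}
```

For example, orfs_per_input prot_seq_minlen_100.fasta | sort -rn | head lists the inputs with the most ORFs first; counts well above 1 or 2 would suggest all ORFs were reported rather than only the longest.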
What operating system are you using (and version)? I wouldn't think that the EMBOSS version would influence this, but unfortunately it is hard to test without knowing where to find that version.
My only suggestion is to try the latest hmmer2go version, and once I know your OS I can try it on a cloud instance with the same set up.
I am using Ubuntu 14.04.4 LTS (GNU/Linux 3.13.0-32-generic x86_64). But it is our computational biology server, so I am waiting for our administrator to install the software I requested: the latest hmmer2go version and EMBOSS 6.5.7.
So, we upgraded hmmer2go to version 0.17.2 and now it works correctly, somehow. I get the right count of ORFs for both proteins and nucleotides. Thank you for your time and patience.
That is good to hear. I'll close this issue, but don't hesitate to raise other issues if you have any questions. Thanks.
Hi @abbyhudak,
I'm unclear on what command you have run. The Makefile.PL script just sets up the package to be tested and installed. It does not run any analysis.
If you are having issues with the hmmer2go getorf command, you may want to try the clean_multifasta.pl script listed above in this discussion. Please try that to convert your sequences to a format usable by EMBOSS.
You will also need to build/install the package to use the programs, and it is unclear to me if that was done correctly. Let me know if you have issues with above suggestions.
Thanks.
Sorry, I did not mean Makefile.PL; I meant clean_multifasta.pl. I tried using the clean_multifasta.pl script, but I may not have used it correctly. I am not sure which lines of the code to actually run in my terminal.
Hi @abbyhudak,
The script should work fine based on the identifier you posted. If you have a file of sequences named "trinity.fas" then you can use the script like so:
perl clean_multifasta.pl -i trinity.fas -o trinity_clean.fas
The file of transformed IDs will be in "trinity_clean.fas" which is the argument to "-o" above, the output file.
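After running the script, a quick sanity check is to compare record counts between input and output and confirm no header still contains a slash. This is a hypothetical helper, not part of clean_multifasta.pl or HMMER2GO; the name check_clean_fasta is an assumption.

```shell
# Hypothetical post-cleaning check: same number of FASTA records in both files,
# and no "/" left in the output headers. Returns non-zero on failure.
check_clean_fasta() {  # usage: check_clean_fasta input.fas cleaned.fas
    local in_count out_count slash_count
    # grep -c prints 0 and exits 1 on no match; "|| true" keeps that harmless.
    in_count=$(grep -c '^>' "$1" || true)
    out_count=$(grep -c '^>' "$2" || true)
    slash_count=$(grep -c '^>.*/' "$2" || true)
    if [ "$in_count" -ne "$out_count" ]; then
        echo "record count changed: $in_count -> $out_count" >&2
        return 1
    fi
    if [ "$slash_count" -gt 0 ]; then
        echo "$slash_count header(s) still contain '/'" >&2
        return 1
    fi
    echo "OK: $out_count records, no '/' in headers"
}
```

With the file names from the example above, that would be check_clean_fasta trinity.fas trinity_clean.fas.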
Hi @abbyhudak,
I'm not sure what to make of your comment without more information. Please show the command and the output of the program (it should print results to the terminal). Did you check the output file?
Hi @abbyhudak,
What you describe is unrelated to HMMER2GO or this thread, but I don't mind trying to help. Please send me an email and we can continue the discussion that way. If there is something related to the original issue in this thread we can pick up the discussion here.
Thanks, Evan
Hi @abbyhudak,
You can email me at: evan@evanstaton.com. The question now is about running a script and it would be better to resolve that offline so others are not getting notifications for each message and so we can keep the discussion here focused on a specific issue.
Thanks, Evan
FYI, this should not be an issue going forward. I've added a method to modify and store the identifiers (in v0.17.7) so any sequence format should work, and the original file will be untouched.
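The general idea of "modify and store the identifiers" can be sketched as: write a renamed copy with safe IDs, keep a map from new to original IDs, and restore the originals in the results. This is not the HMMER2GO v0.17.7 implementation; the seqN naming, the tab-separated map file, and the function names encode_ids and restore_ids are all assumptions for illustration.

```shell
# Hypothetical sketch: rename each FASTA ID to seq1, seq2, ... in a copy of the
# file, saving "newID<TAB>originalID" lines to a map. The input is not modified.
encode_ids() {  # usage: encode_ids in.fasta renamed.fasta map.tsv
    awk -v map="$3" '
    /^>/ {
        n++
        new = "seq" n
        split($0, p, " ")                            # p[1] is ">ORIGINAL_ID"
        print new "\t" substr(p[1], 2) > map         # record the mapping
        print ">" new substr($0, length(p[1]) + 1)   # keep any description
        next
    }
    { print }' "$1" > "$2"
}

# Put the original IDs back, keeping any suffix added later (e.g. "_1" ORF numbers).
restore_ids() {  # usage: restore_ids map.tsv renamed_results.fasta
    awk '
    NR == FNR { split($0, f, "\t"); orig[f[1]] = f[2]; next }   # load the map
    /^>/ && match($0, /^>seq[0-9]+/) {
        key = substr($0, 2, RLENGTH - 1)
        if (key in orig) { print ">" orig[key] substr($0, RLENGTH + 1); next }
    }
    { print }' "$1" "$2"
}
```

Since only the renamed copy is passed to EMBOSS, the original file stays untouched, which is the property described in the comment above.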
Great, thanks!
Abby
From: Evan Staton notifications@github.com Sent: Thursday, March 29, 2018 4:23:42 PM To: sestaton/HMMER2GO Cc: abbyhudak; Mention Subject: Re: [sestaton/HMMER2GO] Error with Emboss identifiers (#7)
My command: ./clean_multifasta.pl -i Mpotamo.fasta -o Mpotamo_clean.fasta
My error:
./clean_multifasta.pl: line 7: syntax error near unexpected token `newline'
./clean_multifasta.pl: line 7: `<!DOCTYPE html>'
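The error message names ./clean_multifasta.pl itself at line 7, and the text it complains about there is <!DOCTYPE html>, so one thing worth ruling out is that the downloaded file is a saved web page rather than the raw Perl source. A minimal check, using a hypothetical helper name:

```shell
# Hypothetical check: does a file that should be a Perl script contain HTML markup?
# Exit status 0 means HTML was found (likely a saved web page, not the raw script).
looks_like_html() {
    grep -qi -e '<!DOCTYPE html' -e '<html' "$1"
}

# Usage:
# if looks_like_html clean_multifasta.pl; then echo "not a raw Perl script"; fi
```

Invoking the script as perl clean_multifasta.pl (as in the example earlier in the thread) also ensures it is run by Perl rather than by the shell.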
Hi @abbyhudak,
Could you create a separate issue for this topic at: https://github.com/sestaton/sesbio/issues
I'd like to keep this message board for the hmmer2go issues. Also, please show a bit of the file for testing. That message suggests there is likely something unexpected with the input.
Thanks.
Hi Evan, I want to use HMMER2GO, it's a great tool for my purpose, but after running this simple command:
$ hmmer2go getorf -i Data/nk_seq -o prot_seq_trans -l 90
I get this error message:
ERROR: Identifiers such as 'Locus_1_Transcript_1/1_Confidence_1.000_Length_826' will produce unexpected renaming with EMBOSS. Exiting. at /usr/local/share/perl/5.18.2/HMMER2GO/Command/getorf.pm line 161, <$fh> chunk 2.
I am not familiar with EMBOSS, so I am not sure what is wrong with my sequence identifiers.
Thank you for your time and reply
Jirka
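To see which identifiers trigger this error before running getorf, one option is to list the header lines that contain a slash, the character identified as the problem earlier in this thread. This is a hypothetical helper; hmmer2go's actual identifier check may cover more than the slash.

```shell
# Hypothetical helper: print "line_number:header" for every FASTA header
# containing "/", or a note on stderr when there are none.
find_slash_ids() {
    grep -n '^>.*/' "$1" || echo "no headers with '/' found" >&2
}

# Usage, with the input from the command above:
# find_slash_ids Data/nk_seq | head
```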