metageni / SUPER-FOCUS

A tool for agile functional analysis of shotgun metagenomic data
GNU General Public License v3.0

amino acid sequences #54

Closed lidpeck closed 4 years ago

lidpeck commented 4 years ago

Hello

I have successfully used superfocus on ORFs. However, when I try to run it on amino acid sequences I get the error below. Do you know why this is? I think the problem is the message about an E (Error: Error reading input stream at line 2: Invalid character (E) in sequence), but I'm not sure why it would give me this when I've told it to read amino acids (-p 1).

Thanks in advance

superfocus -q Orthogroups -dir Orthgroups/output -o all_wilts -a diamond -db DB_100 -p 1
[2019-12-02 17:39:16,861 - INFO] SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data
[2019-12-02 17:39:16,865 - INFO] 1.1) Working on: all_wilts.fasta
[2019-12-02 17:39:16,865 - INFO] Aligning sequences in all_wilts.fasta to 100 using diamond
diamond v0.9.14.115 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU AGPL https://www.gnu.org/licenses/agpl.txt
Check http://github.com/bbuchfink/diamond for updates.

CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Target sequences to report alignments for: 25
Temporary directory: /usr/local/anaconda3/lib/python3.7/site-packages/superfocus_app/db/tmp
Opening the database... [0.00012s]
Opening the input file... [0.000144s]
Opening the output file... [0.000175s]
Loading query sequences... [0.000113s]
Error: Error reading input stream at line 2: Invalid character (E) in sequence
diamond v0.9.14.115 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU AGPL https://www.gnu.org/licenses/agpl.txt
Check http://github.com/bbuchfink/diamond for updates.

CPU threads: 4
Loading subject IDs... [0.000217s]
Error: Invalid DAA file. DIAMOND run has probably not completed successfully.
[2019-12-02 17:39:16,905 - INFO] Parsing Alignments
Traceback (most recent call last):
  File "/usr/local/anaconda3/bin/superfocus", line 10, in
    sys.exit(main())
  File "/usr/local/anaconda3/lib/python3.7/site-packages/superfocus_app/superfocus.py", line 342, in main
    del_alignments)
ValueError: not enough values to unpack (expected 2, got 0)

metageni commented 4 years ago

hey @lidpeck, were you able to run it with nucleotides?

metageni commented 4 years ago

@lidpeck, I just checked the tool's code and noticed that BLASTp mode for DIAMOND is not enabled. I will need to fix it and release a new version. Sorry.

I should have a fix within the next few hours.
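
For readers hitting the same error: "E" is not a valid nucleotide code, so a DIAMOND run that translates nucleotide queries (blastx) rejects protein input. The sketch below is not the SUPER-FOCUS code; it is a minimal illustration, with hypothetical helper names (looks_like_protein, diamond_command), of guessing the input alphabet and choosing between the DIAMOND subcommands blastp and blastx. The alphabet check is only a heuristic: a short peptide made up entirely of letters that are also nucleotide codes would be misclassified.

```python
# IUPAC nucleotide codes, including ambiguity codes; "E" is not among them.
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")


def looks_like_protein(fasta_text):
    """Heuristic: True if any sequence line has a letter that is not a nucleotide code."""
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        if set(line.upper()) - NUCLEOTIDE_CODES - set("*-"):
            return True
    return False


def diamond_command(query, db, out, amino_acid_input, threads=4):
    """Build a DIAMOND command line (hypothetical helper, for illustration only)."""
    # blastp aligns protein queries; blastx translates nucleotide queries first.
    mode = "blastp" if amino_acid_input else "blastx"
    return [
        "diamond", mode,
        "--query", query,
        "--db", db,
        "--out", out,
        "--outfmt", "6",  # BLAST tabular (m8-style) output
        "--threads", str(threads),
    ]
```

The returned list can be passed to subprocess.run; the file names used with it are up to the caller.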

lidpeck commented 4 years ago

@metageni amazing, thanks. Yes, I ran it successfully with nucleotide sequences.

metageni commented 4 years ago

@lidpeck I have pushed a new version into master (https://github.com/metageni/SUPER-FOCUS). I have not released it yet. Could you please give it a try?

thanks

lidpeck commented 4 years ago

Great, thanks. I copied over the new script from do_alignment.py and now it runs on my amino acid sequences with no errors. However, the output .xls files are empty, although the fasta_alignments.m8 file has genes and functions in it. I have also re-run superfocus_downloadDB to see if that fixed it, but with no luck.

[Two screenshots attached: 2019-12-02 at 20:07:23 and 2019-12-02 at 20:08:05]

metageni commented 4 years ago

Interesting. What is your OS? Could you please try rapsearch or blast with a sub-set of your input?

lidpeck commented 4 years ago

I'm using macOS Catalina. I've just re-run with BLAST (db 98) and got the same result (the fasta_alignments file has content, but all the .xls files are empty).

metageni commented 4 years ago

@lidpeck very interesting. The only thing that tells me is that there were no hits against the database.
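
One way to look into this is to count how many rows of the .m8 file would survive filtering. The sketch below assumes the standard 12-column BLAST tabular layout; the function name count_hits and the thresholds (identity, alignment length, e-value) are illustrative assumptions, not values taken from SUPER-FOCUS.

```python
import csv
import io

# Standard BLAST tabular (m8) column order.
M8_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def count_hits(m8_text, min_identity=60.0, min_length=15, max_evalue=1e-5):
    """Return (total_rows, rows_passing) for tab-separated m8 text.

    The default thresholds are hypothetical and only for illustration.
    """
    total = kept = 0
    for row in csv.reader(io.StringIO(m8_text), delimiter="\t"):
        if len(row) < len(M8_COLUMNS):
            continue  # skip blank or malformed lines
        total += 1
        hit = dict(zip(M8_COLUMNS, row))
        if (float(hit["pident"]) >= min_identity
                and int(hit["length"]) >= min_length
                and float(hit["evalue"]) <= max_evalue):
            kept += 1
    return total, kept
```

For a file on disk, read it first, for example count_hits(open("fasta_alignments.m8").read()). If rows exist but none pass, the thresholds or the database are the place to look.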

lidpeck commented 4 years ago

Hi Geni, you’re right! It’s working perfectly now – thanks a million

metageni commented 4 years ago

Great.